API Reference
Segmentation
- class tab_right.segmentations.calc_seg.SegmentationCalc(gdf: DataFrameGroupBy, label_col: str, prediction_col: str | List[str], segment_names: Dict[int, Any])[source]
Bases: object
Implementation of BaseSegmentationCalc protocol.
Calculates scores for pre-defined segments.
- gdf
Grouped DataFrame, each group represents a segment (grouped by segment_id).
- Type:
DataFrameGroupBy
- label_col
Column name for the true target values.
- Type:
str
- prediction_col
Column names for the predicted values.
- Type:
Union[str, List[str]]
- segment_names
Mapping from segment_id to the original group name (category or interval).
- Type:
Dict[int, Any]
- gdf: DataFrameGroupBy
- label_col: str
- prediction_col: str | List[str]
- segment_names: Dict[int, Any]
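Example (an illustrative sketch, not the library's implementation): the idea of scoring pre-defined segments can be shown in plain Python. The helper name segment_scores, the list-of-dicts row layout, and the choice of mean absolute error as the score are assumptions made for this sketch; SegmentationCalc itself works on a DataFrameGroupBy.

```python
from collections import defaultdict
from statistics import mean


def segment_scores(rows, label_col, prediction_col):
    """Return {segment_id: mean absolute error} (hypothetical helper)."""
    errors = defaultdict(list)
    for row in rows:
        # Collect the absolute error of every row under its segment ID.
        errors[row["segment_id"]].append(abs(row[label_col] - row[prediction_col]))
    return {seg: mean(errs) for seg, errs in errors.items()}


rows = [
    {"segment_id": 0, "y": 1.0, "pred": 0.5},
    {"segment_id": 0, "y": 2.0, "pred": 2.5},
    {"segment_id": 1, "y": 3.0, "pred": 3.0},
]
# Mapping from segment_id to the original group name, as described above.
segment_names = {0: "age < 30", 1: "age >= 30"}

scores = segment_scores(rows, "y", "pred")
named_scores = {segment_names[seg]: score for seg, score in scores.items()}
```

The last line shows the role of segment_names: it translates integer segment IDs back into readable group labels.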
- class tab_right.segmentations.double_seg.DoubleSegmentationImp(df: DataFrame, label_col: str, prediction_col: str | List[str])[source]
Bases: object
Implementation of double segmentation logic based on two features.
Conforms to the DoubleSegmentation protocol.
- df
The input DataFrame containing the data to segment.
- Type:
pd.DataFrame
- label_col
The name of the column containing the true target values.
- Type:
str
- prediction_col
The name(s) of the column(s) containing the predicted values.
- Type:
Union[str, List[str]]
- df: DataFrame
- label_col: str
- prediction_col: str | List[str]
Drift Detection
- class tab_right.drift.drift_calculator.DriftCalculator(df1: DataFrame, df2: DataFrame, kind: Dict[str, str] | None = None)[source]#
Bases: object
Implementation of DriftCalcP using Cramér’s V and Wasserstein distance.
- df1: DataFrame#
- df2: DataFrame#
- get_prob_density(columns: Iterable[str] | None = None, bins: int = 10) DataFrame[source]#
Get probability densities for reference and current datasets (vectorized).
- Returns:
DataFrame with density information for each feature and bin, containing:
  - feature: Name of the feature
  - bin: Bin label (category name or numerical range)
  - ref_density: Density in the reference dataset
  - cur_density: Density in the current dataset
- Return type:
pd.DataFrame
- kind: Dict[str, str] | None = None#
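Example (a minimal sketch, assuming equal-width bins shared by both datasets): the output shape described for get_prob_density can be imitated for one continuous feature in plain Python. The function name binned_densities and the binning rule are assumptions; the library's actual binning may differ. "Density" here means the share of rows falling in each bin.

```python
def binned_densities(feature, reference, current, bins=10):
    """Return one row per bin with reference and current shares (hypothetical helper)."""
    lo = min(min(reference), min(current))
    hi = max(max(reference), max(current))
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            # The maximum value is clamped into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    ref_shares, cur_shares = shares(reference), shares(current)
    return [
        {
            "feature": feature,
            "bin": (lo + i * width, lo + (i + 1) * width),
            "ref_density": ref_shares[i],
            "cur_density": cur_shares[i],
        }
        for i in range(bins)
    ]


density_rows = binned_densities("age", [0, 1, 2, 3], [2, 3, 4, 4], bins=2)
```

Using the same bin edges for both datasets is what makes the ref_density and cur_density values directly comparable.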
Drift Metrics
- tab_right.drift.univariate.detect_univariate_drift_df(reference: DataFrame, current: DataFrame, kind: str = 'auto', normalize: bool = True, normalization_method: str = 'range') DataFrame[source]#
Detect drift for each column in two DataFrames.
- Parameters:
reference (pd.DataFrame) – Reference DataFrame.
current (pd.DataFrame) – Current DataFrame.
kind (str, default "auto") – “auto”, “categorical”, or “continuous”. If “auto”, infers from dtype.
normalize (bool, default True) – Whether to normalize continuous drift scores
normalization_method (str, default "range") – Method to use for normalization, see normalize_wasserstein for options
- Returns:
DataFrame with columns: feature, metric, value, raw_value (for continuous features).
- Return type:
pd.DataFrame
Notes
This function is provided for backward compatibility. For new code, use the UnivariateDriftCalculator class instead.
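Example (an illustrative sketch of a categorical drift score): Cramér’s V, named above as the categorical metric, can be computed from a 2 x K contingency table whose rows are the two datasets and whose columns are the categories. The function name cramers_v and the absence of any bias correction are assumptions of this sketch; the library's exact computation is not shown in this reference.

```python
from collections import Counter
from math import sqrt


def cramers_v(reference, current):
    """Uncorrected Cramér's V between two categorical samples (hypothetical helper)."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = set(ref_counts) | set(cur_counts)
    n_ref, n_cur = len(reference), len(current)
    n = n_ref + n_cur

    chi2 = 0.0
    for cat in categories:
        col_total = ref_counts[cat] + cur_counts[cat]
        for observed, row_total in ((ref_counts[cat], n_ref), (cur_counts[cat], n_cur)):
            # Expected count under independence of dataset and category.
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected

    # With two rows, min(rows, cols) - 1 equals 1 once there are two or more
    # categories, so V reduces to sqrt(chi2 / n). A single category gives 0.
    return sqrt(chi2 / n)


same = cramers_v(["a", "a", "b", "b"], ["a", "a", "b", "b"])
disjoint = cramers_v(["a"] * 4, ["b"] * 4)
```

Identical category mixes give 0 and completely disjoint ones give 1, which is why the score needs no further normalization.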
- tab_right.drift.univariate.detect_univariate_drift(reference: Series, current: Series, kind: str = 'auto', normalize: bool = True, normalization_method: str = 'range') Tuple[str, float][source]#
Detect drift between two 1D distributions.
- Parameters:
reference (pd.Series) – Reference distribution.
current (pd.Series) – Current distribution.
kind (str, default "auto") – “auto”, “categorical”, or “continuous”. If “auto”, infers from dtype.
normalize (bool, default True) – Whether to normalize continuous drift scores
normalization_method (str, default "range") – Method to use for normalization, see normalize_wasserstein for options
- Returns:
(metric name, value)
- Return type:
tuple
Notes
This function calls detect_univariate_drift_with_options internally and may raise ValueError if kind is not recognized or if an invalid normalization method is specified.
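Example (a minimal sketch, assuming equal-size samples and normalization by the reference range): for two 1D samples of the same length, the Wasserstein distance is the mean absolute difference between their sorted values. The function names below and the choice of the reference range as the "range" denominator are assumptions; see normalize_wasserstein for what the library actually does.

```python
def wasserstein_equal_size(reference, current):
    """1D Wasserstein distance for equal-length samples (hypothetical helper)."""
    if len(reference) != len(current):
        raise ValueError("this sketch only handles samples of equal length")
    pairs = zip(sorted(reference), sorted(current))
    return sum(abs(a - b) for a, b in pairs) / len(reference)


def normalized_by_range(reference, current):
    """Divide the raw distance by the spread of the reference sample."""
    raw = wasserstein_equal_size(reference, current)
    spread = max(reference) - min(reference)
    # A constant reference has no spread; return the raw value in that case.
    return raw / spread if spread else raw


raw_score = wasserstein_equal_size([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])
norm_score = normalized_by_range([0, 1, 2, 3, 4], [1, 2, 3, 4, 5])
```

Normalizing makes scores comparable across features measured on different scales, which is the purpose of the normalize flag.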
Plotting Functions
Segmentation Plotting
- tab_right.plotting.plot_segmentations.plot_single_segmentation_impl(df: DataFrame, lower_is_better: bool = True) Figure[source]#
Implement the single segmentation plot as a Plotly bar chart (compatibility function).
This is kept for backwards compatibility and wraps _plot_single_segmentation_plotly.
- Parameters:
df (pd.DataFrame) – See module docstring for format details.
lower_is_better (bool, default=True) – Whether lower values indicate better performance.
- Returns:
A Plotly bar chart.
- Return type:
PlotlyFigure
- tab_right.plotting.plot_segmentations.plot_single_segmentation(df: DataFrame, lower_is_better: bool = True, backend: Literal['plotly', 'matplotlib'] = 'plotly') Figure | Figure[source]#
Plot the single segmentation of a given DataFrame as a bar chart.
This function can use either Plotly or Matplotlib as backend.
- Parameters:
df (pd.DataFrame) – See module docstring for format details.
lower_is_better (bool, default=True) – Whether lower values of the metric indicate better performance.
backend (str, default="plotly") – The plotting backend to use. Either “plotly” or “matplotlib”.
- Returns:
A bar chart showing each segment with its corresponding avg score.
- Return type:
Figure
- tab_right.plotting.plot_segmentations.plot_single_segmentation_mp(df: DataFrame, lower_is_better: bool = True) Figure[source]#
Plot the single segmentation using matplotlib (compatibility function).
This is a wrapper around plot_single_segmentation with backend="matplotlib" for backwards compatibility.
- Parameters:
df (pd.DataFrame) – See module docstring for format details.
lower_is_better (bool, default=True) – Whether lower values indicate better performance.
- Returns:
A matplotlib bar chart showing each segment with its corresponding score.
- Return type:
MatplotlibFigure
- class tab_right.plotting.plot_segmentations.DoubleSegmPlotting(df: DataFrame, metric_name: str = 'score', lower_is_better: bool = True, backend: Literal['plotly', 'matplotlib'] = 'plotly')[source]#
Bases: object
Class for double segmentation plotting with support for multiple backends.
This class implements the interface for plotting double segmentations. It includes the DataFrames to be plotted and supports multiple plotting backends.
See the module docstring for parameter details.
- backend: Literal['plotly', 'matplotlib'] = 'plotly'#
- df: DataFrame#
- get_heatmap_df() DataFrame[source]#
Get the DataFrame for the heatmap from the double segmentation df.
- Returns:
A DataFrame containing the groups defined by the decision tree model.
  - columns: feature_1 ranges or categories
  - index: feature_2 ranges or categories
  - content: The calculated error metric for the segment.
- Return type:
pd.DataFrame
- lower_is_better: bool = True#
- metric_name: str = 'score'#
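Example (an illustrative sketch of the reshaping that get_heatmap_df describes): long-format segment rows are pivoted into a grid indexed by feature_2 with one column per feature_1 value. The function name to_heatmap_grid and the nested-dict result are assumptions for this sketch; the class itself returns a pd.DataFrame.

```python
def to_heatmap_grid(rows, metric_name="score"):
    """Pivot long rows into {feature_2: {feature_1: metric}} (hypothetical helper)."""
    grid = {}
    for row in rows:
        # Outer key plays the role of the index, inner key the column.
        grid.setdefault(row["feature_2"], {})[row["feature_1"]] = row[metric_name]
    return grid


segments = [
    {"segment_id": 0, "feature_1": "age < 30", "feature_2": "urban", "score": 0.10},
    {"segment_id": 1, "feature_1": "age >= 30", "feature_2": "urban", "score": 0.25},
    {"segment_id": 2, "feature_1": "age < 30", "feature_2": "rural", "score": 0.40},
    {"segment_id": 3, "feature_1": "age >= 30", "feature_2": "rural", "score": 0.15},
]
heatmap_grid = to_heatmap_grid(segments)
```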
Drift Plotting
- class tab_right.plotting.drift_plotter.DriftPlotter(drift_calc: DriftCalcP)[source]#
Bases: object
Implementation of DriftPlotP protocol using Matplotlib.
- drift_calc: DriftCalcP#
- get_distribution_plots(columns: Iterable[str] | None = None, bins: int = 10, **kwargs: Any) Dict[str, Figure][source]#
Generate individual distribution comparison plots for multiple features.
- Parameters:
columns – Specific columns to generate plots for. If None, all available columns are used.
bins – Number of bins to use for continuous features.
**kwargs – Additional arguments passed to plot_single.
- Returns:
A dictionary mapping column names to their respective matplotlib Figure objects.
- plot_drift(drift_df: DataFrame, value_col: str = 'value', feature_col: str = 'feature') Figure[source]#
Plot drift values for each feature as a bar chart using Plotly.
- Parameters:
drift_df – DataFrame with drift results. Should contain columns for feature names and drift values.
value_col – Name of the column containing drift values.
feature_col – Name of the column containing feature names.
- Returns:
Plotly bar chart of drift values by feature.
- Return type:
go.Figure
- plot_drift_mp(drift_df: DataFrame, value_col: str = 'value', feature_col: str = 'feature') Figure[source]#
Plot drift values for each feature as a bar chart using Matplotlib.
- Parameters:
drift_df – DataFrame with drift results. Should contain columns for feature names and drift values.
value_col – Name of the column containing drift values.
feature_col – Name of the column containing feature names.
- Returns:
Matplotlib figure with bar chart of drift values by feature.
- Return type:
plt.Figure
- static plot_feature_drift(reference: Series, current: Series, feature_name: str = None, show_score: bool = True, ref_label: str = 'Train Dataset', cur_label: str = 'Test Dataset', normalize: bool = True, normalization_method: str = 'range', show_raw_score: bool = False) Figure[source]#
Plot distribution drift for a single feature using Plotly.
- Parameters:
reference – Reference (train) data for the feature.
current – Current (test) data for the feature.
feature_name – Name of the feature (for labeling plots).
show_score – Whether to display the drift score annotation.
ref_label – Label for the reference data.
cur_label – Label for the current data.
normalize – Whether to normalize the Wasserstein distance.
normalization_method – Method to use for normalization: “range”, “std”, or “iqr”.
show_raw_score – Whether to show both normalized and raw scores.
- Returns:
Plotly figure with overlaid distributions, means, and drift score.
- Return type:
go.Figure
- static plot_feature_drift_mp(reference: Series, current: Series, feature_name: str = None, show_score: bool = True, ref_label: str = 'Train Dataset', cur_label: str = 'Test Dataset', normalize: bool = True, normalization_method: str = 'range', show_raw_score: bool = False) Figure[source]#
Plot distribution drift for a single feature using Matplotlib.
- Parameters:
reference – Reference (train) data for the feature.
current – Current (test) data for the feature.
feature_name – Name of the feature (for labeling plots).
show_score – Whether to display the drift score annotation.
ref_label – Label for the reference data.
cur_label – Label for the current data.
normalize – Whether to normalize the Wasserstein distance.
normalization_method – Method to use for normalization: “range”, “std”, or “iqr”.
show_raw_score – Whether to show both normalized and raw scores.
- Returns:
Matplotlib figure with overlaid distributions, means, and drift score.
- Return type:
plt.Figure
- plot_multiple(columns: Iterable[str] | None = None, bins: int = 10, figsize: Tuple[int, int] = (12, 8), sort_by: str = 'score', ascending: bool = False, top_n: int | None = None, threshold: float | None = None, **kwargs: Any) Figure[source]#
Create a bar chart visualization of drift across multiple features.
- Parameters:
columns – Specific columns to plot drift for. If None, all available columns are used.
bins – Number of bins to use for continuous features.
figsize – Figure size as (width, height) in inches.
sort_by – Column to sort results by (usually “score”).
ascending – Whether to sort in ascending order.
top_n – If specified, only show the top N features.
threshold – If specified, mark features above this threshold in a different color.
**kwargs – Additional arguments passed to the drift calculator.
- Returns:
A matplotlib Figure object containing the generated plot.
- plot_single(column: str, bins: int = 10, figsize: Tuple[int, int] = (10, 6), show_metrics: bool = True, **kwargs: Any) Figure[source]#
Create a detailed visualization of drift for a single feature.
- Parameters:
column – The column/feature to visualize.
bins – Number of bins to use for continuous features.
figsize – Figure size as (width, height) in inches.
show_metrics – Whether to show drift metrics in the plot.
**kwargs – Additional arguments passed to the drift calculator.
- Returns:
A matplotlib Figure object containing the generated plot.
Base Architecture & Protocols
Protocol definitions for data segmentation analysis in tab-right.
This module defines protocol classes and type aliases for segmentation analysis, including interfaces for segmentation calculations and feature-based segmentation.
- class tab_right.base_architecture.seg_protocols.BaseSegmentationCalc(*args, **kwargs)[source]#
Bases: Protocol
Base protocol for segmentation performance calculations.
- Parameters:
gdf (DataFrameGroupBy) – Grouped DataFrame, each group represents a segment.
label_col (str) – Column name for the true target values.
prediction_col (Union[str, List[str]]) – Column names for the predicted values. Can be a single column or a list of columns. Can be probabilities (multiple columns) or classes or continuous values.
segment_names (Optional[Dict[int, Any]], default=None) – Optional mapping from an integer segment ID to the original group name (category, interval, or tuple). If provided, these IDs should match the grouping keys if gdf is grouped by integer IDs.
- gdf: DataFrameGroupBy#
- label_col: str#
- prediction_col: str | List[str]#
- segment_names: Dict[int, Any] | None = None#
- class tab_right.base_architecture.seg_protocols.DoubleSegmentation(*args, **kwargs)[source]#
Bases: Protocol
Class schema for calculating double segmentation, i.e. segmentation based on two features.
- Parameters:
df (pd.DataFrame) – A DataFrame containing the data to analyze.
label_col (str) – The name of the column containing the true target values.
prediction_col (str) – The name of the column containing the predicted values. Can be probabilities (multiple columns) or classes or continuous values.
- df: DataFrame#
- label_col: str#
- prediction_col: str#
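Example (a minimal sketch of structural typing with typing.Protocol): a class conforms to a protocol by exposing the listed attributes, without inheriting from it. The names HasSegmentationFields and MySegmentation are hypothetical stand-ins that only mirror the attribute list above; they are not part of tab-right.

```python
from typing import Any, List, Protocol, Union, runtime_checkable


@runtime_checkable
class HasSegmentationFields(Protocol):
    """Hypothetical protocol mirroring the documented attributes."""

    df: Any
    label_col: str
    prediction_col: Union[str, List[str]]


class MySegmentation:
    """Conforms by shape only; note there is no inheritance from the protocol."""

    def __init__(self, df, label_col, prediction_col):
        self.df = df
        self.label_col = label_col
        self.prediction_col = prediction_col


seg = MySegmentation(df=[], label_col="y", prediction_col="pred")
conforms = isinstance(seg, HasSegmentationFields)
plain_object_conforms = isinstance(object(), HasSegmentationFields)
```

The runtime isinstance check only verifies that the attributes exist; attribute types are checked by a static type checker, not at runtime.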
Module for defining plotting protocols.
- class tab_right.base_architecture.seg_plotting_protocols.DoubleSegmPlottingP(*args, **kwargs)[source]#
Bases: Protocol
Class schema for double segmentation plotting.
This class is used to define the interface for plotting double segmentations. It includes the DataFrames to be plotted and the kind of plot to be created.
- Parameters:
df (pd.DataFrame) – A DataFrame containing the groups defined by the decision tree model, with columns:
  - segment_id: The ID of the segment, for grouping.
  - feature_1: (str) The range or category of the first feature.
  - feature_2: (str) The range or category of the second feature.
  - score: (float) The calculated error metric for the segment.
metric_name (str, default="score") – The name of the metric column in the DataFrame.
lower_is_better (bool, default=True) – Whether lower values of the metric indicate better performance. Affects the color scale in visualizations (green for better, red for worse).
- df: DataFrame#
- get_heatmap_df() DataFrame[source]#
Get the DataFrame for the heatmap from the double segmentation df.
- Returns:
A DataFrame containing the groups defined by the decision tree model.
  - columns: feature_1 ranges or categories
  - index: feature_2 ranges or categories
  - content: The calculated error metric for the segment.
- Return type:
pd.DataFrame
- lower_is_better: bool = True#
- metric_name: str = 'score'#
- plot_heatmap() Figure | Figure[source]#
Plot the double segmentation of a given DataFrame as a heatmap.
- Returns:
A heatmap showing each segment with its corresponding average score, taken from the get_heatmap_df() method. Colors are determined by the lower_is_better parameter:
  - If lower_is_better=True: Lower values are green (better), higher values are red (worse)
  - If lower_is_better=False: Higher values are green (better), lower values are red (worse)
- Return type:
Figure
Protocol definitions for drift detection and analysis in tab-right.
This module defines protocol classes and type aliases used for implementing drift detection functionality across different feature types. These protocols establish a consistent interface for all drift detection implementations.
- class tab_right.base_architecture.drift_protocols.DriftCalcP(*args, **kwargs)[source]#
Bases: Protocol
Protocol for drift calculation implementations.
This protocol defines the interface that all drift calculation classes must implement. It specifies methods for detecting distributional shifts between two datasets.
- Parameters:
df1 (pd.DataFrame) – The reference DataFrame containing the baseline distribution.
df2 (pd.DataFrame) – The current DataFrame to compare against the reference.
kind (Optional[Dict[str, str]]) – Controls how columns are treated:
  - None: column types are determined by a general policy.
  - Dict[str, str]: explicit mapping from column name to “continuous” or “categorical”.
Notes
Implementations of this protocol are responsible for:
  1. Comparing distribution shifts between reference and current data
  2. Automatically selecting appropriate metrics based on data types
  3. Providing normalized scores for comparison across features
- df1: DataFrame#
- df2: DataFrame#
- get_prob_density(columns: Iterable[str] | None = None, bins: int = 10) DataFrame[source]#
Get the probability density functions for the features.
- Parameters:
columns (Optional[Iterable[str]], default None) – Specific columns to analyze. If None, analyzes all common columns.
bins (int, default 10) – Number of bins for histograms when analyzing continuous features.
- Returns:
A DataFrame containing the probability density functions. Must contain at least the following columns:
  - “feature”: The name of the feature.
  - “bin”: The bin or category.
  - “ref_density”: The density in the reference dataset.
  - “cur_density”: The density in the current dataset.
- Return type:
pd.DataFrame
- kind: Dict[str, str] | None#
Protocols for drift visualization in tab-right.
This module defines protocols for visualizing drift between datasets. It provides interfaces for creating both single-feature and multi-feature drift visualizations.
- class tab_right.base_architecture.drift_plot_protocols.DriftPlotP(*args, **kwargs)[source]#
Bases: Protocol
Protocol for drift visualization implementations.
This protocol defines the interface that all drift visualization classes must implement. It specifies methods for creating visualizations of distribution shifts between datasets.
- Parameters:
drift_calc (DriftCalcP) – An implementation of DriftCalcP that provides the drift metrics to visualize.
- drift_calc: DriftCalcP#
- get_distribution_plots(columns: Iterable[str] | None = None, bins: int = 10, **kwargs: Any) Dict[str, Figure | Figure][source]#
Generate individual distribution comparison plots for multiple features.
- Parameters:
columns (Optional[Iterable[str]], default None) – Specific columns to visualize. If None, visualizes all common columns.
bins (int, default 10) – Number of bins for histograms when visualizing continuous features.
**kwargs (Any) – Additional parameters for the plotting implementation.
- Returns:
A dictionary mapping feature names to their distribution comparison plots.
- Return type:
Dict[str, Union[go.Figure, plt.Figure]]
- plot_multiple(columns: Iterable[str] | None = None, bins: int = 10, figsize: Tuple[int, int] = (12, 8), sort_by: str = 'score', ascending: bool = False, top_n: int | None = None, threshold: float | None = None, **kwargs: Any) Figure | Figure[source]#
Create a visualization of drift across multiple features.
- Parameters:
columns (Optional[Iterable[str]], default None) – Specific columns to visualize. If None, visualizes all common columns.
bins (int, default 10) – Number of bins for histograms when visualizing continuous features.
figsize (Tuple[int, int], default (12, 8)) – Figure size as (width, height) in inches.
sort_by (str, default "score") – Column to sort the results by, typically “score” or “feature”.
ascending (bool, default False) – Whether to sort in ascending or descending order.
top_n (Optional[int], default None) – If provided, only shows the top N features with highest drift.
threshold (Optional[float], default None) – If provided, highlights features with drift above this threshold.
**kwargs (Any) – Additional parameters for the plotting implementation.
- Returns:
A figure object containing the drift visualization.
- Return type:
Union[go.Figure, plt.Figure]
- plot_single(column: str, bins: int = 10, figsize: Tuple[int, int] = (10, 6), show_metrics: bool = True, **kwargs: Any) Figure | Figure[source]#
Create a detailed visualization of drift for a single feature.
- Parameters:
column (str) – The specific column to visualize.
bins (int, default 10) – Number of bins for histograms when visualizing continuous features.
figsize (Tuple[int, int], default (10, 6)) – Figure size as (width, height) in inches.
show_metrics (bool, default True) – Whether to display drift metrics on the plot.
**kwargs (Any) – Additional parameters for the plotting implementation.
- Returns:
A figure object containing the drift visualization.
- Return type:
Union[go.Figure, plt.Figure]
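Example (an illustrative sketch of how sort_by, ascending, top_n and threshold could interact before plotting): drift scores are sorted, truncated to the top N, and flagged when above the threshold. The function name select_features, the input dict, and the "above_threshold" key are assumptions of this sketch; implementations are free to order these steps differently.

```python
def select_features(scores, sort_by="score", ascending=False, top_n=None, threshold=None):
    """Prepare per-feature rows for a drift bar chart (hypothetical helper)."""
    rows = [{"feature": name, "score": value} for name, value in scores.items()]
    # Descending order by default, so the most drifted features come first.
    rows.sort(key=lambda row: row[sort_by], reverse=not ascending)
    if top_n is not None:
        rows = rows[:top_n]
    for row in rows:
        # Flag rows that a plot would draw in a different color.
        row["above_threshold"] = threshold is not None and row["score"] > threshold
    return rows


drift_scores = {"age": 0.10, "income": 0.40, "city": 0.25}
selected = select_features(drift_scores, top_n=2, threshold=0.3)
```

With ascending=False, top_n keeps the features with the highest drift, matching the description of top_n above.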
Task Detection
Task detection utilities for tab-right package.
- class tab_right.task_detection.TaskType(*values)[source]#
Bases: Enum
Enumeration of possible task types for model evaluation.
- BINARY = 'binary'#
- CLASS = 'class'#
- REG = 'reg'#
- tab_right.task_detection.detect_task(y: Series) TaskType[source]#
Detect the type of task (binary, class, regression) based on the label series y.
- Parameters:
y (pd.Series) – The label series to analyze.
- Returns:
The detected task type.
- Return type:
TaskType
- Raises:
ValueError – If the label column has only one unique value and the task cannot be inferred.
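Example (a minimal sketch of one possible detection heuristic): the rule below decides by the number and kind of unique label values. The names TaskKind and guess_task, and the max_classes cutoff of 10, are assumptions made for illustration; the actual rules used by detect_task are not documented here and may differ.

```python
from enum import Enum


class TaskKind(Enum):
    """Hypothetical stand-in with the same values as TaskType."""

    BINARY = "binary"
    CLASS = "class"
    REG = "reg"


def guess_task(labels, max_classes=10):
    """Guess the task type from label values (assumed heuristic)."""
    unique = set(labels)
    if len(unique) < 2:
        raise ValueError("cannot infer the task from a single unique label value")
    if len(unique) == 2:
        return TaskKind.BINARY
    # Few distinct string or whole-number labels are treated as classes.
    looks_discrete = all(isinstance(v, str) or float(v).is_integer() for v in unique)
    if looks_discrete and len(unique) <= max_classes:
        return TaskKind.CLASS
    return TaskKind.REG


binary_task = guess_task([0, 1, 0, 1])
class_task = guess_task(["cat", "dog", "bird"])
reg_task = guess_task([0.1, 0.5, 2.3, 4.7])
```

A single unique label value raises ValueError in this sketch, mirroring the documented failure case.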