Double Segmentation#

This page demonstrates how to use tab-right’s double segmentation functionality to analyze model performance across combinations of two features.

What is Double Segmentation?#

Double segmentation allows you to analyze how your model performs across different combinations of two features. This is useful for:

Identifying feature interactions affecting model performance
Finding specific feature value combinations where your model underperforms
Understanding complex patterns single-feature analysis might miss

Tab-right’s Double Segmentation Tools#

Tab-right provides these tools for double segmentation analysis:

DoubleSegmentationImp - Main class for performing double segmentation
DoubleSegmPlotting - Visualization with support for both interactive Plotly and static Matplotlib backends

Basic Usage with Continuous Features#

Here’s a simple example of double segmentation with tab-right using continuous features:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from tab_right.segmentations import DoubleSegmentationImp
from tab_right.plotting import DoubleSegmPlotting

# Create sample data
np.random.seed(42)
n_samples = 500

# Generate features and target
feature1 = np.random.normal(0, 1, n_samples)
feature2 = np.random.normal(0, 1, n_samples)

# Target with interaction effect
target = 2 + feature1 + feature2 + 2 * (feature1 * feature2) + np.random.normal(0, 1, n_samples)

# Prediction missing the interaction term
prediction = 2 + feature1 + feature2 + np.random.normal(0, 1, n_samples)

# Create DataFrame
df = pd.DataFrame({
    'feature1': feature1,
    'feature2': feature2,
    'target': target,
    'prediction': prediction
})

# Perform double segmentation
double_seg = DoubleSegmentationImp(
    df=df,
    label_col='target',
    prediction_col='prediction'
)

# Apply segmentation with 3 bins for each feature
result_df = double_seg(
    feature1_col='feature1',
    feature2_col='feature2',
    score_metric=mean_squared_error,
    bins_1=3,
    bins_2=3
)

# Visualize results with a heatmap
plotter = DoubleSegmPlotting(df=result_df, backend="matplotlib")
fig = plotter.plot_heatmap()
plt.title("MSE by Feature1 and Feature2 Segments")

(Source code, png, hires.png, pdf)

Working with Categorical Features#

Double segmentation works with categorical features without needing to specify bins:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from tab_right.segmentations import DoubleSegmentationImp
from tab_right.plotting import DoubleSegmPlotting

# Create sample categorical data
np.random.seed(42)
n = 800

# Generate categorical features with non-uniform distributions
category1 = np.random.choice(
    ['A', 'B', 'C', 'D'],
    n,
    p=[0.4, 0.3, 0.2, 0.1]  # Different probabilities for each category
)
category2 = np.random.choice(
    ['X', 'Y', 'Z'],
    n,
    p=[0.5, 0.3, 0.2]
)

# Generate target with different patterns for combinations
target = np.zeros(n, dtype=int)

# Add different effects for different combinations
target[(category1 == 'A') & (category2 == 'X')] = 1
target[(category1 == 'B') & (category2 == 'Y')] = 1
target[(category1 == 'C') & (category2 == 'Z')] = 1
# Special case with stronger effect
target[(category1 == 'D') & (category2 == 'Z')] = np.random.binomial(1, 0.8, np.sum((category1 == 'D') & (category2 == 'Z')))

# Add some noise
noise_mask = np.random.choice([True, False], n, p=[0.1, 0.9])
target[noise_mask] = 1 - target[noise_mask]

# Simple prediction without capturing all patterns
prediction = np.zeros(n, dtype=int)
prediction[category1 == 'A'] = 1
prediction[category2 == 'Z'] = 1

# Create DataFrame
cat_df = pd.DataFrame({
    'category1': category1,
    'category2': category2,
    'target': target,
    'prediction': prediction
})

# Perform double segmentation
cat_seg = DoubleSegmentationImp(
    df=cat_df,
    label_col='target',
    prediction_col='prediction'
)

# Apply segmentation (no bins needed for categorical features)
cat_results = cat_seg(
    feature1_col='category1',
    feature2_col='category2',
    score_metric=accuracy_score
)

# Plot with higher is better for accuracy
cat_plot = DoubleSegmPlotting(
    df=cat_results,
    lower_is_better=False,
    backend="matplotlib"
)
fig = cat_plot.plot_heatmap()
plt.title("Accuracy by Category Segments")

(Source code, png, hires.png, pdf)

Mixed Categorical and Continuous Features#

Double segmentation can analyze combinations of categorical and continuous features:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from tab_right.segmentations import DoubleSegmentationImp
from tab_right.plotting import DoubleSegmPlotting

# Create sample data with mixed feature types
np.random.seed(42)
n_samples = 500

# Generate categorical feature - product type
product_types = ['Basic', 'Standard', 'Premium', 'Enterprise']
product = np.random.choice(product_types, n_samples, p=[0.4, 0.3, 0.2, 0.1])

# Generate continuous feature - customer spending
spending = np.random.gamma(shape=5, scale=20, size=n_samples)

# Add variation by product type
spending[product == 'Premium'] *= 1.5
spending[product == 'Enterprise'] *= 2.0

# Simple model: customers return if they have premium products OR spend a lot
premium_mask = np.logical_or(product == 'Premium', product == 'Enterprise')
return_prob = 0.2 + 0.3 * premium_mask + 0.4 * (spending > np.percentile(spending, 70))
return_prob = np.clip(return_prob, 0.1, 0.9)

# Generate actual returns (target)
customer_return = np.random.binomial(1, return_prob)

# Simple prediction (missing some patterns)
pred_prob = 0.2 + 0.4 * (product == 'Enterprise') + 0.3 * (spending > np.percentile(spending, 80))
pred_prob = np.clip(pred_prob, 0.1, 0.9)
prediction = np.random.binomial(1, pred_prob)

# Create DataFrame
mixed_df = pd.DataFrame({
    'product': product,
    'spending': spending,
    'target': customer_return,
    'prediction': prediction
})

# Perform double segmentation
mixed_seg = DoubleSegmentationImp(
    df=mixed_df,
    label_col='target',
    prediction_col='prediction'
)

# Apply segmentation
mixed_results = mixed_seg(
    feature1_col='product',
    feature2_col='spending',
    score_metric=f1_score,
    bins_2=4  # 4 bins for spending
)

# Plot with higher is better for F1 score
mixed_plot = DoubleSegmPlotting(
    df=mixed_results,
    lower_is_better=False,
    backend="matplotlib"
)
fig = mixed_plot.plot_heatmap()
plt.title("F1 Score by Product Type and Spending")

(Source code, png, hires.png, pdf)

Interactive Visualization with Plotly#

Tab-right also offers interactive Plotly visualization:

from tab_right.plotting import DoubleSegmPlotting

# Create interactive visualization from the results
interactive_plot = DoubleSegmPlotting(df=result_df)
fig = interactive_plot.plot_heatmap()
fig.update_layout(title="Interactive Double Segmentation Heatmap")
fig.show()

Using Different Metrics#

You can use any metric compatible with scikit-learn:

from sklearn.metrics import mean_absolute_error, r2_score

# Using MAE instead of MSE
mae_results = double_seg(
    feature1_col='feature1',
    feature2_col='feature2',
    score_metric=mean_absolute_error,
    bins_1=3,
    bins_2=3
)

# For metrics where higher is better (like R²)
r2_results = double_seg(
    feature1_col='feature1',
    feature2_col='feature2',
    score_metric=r2_score,
    bins_1=3,
    bins_2=3
)

# Visualize with appropriate settings
r2_plotter = DoubleSegmPlotting(df=r2_results, lower_is_better=False, backend="matplotlib")
r2_plotter.plot_heatmap()
plt.title("R² Score by Feature Segments")

Key Features of Double Segmentation#

Discover interactions: Find how combinations of features affect performance
Automatic handling: Works with both numerical and categorical features
Flexible metrics: Compatible with any scikit-learn metric
Visual insights: Interactive and static visualization options
Performance diagnosis: Quickly identify problem areas in your model

Double segmentation provides deeper insights than single-feature analysis, helping you better understand your model’s behavior across different data segments.

Double Segmentation#

What is Double Segmentation?#

Tab-right’s Double Segmentation Tools#

Basic Usage with Continuous Features#

Working with Categorical Features#

Mixed Categorical and Continuous Features#

Interactive Visualization with Plotly#

Using Different Metrics#

Key Features of Double Segmentation#

This Page