Master Seaborn’s most powerful visualization techniques: create multi-panel statistical visualizations with sns.FacetGrid(data, col="category", row="group") to compare conditional relationships across subsets; build regression analyses with sns.lmplot(data=df, x="variable1", y="variable2", hue="group", col="condition"), which combines scatterplots with fitted regression lines; and use advanced features such as polynomial regression, robust fitting, and confidence intervals to produce publication-quality statistical graphics that reveal complex data patterns.
In short: call sns.FacetGrid(data, col="category", row="group") and then g.map(plt.scatter, "x", "y") to create small-multiple plots showing conditional relationships; build regression plots with sns.lmplot(data=df, x="var1", y="var2", hue="group") for automatic regression fitting with confidence intervals; customize the layout with the col_wrap, height, and aspect parameters; and add statistical elements such as order=2 for polynomial fits or robust=True for outlier-resistant regression.
What this comprehensive guide covers
- FacetGrid fundamentals: multi-panel layouts, conditional visualization, and small-multiple graphics for statistical comparison.
- Advanced lmplot techniques: linear and polynomial regression, robust fitting, logistic regression, and confidence interval customization.
- Statistical communication: publication-ready formatting, error visualization, and best practices for regression analysis.
- Integration patterns: combining FacetGrid with custom functions and matplotlib for advanced statistical graphics.
Prerequisites
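- Python 3.7+ with seaborn (0.11 or later, for sns.set_theme), pandas, matplotlib, and numpy installed; robust=True and logistic=True additionally require statsmodels, and the residual example in step 7 uses scipy.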
- Understanding of statistical concepts: regression, correlation, confidence intervals, and hypothesis testing.
- Familiarity with pandas DataFrame operations and basic matplotlib customization.
Step-by-step: FacetGrid and regression mastery
1) Understanding FacetGrid architecture
# Essential imports and setup
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Set publication-ready theme
sns.set_theme(style="whitegrid", context="paper",
palette="deep", font_scale=1.1)
# Load sample data
tips = sns.load_dataset("tips")
print(tips.head())
print(tips.info())
FacetGrid creates a grid of subplots where each panel shows a subset of data defined by categorical variables—essential for conditional analysis.
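To make the "one panel per subset" idea concrete, the sketch below builds a grid on a small synthetic DataFrame and inspects how the rows are split. The column names (group, category, x, y) and the data are hypothetical stand-ins, not part of the tips dataset; facet_data() is the FacetGrid generator that yields each panel's subset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: two categorical columns and two numeric ones
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "group": rng.choice(["a", "b"], size=60),
    "category": rng.choice(["u", "v", "w"], size=60),
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
})

g = sns.FacetGrid(toy, row="group", col="category")

# facet_data() yields ((row_index, col_index, hue_index), subset) per panel
panel_sizes = {}
for (i, j, _), subset in g.facet_data():
    key = (g.row_names[i], g.col_names[j])
    panel_sizes[key] = len(subset)

print(g.axes.shape)   # one Axes per (row level, column level) pair
print(panel_sizes)    # number of rows that land in each panel
```

Every row of the input ends up in exactly one panel, so the panel sizes add up to the length of the DataFrame.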
2) Basic FacetGrid patterns
# Simple column faceting
g = sns.FacetGrid(tips, col="time", height=4, aspect=1.2)
g.map(plt.scatter, "total_bill", "tip", alpha=0.7)
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.set_titles("Time: {col_name}")
plt.show()
# Row and column faceting
g = sns.FacetGrid(tips, col="day", row="time",
margin_titles=True, height=3)
g.map(plt.scatter, "total_bill", "tip", alpha=0.6, s=30)
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.set_titles(col_template="{col_name} meals",
row_template="{row_name}")
plt.show()
Use margin_titles=True to draw the row titles on the right margin instead of repeating them above every panel; height sets each panel's height in inches, and aspect sets its width as height × aspect.
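Because height and aspect describe a single panel, the overall figure size follows from the grid shape. The sketch below checks that arithmetic on hypothetical toy data (the panel, x, and y columns are assumptions); it uses g.figure, which assumes seaborn 0.11.2 or later, and it does not add a legend, since an outside legend can widen the figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with three facet levels
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "panel": np.repeat(["p1", "p2", "p3"], 20),
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
})

height, aspect = 3, 1.2
g = sns.FacetGrid(toy, col="panel", height=height, aspect=aspect)
g.map_dataframe(sns.scatterplot, x="x", y="y")

# Each panel is `height` inches tall and `height * aspect` inches wide
n_rows, n_cols = g.axes.shape
expected = (n_cols * height * aspect, n_rows * height)
actual = tuple(g.figure.get_size_inches())
print(expected, actual)
```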
3) Advanced FacetGrid with hue mapping
# Conditioning on two variables: col and hue
g = sns.FacetGrid(tips, col="day", hue="smoker",
                  palette="Set1", height=4, aspect=0.8)
g.map(plt.scatter, "total_bill", "tip", alpha=0.7, s=50)
g.add_legend(title="Smoker Status")
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.set(xlim=(0, 55), ylim=(0, 12))
plt.show()
# Layer multiple plot types on the same grid
g = sns.FacetGrid(tips, col="time", row="sex",
                  margin_titles=True, height=3.5)
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip",
                hue="smoker", alpha=0.7)
# FacetGrid passes the target Axes to seaborn functions, so no ax= is needed
g.map_dataframe(sns.regplot, x="total_bill", y="tip",
                scatter=False, color="red")
g.add_legend()
plt.show()
Combine multiple plot types on the same FacetGrid by calling map or map_dataframe multiple times, which is powerful for layered analysis.
4) Basic lmplot for regression analysis
# Simple linear regression with confidence interval
sns.lmplot(data=tips, x="total_bill", y="tip",
height=5, aspect=1.3, ci=95)
plt.title("Linear Regression: Tip vs Total Bill")
plt.show()
# Regression by categorical groups
sns.lmplot(data=tips, x="total_bill", y="tip", hue="smoker",
height=5, aspect=1.3, palette="Set1")
plt.title("Regression Analysis by Smoking Status")
plt.show()
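lmplot automatically fits a regression line and draws a confidence band around it (95% by default, estimated by bootstrapping). The band shows how uncertain the fitted line is and supports rough “inference by eye”, though it is no substitute for a formal significance test.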
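The following is a minimal sketch of what a bootstrapped confidence band for a regression line computes. It illustrates the idea with plain numpy on synthetic data and is not seaborn's internal code; the variable names and the data-generating numbers are assumptions made for the example.

```python
import numpy as np

# Synthetic data with a known linear trend (assumed values for illustration)
rng = np.random.default_rng(42)
x = rng.uniform(0, 50, size=200)
y = 1.0 + 0.1 * x + rng.normal(scale=1.0, size=200)

grid = np.linspace(x.min(), x.max(), 100)  # x positions where the band is evaluated
n_boot = 1000
boot_lines = np.empty((n_boot, grid.size))

for b in range(n_boot):
    # Resample (x, y) pairs with replacement and refit the line
    idx = rng.integers(0, x.size, size=x.size)
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    boot_lines[b] = intercept + slope * grid

# The 95% band is the 2.5th to 97.5th percentile of the refitted lines
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)

# Fit on the full sample for comparison
slope_hat, intercept_hat = np.polyfit(x, y, deg=1)
fit = intercept_hat + slope_hat * grid
```

The band is typically narrowest near the middle of the x range and widens toward the ends, which is the same shape the shaded region in a regression plot has.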
5) Advanced regression techniques
# Polynomial regression
sns.lmplot(data=tips, x="total_bill", y="tip",
order=2, # Quadratic polynomial
height=5, aspect=1.3,
scatter_kws={"alpha": 0.6})
plt.title("Polynomial Regression (Order 2)")
plt.show()
# Robust regression (outlier-resistant)
sns.lmplot(data=tips, x="total_bill", y="tip",
robust=True, # Uses iteratively reweighted least squares
height=5, aspect=1.3,
line_kws={"color": "red", "linewidth": 2})
plt.title("Robust Linear Regression")
plt.show()
# Logistic regression for binary outcomes
# Create binary tip indicator (high vs low tip)
tips['high_tip'] = (tips['tip'] > tips['tip'].median()).astype(int)
sns.lmplot(data=tips, x="total_bill", y="high_tip",
logistic=True, y_jitter=0.03, height=5)
plt.title("Logistic Regression: High Tip Probability")
plt.show()
Use order=2 or higher for polynomial fits, robust=True for outlier-resistant fitting, and logistic=True for binary outcomes. These options are mutually exclusive, so pick one per call.
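To see what raising the polynomial order changes, the sketch below fits first- and second-order polynomials with np.polyfit to synthetic curved data and compares their residual sums of squares. The data and the helper name rss are hypothetical, chosen only for illustration.

```python
import numpy as np

# Synthetic data with genuine curvature (assumed coefficients)
rng = np.random.default_rng(7)
x = np.linspace(0, 10, 120)
y = 2.0 + 0.5 * x - 0.08 * x**2 + rng.normal(scale=0.3, size=x.size)

def rss(order):
    """Residual sum of squares of a least-squares polynomial fit."""
    coefs = np.polyfit(x, y, deg=order)
    resid = y - np.polyval(coefs, x)
    return float(np.sum(resid**2))

rss_linear, rss_quadratic = rss(1), rss(2)
print(rss_linear, rss_quadratic)
```

In-sample error never increases as the order grows, so a lower residual sum of squares alone does not justify a higher order; checking the fit on held-out data is a safer guide.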
6) Combining FacetGrid and regression
# Regression across multiple conditions
sns.lmplot(data=tips, x="total_bill", y="tip",
col="time", hue="smoker",
height=4, aspect=1.2, ci=95,
palette="Set2")
plt.show()
# Advanced faceted regression with custom parameters
sns.lmplot(data=tips, x="total_bill", y="tip",
col="day", col_wrap=2, # Wrap columns for better layout
hue="sex", markers=["o", "s"], # Different markers by sex
height=4, aspect=1.1,
robust=True, # Robust regression
scatter_kws={"alpha": 0.7, "s": 40})
plt.show()
lmplot is essentially regplot + FacetGrid, providing automatic faceting with regression fitting—ideal for conditional regression analysis.
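As a rough illustration of that equivalence, the sketch below draws the same faceted regression twice: once with lmplot and once by mapping regplot onto a FacetGrid by hand. The toy DataFrame and its condition, x, and y columns are hypothetical; ci=None is set only to keep the example fast.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with two conditions
rng = np.random.default_rng(3)
toy = pd.DataFrame({
    "condition": np.repeat(["c1", "c2"], 40),
    "x": rng.uniform(0, 10, size=80),
})
toy["y"] = 1.5 * toy["x"] + rng.normal(size=80)

# Figure-level shortcut: lmplot builds the grid and fits per facet
g_lm = sns.lmplot(data=toy, x="x", y="y", col="condition", ci=None, height=3)

# Manual version: build the grid, then map the axes-level regplot onto it
g_manual = sns.FacetGrid(toy, col="condition", height=3)
g_manual.map_dataframe(sns.regplot, x="x", y="y", ci=None)

print(type(g_lm).__name__, g_lm.axes.shape, g_manual.axes.shape)
```

The manual route is more verbose but lets you mix regplot with other layers, while lmplot is the shorter path when regression is the only layer.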
7) Statistical interpretation and diagnostics
# Residual analysis with FacetGrid
# First, compute residuals manually for demonstration
from scipy import stats
# Fit model and calculate residuals
slope, intercept, r_value, p_value, std_err = stats.linregress(tips['total_bill'], tips['tip'])
tips['predicted'] = slope * tips['total_bill'] + intercept
tips['residuals'] = tips['tip'] - tips['predicted']
# Residual plots by category
g = sns.FacetGrid(tips, col="time", row="smoker",
height=3, aspect=1.2)
g.map(plt.scatter, "predicted", "residuals", alpha=0.6)
g.map(plt.axhline, y=0, color="red", linestyle="--")
g.set_axis_labels("Predicted Tip", "Residuals")
g.set_titles("{row_name} - {col_name}")
plt.show()
# R-squared and correlation by group
correlation_results = (
    tips.groupby(['time', 'smoker'], observed=True)[['total_bill', 'tip']]
    .apply(lambda x: pd.Series({
        'correlation': x['total_bill'].corr(x['tip']),
        'r_squared': x['total_bill'].corr(x['tip'])**2,
        'n_obs': len(x)
    }))
    .reset_index()
)
print("Correlation Analysis by Group:")
print(correlation_results)
Always validate regression assumptions through residual analysis; examine R² values and correlation strength across subgroups for model adequacy.
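The residuals above come from one model fitted to the pooled data. As a complementary sketch, sns.residplot can be mapped onto a FacetGrid so that a separate linear fit, and its residuals, are computed inside each panel. The synthetic data below is an assumption for illustration: one condition is truly linear and the other is curved, so only the second should show a systematic residual pattern.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: a linear relationship and a curved one
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=120)
condition = np.repeat(["linear", "curved"], 60)
y = np.where(condition == "linear", 2.0 * x, 0.4 * x**2) + rng.normal(size=120)
toy = pd.DataFrame({"condition": condition, "x": x, "y": y})

# residplot fits a linear model within each facet and plots its residuals
g = sns.FacetGrid(toy, col="condition", height=3)
g.map_dataframe(sns.residplot, x="x", y="y")
g.set_axis_labels("x", "Residual")
```

A U-shaped residual cloud in a panel is a hint that a straight line is the wrong model for that subgroup.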
8) Publication-ready multi-panel figures
# Complex statistical visualization combining techniques
fig = plt.figure(figsize=(14, 10))
# Panel 1: Overall regression
plt.subplot(2, 3, 1)
sns.regplot(data=tips, x="total_bill", y="tip",
scatter_kws={"alpha": 0.6}, line_kws={"color": "red"})
plt.title("A) Overall Relationship")
# Panel 2: By smoking status
plt.subplot(2, 3, 2)
sns.regplot(data=tips[tips.smoker=='Yes'], x="total_bill", y="tip",
label="Smokers", scatter_kws={"alpha": 0.6})
sns.regplot(data=tips[tips.smoker=='No'], x="total_bill", y="tip",
label="Non-smokers", scatter_kws={"alpha": 0.6})
plt.title("B) By Smoking Status")
plt.legend()
# Panel 3: Polynomial fit
plt.subplot(2, 3, 3)
sns.regplot(data=tips, x="total_bill", y="tip", order=2,
scatter_kws={"alpha": 0.6})
plt.title("C) Polynomial Fit (Order 2)")
# Panels 4-6: small multiples built manually, one subplot per day
# (only the first three days fit in the remaining panels)
days = tips['day'].unique()
for i, day in enumerate(days[:3]):
    plt.subplot(2, 3, 4 + i)
    day_data = tips[tips.day == day]
    sns.regplot(data=day_data, x="total_bill", y="tip",
                scatter_kws={"alpha": 0.7})
    plt.title(f"D{i+1}) {day}")
plt.tight_layout()
plt.show()
Best practices for statistical communication
- Always show confidence intervals (ci=95 by default) to convey uncertainty and enable statistical inference.
- Use robust regression when outliers are present; examine residual plots to validate model assumptions.
- Limit facets to meaningful comparisons—too many panels reduce individual plot readability.
- Include sample sizes in titles or captions when comparing groups with different n-values.
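One way to follow the sample-size advice is to rewrite each panel title after plotting. The sketch below assumes a grid faceted by a single column variable, so the keys of g.axes_dict are that variable's levels; the toy data and its deliberately unequal group sizes are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with unequal group sizes
rng = np.random.default_rng(11)
toy = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat"], [10, 25, 40]),
    "x": rng.normal(size=75),
    "y": rng.normal(size=75),
})

g = sns.FacetGrid(toy, col="day", height=3)
g.map_dataframe(sns.scatterplot, x="x", y="y")

# Count observations per level and put the count in each panel title
counts = toy["day"].value_counts()
for name, ax in g.axes_dict.items():
    ax.set_title(f"{name} (n={counts[name]})")

titles = [ax.get_title() for ax in g.axes.flat]
print(titles)
```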
Common pitfalls and solutions
- Overplotting in dense data: Use alpha transparency, a smaller marker size (s), or switch to hexbin plots for large datasets.
- Scale differences across facets: Use sharex=True, sharey=True (default) for fair comparisons; set them to False only when scales differ meaningfully.
- Misleading polynomial fits: Validate with cross-validation; higher-order polynomials can overfit to noise.
FAQ
Question | Answer |
---|---|
How do I save multi-panel FacetGrid figures? | Use g.savefig("filename.png", dpi=300, bbox_inches="tight") after creating the FacetGrid. The entire grid saves as one image. |
What’s the difference between regplot and lmplot? | regplot is axes-level (plots on existing axes); lmplot is figure-level (creates new figure with optional faceting). Use lmplot for multi-panel regression analysis. |
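How do I interpret confidence intervals in regression plots? | The shaded band is a 95% confidence interval (default) for the regression estimate, computed by bootstrapping: across repeated samples, about 95% of such intervals would cover the true regression line. Clearly separated bands suggest the fits differ, but overlapping bands do not prove there is no difference, so confirm with a formal test. |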
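Can I customize FacetGrid beyond basic mapping? | Yes. Access individual axes with g.axes_dict or g.axes for matplotlib-level customization. g.map passes the named columns to the function as positional arrays, which suits matplotlib functions such as plt.scatter; g.map_dataframe passes each facet's subset as data= with keyword arguments, which suits seaborn functions. |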
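The saving step from the first FAQ answer can be sketched as follows. The toy data and the file name facets.png are hypothetical, and the file is written to a temporary directory so the example leaves nothing behind in the working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data for a two-panel grid
rng = np.random.default_rng(4)
toy = pd.DataFrame({
    "panel": np.repeat(["left", "right"], 30),
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
})

g = sns.FacetGrid(toy, col="panel", height=3)
g.map_dataframe(sns.scatterplot, x="x", y="y")

# Save the whole grid as a single image
out_path = os.path.join(tempfile.mkdtemp(), "facets.png")
g.savefig(out_path, dpi=300, bbox_inches="tight")
print(out_path, os.path.getsize(out_path))
```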
Advanced integration patterns
- Combine FacetGrid with custom statistical functions using g.map for specialized analysis beyond built-in seaborn functions.
- Integrate with statistical libraries (scipy.stats, statsmodels) for hypothesis testing and advanced model diagnostics.
- Access individual facets programmatically using g.axes_dict for detailed customization and annotation.
- Layer multiple plot types (scatter + regression + confidence ellipses) for comprehensive relationship visualization.