Master three essential Seaborn visualization techniques: choose color palettes with sns.color_palette() for qualitative, sequential, and diverging data; build statistical boxplots with sns.boxplot() to show distribution quartiles and outliers; and generate hierarchically clustered heatmaps with sns.clustermap() to reveal data patterns through dendrogram-based clustering. Each technique comes with practical code examples and statistical best practices.
In short: use sns.color_palette("viridis") for sequential data, sns.color_palette("coolwarm") for diverging data around zero, and sns.color_palette("Set2") for categorical data; create boxplots with sns.boxplot(data=df, x="category", y="value", hue="group") to compare distributions; and build clustermaps with sns.clustermap(correlation_matrix, annot=True, cmap="coolwarm", center=0) to visualize hierarchical patterns in correlation or distance matrices.
What this guide covers
- Color palette fundamentals: qualitative, sequential, diverging palettes with perceptual uniformity principles.
- Statistical boxplots: quartile visualization, outlier detection, and multi-category comparisons.
- Hierarchical clustering: clustermap creation, dendrogram interpretation, and correlation matrix visualization.
- Advanced customization: color accessibility, statistical communication, and publication-ready output.
Prerequisites
- Python 3.7+ with seaborn, pandas, matplotlib, and scipy (required for clustermap).
- Understanding of statistical concepts: quartiles, correlation, hierarchical clustering basics.
- Familiarity with pandas DataFrames and matplotlib figure customization.
Step-by-step: Color palettes, boxplots, and clustermaps
1) Master color palette fundamentals
# Essential imports
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Set theme for consistent styling
sns.set_theme(style="whitegrid", context="notebook")
# View current palette
current_palette = sns.color_palette()
sns.palplot(current_palette)
plt.title("Default Seaborn Palette")
plt.show()
Seaborn's color_palette() function returns the active palette when called without arguments and gives access to named palettes, including perceptually uniform ones such as viridis.
2) Choose appropriate palette types
# Qualitative palettes for categorical data
categorical_colors = sns.color_palette("Set2", 8)
sns.palplot(categorical_colors)
plt.title("Qualitative: Set2 Palette")
plt.show()
# Sequential palettes for ordered/continuous data
sequential_colors = sns.color_palette("viridis", as_cmap=True)
sns.palplot(sns.color_palette("viridis", 10))
plt.title("Sequential: Viridis Palette")
plt.show()
# Diverging palettes for data with meaningful center point
diverging_colors = sns.color_palette("coolwarm", as_cmap=True)
sns.palplot(sns.color_palette("coolwarm", 11))
plt.title("Diverging: Coolwarm Palette")
plt.show()
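Use qualitative palettes for categories, sequential palettes for magnitudes, and diverging palettes for deviations from a meaningful midpoint such as zero. A mismatched palette can suggest an order or a midpoint that the data does not have.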
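The two return types used above behave differently, which matters when passing them to plotting functions. The following is a minimal sketch, assuming a recent seaborn release; the variable names are illustrative only.

```python
import seaborn as sns
from matplotlib.colors import Colormap

# Discrete list of RGB tuples: suited to a fixed number of categorical hue levels
discrete = sns.color_palette("Set2", 4)

# Continuous colormap object: suited to heatmaps and numeric color mappings
continuous = sns.color_palette("viridis", as_cmap=True)

print(len(discrete))                     # number of colors requested
print(discrete.as_hex())                 # hex strings, handy for reuse in other tools
print(isinstance(continuous, Colormap))  # as_cmap=True yields a matplotlib colormap
```

As a rule of thumb, pass the discrete list to palette= arguments and the colormap object to cmap= arguments.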
3) Create custom diverging palettes
# Custom diverging palette using husl color space
custom_diverging = sns.diverging_palette(220, 20, as_cmap=True)
sns.palplot(sns.diverging_palette(220, 20, n=11))
plt.title("Custom Diverging: Blue to Red")
plt.show()
# Dark center diverging palette
dark_center = sns.diverging_palette(250, 30, l=65, center="dark", as_cmap=True)
sns.palplot(sns.diverging_palette(250, 30, l=65, center="dark", n=11))
plt.title("Dark Center Diverging Palette")
plt.show()
Custom palettes help keep figures consistent with a brand or house style. Check them for accessibility separately, since a custom hue pair is not automatically colorblind-safe.
4) Build statistical boxplots
# Load sample data
tips = sns.load_dataset("tips")
# Basic boxplot showing distribution quartiles
plt.figure(figsize=(10, 6))
sns.boxplot(data=tips, x="day", y="total_bill")
plt.title("Total Bill Distribution by Day")
plt.ylabel("Total Bill ($)")
plt.show()
# Advanced boxplot with grouping variable
plt.figure(figsize=(12, 6))
ax = sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker",
                 palette="Set2")
plt.title("Total Bill Distribution by Day and Smoking Status")
plt.ylabel("Total Bill ($)")
# Retitle the legend seaborn already drew instead of rebuilding it
ax.get_legend().set_title("Smoker")
plt.show()
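Boxplots summarize a distribution with the median, the quartiles (Q1 and Q3), and whiskers that reach the most extreme points not flagged as outliers. When there are no outliers, the whisker ends coincide with the minimum and maximum of the five-number summary.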
5) Interpret boxplot components
# Detailed boxplot with annotations
fig, ax = plt.subplots(figsize=(8, 6))
box_plot = sns.boxplot(data=tips, y="total_bill", ax=ax)
# Add statistical annotations
ax.text(0.1, tips["total_bill"].quantile(0.75), "Q3 (75th percentile)",
fontsize=10, ha='left')
ax.text(0.1, tips["total_bill"].median(), "Median (Q2)",
fontsize=10, ha='left', weight='bold')
ax.text(0.1, tips["total_bill"].quantile(0.25), "Q1 (25th percentile)",
fontsize=10, ha='left')
plt.title("Boxplot Components: Quartiles and Median")
plt.ylabel("Total Bill ($)")
plt.show()
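The box spans Q1 to Q3 and the line inside it marks the median. Each whisker extends to the most extreme data point within 1.5×IQR of the box edge, and points beyond the whiskers are plotted individually as outliers.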
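To make the whisker rule concrete, the quantities a boxplot draws can be computed by hand. This is a sketch on synthetic data (a seeded gamma sample standing in for a column such as total_bill), using the conventional 1.5×IQR fences; it is not seaborn's internal code.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed sample (an assumption for illustration)
rng = np.random.default_rng(0)
values = pd.Series(rng.gamma(shape=2.0, scale=10.0, size=200))

q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Fences: points outside these limits are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers stop at the most extreme observations inside the fences
whisker_low, whisker_high = inside.min(), inside.max()

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}")
print(f"outliers: {len(outliers)}")
```

Note that the whiskers end at actual data points, so they are usually shorter than the fences themselves.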
6) Create hierarchically-clustered heatmaps
# Prepare correlation matrix for clustering
iris = sns.load_dataset("iris")
numeric_cols = iris.select_dtypes(include=[np.number]).columns
correlation_matrix = iris[numeric_cols].corr()
# Basic clustermap (clustermap creates its own figure, so pass figsize directly)
cluster_map = sns.clustermap(correlation_matrix,
                             annot=True,       # Show correlation values
                             cmap="coolwarm",  # Diverging colormap
                             center=0,         # Center at zero
                             linewidths=0.5,   # Grid lines
                             figsize=(8, 8))   # square=True is ignored by clustermap
plt.suptitle("Iris Dataset: Hierarchical Correlation Clustering")
plt.show()
Clustermaps combine heatmap visualization with dendrogram-based hierarchical clustering to reveal data structure.
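The ordering chosen by the dendrogram can also be read back from the returned grid object, which is useful for reporting which variables ended up next to each other. The sketch below uses a small synthetic DataFrame (column names a1, a2, b1, b2 are hypothetical) so it runs without downloading a dataset.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Two pairs of strongly correlated synthetic columns
rng = np.random.default_rng(42)
base = rng.normal(size=(100, 2))
df = pd.DataFrame({
    "a1": base[:, 0],
    "b1": base[:, 1],
    "a2": base[:, 0] + rng.normal(scale=0.1, size=100),
    "b2": base[:, 1] + rng.normal(scale=0.1, size=100),
})
corr = df.corr()

g = sns.clustermap(corr, cmap="coolwarm", center=0, vmin=-1, vmax=1,
                   figsize=(5, 5))

# Integer positions of the original rows, listed in dendrogram (leaf) order
row_order = g.dendrogram_row.reordered_ind
ordered_labels = [corr.index[i] for i in row_order]
print(ordered_labels)

# g.data2d holds the matrix as plotted, i.e. after reordering
print(g.data2d.index.tolist())

plt.close("all")  # release the figure once the ordering has been read
```

Correlated columns such as a1 and a2 should appear as neighbors in the printed order, mirroring the blocks visible in the heatmap.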
7) Advanced clustermap customization
# Advanced clustermap with custom parameters
# Create sample data matrix (seeded so the figure is reproducible)
rng = np.random.default_rng(0)
data_matrix = rng.standard_normal((20, 10))
data_df = pd.DataFrame(data_matrix,
                       columns=[f"Feature_{i}" for i in range(10)],
                       index=[f"Sample_{i}" for i in range(20)])
# Customized clustermap
cluster_grid = sns.clustermap(data_df,
method='ward', # Linkage method
metric='euclidean', # Distance metric
z_score=1, # Standardize columns
cmap='RdBu_r', # Red-Blue colormap
figsize=(10, 8), # Figure size
dendrogram_ratio=0.15, # Dendrogram size ratio
cbar_pos=(0.02, 0.83, 0.03, 0.15)) # Colorbar position
plt.suptitle("Hierarchical Clustering with Ward Linkage")
plt.show()
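Ward linkage merges, at each step, the pair of clusters that gives the smallest increase in within-cluster variance, and it is only well defined for Euclidean distances. z_score=1 standardizes each column so that features on different scales contribute comparably.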
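The idea behind column standardization followed by Ward linkage can be sketched directly with pandas and SciPy. This is an illustration on hypothetical features with very different scales, not a reproduction of seaborn's internals.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
# Hypothetical features on very different scales
df = pd.DataFrame({
    "small": rng.normal(0, 1, 20),
    "large": rng.normal(0, 1000, 20),
    "offset": rng.normal(50, 5, 20),
})

# Column-wise z-score: subtract each column's mean, divide by its standard deviation
z = (df - df.mean()) / df.std()

# Ward linkage on the rows, using Euclidean distances
Z = linkage(z.to_numpy(), method="ward", metric="euclidean")

print(z.mean().abs().max())  # close to zero after centering
print(z.std().tolist())      # each column now has unit standard deviation
print(Z.shape)               # one row per merge: (n_samples - 1, 4)
```

Without the standardization step, the "large" column would dominate every Euclidean distance and effectively decide the clustering on its own.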
8) Combine techniques for publication graphics
# Publication-ready multi-panel figure
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Panel 1: Palette demonstration
# (sns.palplot always opens its own figure, so draw the swatches with imshow)
axes[0,0].imshow([sns.color_palette("husl", 8)], aspect="auto")
axes[0,0].set(xticks=[], yticks=[])
axes[0,0].set_title("A) HUSL Qualitative Palette")
# Panel 2: Sequential palette
axes[0,1].imshow([sns.color_palette("rocket", 8)], aspect="auto")
axes[0,1].set(xticks=[], yticks=[])
axes[0,1].set_title("B) Rocket Sequential Palette")
# Panel 3: Boxplot with custom palette
sns.boxplot(data=tips, x="time", y="tip", hue="sex",
            palette="husl", ax=axes[1,0])
axes[1,0].set_title("C) Tip Distribution by Time and Gender")
# Panel 4: Correlation heatmap (no clustering for space)
# A diverging colormap matches center=0; a sequential one such as rocket does not
correlation_matrix = tips.select_dtypes(include=[np.number]).corr()
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm",
            center=0, ax=axes[1,1])
axes[1,1].set_title("D) Tips Dataset Correlation Matrix")
fig.tight_layout()
plt.show()
Color accessibility and best practices
- Test palettes with colorbrewer2.org for colorblind accessibility—avoid red-green combinations.
- Use perceptually uniform palettes (viridis, plasma) for accurate magnitude representation.
- Apply diverging palettes only when zero/center point has statistical meaning.
- Limit qualitative palettes to 8-10 colors maximum for discriminability.
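One rough way to check the magnitude-representation point is to test whether a colormap's luminance changes in a single direction from end to end. The sketch below uses relative luminance computed from linearized sRGB as a simple proxy; it is not a full perceptual-uniformity test, and the helper names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def relative_luminance(rgb):
    """Relative luminance of sRGB colors with channels in [0, 1]."""
    rgb = np.asarray(rgb, dtype=float)
    linear = np.where(rgb <= 0.04045,
                      rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    return linear @ np.array([0.2126, 0.7152, 0.0722])

def is_monotonic(values):
    """True if the sequence strictly increases or strictly decreases."""
    steps = np.diff(values)
    return bool(np.all(steps > 0) or np.all(steps < 0))

samples = np.linspace(0, 1, 16)
results = {}
for name in ("viridis", "jet"):
    colors = plt.get_cmap(name)(samples)[:, :3]  # drop the alpha channel
    results[name] = is_monotonic(relative_luminance(colors))

print(results)
```

A colormap whose luminance rises and then falls, as jet's does, can make two different data values look equally bright, which is one reason it is discouraged for continuous data.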
Statistical interpretation guidelines
- Boxplots: Compare medians, not means; identify outliers for further investigation; use violin plots for distribution shape.
- Clustermaps: Interpret dendrogram height for cluster strength; validate biological/domain significance of clusters.
- Color mapping: Ensure color intensity matches data magnitude; use consistent scales across panels.
Common pitfalls and solutions
- Rainbow colormaps: Avoid jet/rainbow for continuous data—use viridis or similar perceptually uniform alternatives.
- Clustering artifacts: Standardize data with the z_score parameter to prevent scale-dependent clustering.
- Overplotting in boxplots: Use dodge=True for multiple categories; consider violin plots for shape information.
FAQ
Question | Answer |
---|---|
Which color palette should I use for correlation matrices? | Use diverging palettes like “coolwarm” or “RdBu_r” centered at zero, since correlations range from -1 to +1 with meaningful zero point. |
How do I interpret clustermap dendrograms? | Dendrogram height indicates dissimilarity—closer branches merge at lower heights (more similar). Cut at desired height to define clusters. |
When should I use boxplots vs violin plots? | Boxplots for outlier identification and quartile comparison; violin plots when distribution shape (bimodality, skewness) matters. |
How do I save high-resolution figures? | Use plt.savefig("figure.png", dpi=300, bbox_inches="tight") or save as vector formats (SVG/PDF) for publications. |
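
Cutting a dendrogram into flat clusters, as mentioned in the FAQ, can be done with SciPy on the same kind of linkage a clustermap computes. This is a minimal sketch on two synthetic, well-separated groups; the cut height of half the final merge distance is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
# Two well-separated synthetic groups of five points each
group_a = rng.normal(loc=0.0, scale=1.0, size=(5, 3))
group_b = rng.normal(loc=10.0, scale=1.0, size=(5, 3))
points = np.vstack([group_a, group_b])

Z = linkage(points, method="ward")

# Option 1: ask for a fixed number of flat clusters
labels_k = fcluster(Z, t=2, criterion="maxclust")

# Option 2: cut the tree at a chosen height (dissimilarity threshold)
cut_height = 0.5 * Z[-1, 2]  # half the height of the final merge
labels_h = fcluster(Z, t=cut_height, criterion="distance")

print(labels_k)
print(labels_h)
```

A linkage computed this way can also be passed to sns.clustermap through its row_linkage or col_linkage arguments, so the plotted dendrogram and the flat cluster labels come from the same tree.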
Advanced topics and next steps
- Explore sns.diverging_palette() with custom hue parameters for brand-specific palettes.
- Combine clustermaps with dimensionality reduction (PCA, t-SNE) for high-dimensional data exploration.
- Integrate with statistical testing frameworks to annotate significant differences in boxplots.
- Master FacetGrid and PairGrid for multi-panel statistical visualizations.