Understanding how a variable is distributed is fundamental to exploratory data analysis. Does your data cluster around a central value or spread evenly? Are there multiple peaks suggesting distinct subgroups? Seaborn provides several distribution plotting functions that answer these questions visually. Whether you need a simple histogram or a complex multi-faceted distribution visualization, Seaborn offers elegant solutions with minimal code.
Creating Histograms With histplot
A histogram divides data into bins and counts observations in each bin, showing the frequency distribution visually. Seaborn’s histplot function is flexible and powerful, supporting multiple statistical computations and visual customizations. The classic binned histogram reveals the shape of the distribution at a glance.
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
# Simple histogram
plt.figure(figsize=(10, 6))
sns.histplot(data=tips, x="total_bill", bins=30, kde=False)
plt.title("Distribution of Total Bill Amount")
plt.xlabel("Total Bill ($)")
plt.ylabel("Frequency")
plt.show()
The bins parameter controls the number of bins; more bins show finer detail but risk appearing noisy, while fewer bins smooth the distribution but hide detail. Passing kde=False (which is also the default) leaves out the kernel density estimate curve, which we will add next, and keeps the plot clean. The histogram immediately shows that most bills fall between roughly 10 and 30 dollars, with a thinner tail stretching toward 50.
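To see the bin-count trade-off in numbers rather than pixels, here is a minimal sketch using NumPy's histogram function directly. The data is a synthetic, right-skewed stand-in generated for illustration, not the real tips dataset, and the variable names are my own.

```python
import numpy as np

# Synthetic, right-skewed stand-in for a bill-like variable (not the real tips data).
rng = np.random.default_rng(0)
bills = rng.gamma(shape=4.0, scale=5.0, size=244)

# Count observations per bin for a coarse and a fine binning.
coarse_counts, coarse_edges = np.histogram(bills, bins=10)
fine_counts, fine_edges = np.histogram(bills, bins=30)

# Every observation lands in exactly one bin, so both totals equal the sample size.
print("totals:", coarse_counts.sum(), fine_counts.sum())

# Finer bins are narrower and hold fewer observations each,
# so random fluctuations between neighboring bars become more visible.
print("bin widths:", np.diff(coarse_edges)[0], np.diff(fine_edges)[0])
print("tallest bar:", coarse_counts.max(), fine_counts.max())

# A data-driven alternative to hand-picking a number: NumPy's "auto" rule.
auto_edges = np.histogram_bin_edges(bills, bins="auto")
print("bins chosen by 'auto':", len(auto_edges) - 1)
```

histplot accepts the same kinds of values for bins (a count, a rule name such as "auto", or explicit edges), so whatever binning you settle on here can be passed straight through.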
Adding Kernel Density Estimation
A histogram with many bins can appear spiky and noisy. Adding a kernel density estimate (KDE) smooths the distribution and reveals the underlying shape. The KDE is a continuous curve that estimates the probability density function from the data.
plt.figure(figsize=(10, 6))
sns.histplot(data=tips, x="total_bill", bins=20, kde=True, stat='density')
plt.title("Distribution of Total Bill Amount with KDE")
plt.xlabel("Total Bill ($)")
plt.ylabel("Density")
plt.show()
With kde=True, Seaborn overlays a smooth KDE curve on the histogram. The stat='density' parameter rescales the bars so that their total area equals 1, which makes the histogram a valid probability density and puts it on the same scale as the KDE curve. This combination reveals the distribution shape: whether it is unimodal (one peak), bimodal (two peaks), or multimodal (several peaks).
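Both ideas in this section, density normalization and kernel density estimation, can be worked out by hand in a few lines of NumPy. The sketch below is an illustration on synthetic data, not Seaborn's internal code. The bandwidth uses Scott's rule, which is the documented default for Seaborn's bw_method; the evaluation grid and its padding are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=4.0, scale=5.0, size=244)  # synthetic stand-in data

# Density-normalized histogram: height = count / (n * bin_width).
heights, edges = np.histogram(x, bins=20, density=True)
hist_area = np.sum(heights * np.diff(edges))

# Gaussian KDE by hand: place one Gaussian bump on each observation and average them.
bandwidth = x.std(ddof=1) * len(x) ** (-1 / 5)  # Scott's rule for 1-D data
grid = np.linspace(x.min() - 5 * bandwidth, x.max() + 5 * bandwidth, 1000)
z = (grid[:, None] - x[None, :]) / bandwidth
kde = np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * bandwidth * np.sqrt(2 * np.pi))

# Approximate the area under the curve with a Riemann sum over the grid.
kde_area = np.sum(kde) * (grid[1] - grid[0])

print("histogram area:", hist_area)
print("KDE area (approx.):", kde_area)
```

Both areas come out at essentially 1, which is why the bars and the curve can share a y-axis labeled "Density".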
Comparing Distributions With Hue
When you have multiple groups, comparing their distributions visually is powerful. Seaborn’s hue parameter colors different groups, letting you see if distributions differ across categories. This is much faster than computing summary statistics for each group separately.
plt.figure(figsize=(10, 6))
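ax = sns.histplot(data=tips, x="total_bill", hue="sex", kde=True, stat='density', bins=20)
plt.title("Bill Distribution by Customer Sex")
plt.xlabel("Total Bill ($)")
plt.ylabel("Density")
# histplot already draws the hue legend; retitle it instead of calling
# plt.legend(), which would replace it with an empty legend
ax.get_legend().set_title("Sex")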
plt.show()
The plot immediately reveals that male customers tend to have higher total bills on average. The two distributions have similar shapes but different centers. This kind of comparison takes seconds visually but would require computing means and standard deviations by group to communicate textually. The visual approach is faster and more intuitive.
Replacing distplot With histplot and rugplot
Older tutorials often reach for distplot, which combined a histogram, a KDE, and an optional rug plot (individual data points) in one call. It has been deprecated since Seaborn 0.11 in favor of histplot and displot, so new code should avoid it. You can rebuild the same combined view by layering rugplot on top of histplot, showing the binned density, the smoothed curve, and the raw observations in a single figure.
plt.figure(figsize=(10, 6))
sns.histplot(data=tips, x="total_bill", kde=True, stat='density', bins=25)
# Add rug plot for individual observations
sns.rugplot(data=tips, x="total_bill", height=0.02, alpha=0.5)
plt.title("Distribution with Rug Plot")
plt.xlabel("Total Bill ($)")
plt.ylabel("Density")
plt.show()
The rug plot shows a small tick mark for each individual observation along the x-axis. In dense regions, these marks stack and reveal concentration visually. The rug plot is especially useful for small to medium datasets where showing individual data points adds value without creating visual clutter.
Creating Joint Distributions With jointplot
When you want to examine the relationship between two continuous variables while also showing their individual distributions, jointplot is ideal. It creates a 2D scatter plot in the center with marginal histograms on the edges, revealing both the relationship and each variable’s distribution.
# jointplot is a figure-level function: it creates its own figure,
# so a preceding plt.figure() call would only leave an empty window
g = sns.jointplot(data=tips, x="total_bill", y="tip", kind="scatter")
g.figure.suptitle("Bill Amount vs Tip with Marginal Distributions", y=1.02)
plt.show()
# With KDE for smooth density representation
g = sns.jointplot(data=tips, x="total_bill", y="tip", kind="kde")
g.figure.suptitle("Bill and Tip Relationship with Smooth Density", y=1.02)
plt.show()
The kind='scatter' option draws a scatter plot in the center with histograms on the margins, while kind='kde' replaces both the center and the margins with smooth density estimates. The marginal distributions show that both total bill and tip are right-skewed: most observations sit at the low end, with a long tail toward higher values. The scatter plot shows a positive relationship: larger bills tend to have larger tips.
Faceting Distributions Across Groups
For comparing distributions across many groups, faceting is clearer than overlapping histograms. Seaborn's histplot can only overlay groups on a single axes with hue; the figure-level displot function adds col and row parameters for simple grids, and FacetGrid offers more control for complex layouts.
g = sns.FacetGrid(tips, col="sex", row="time", height=4, aspect=1.2)
g.map(sns.histplot, "total_bill", kde=True, bins=20, stat='density')
plt.show()
This faceted grid shows the distribution of total bills separated by customer sex and meal time. Four subplots let you compare distributions across groups easily. Lunch bills appear lower than dinner bills, and the patterns differ between male and female customers. Because g.map calls histplot once per panel, each panel's density is normalized on its own, so compare shapes and positions across panels rather than raw bar heights. Faceting prevents the visual clutter that would result from layering four colored histograms in a single plot.
