Plotly Box Plot And Violin Plot: Statistical Distributions

Box plots and violin plots are statistical summaries that reveal central tendency, spread, and outliers; violin plots also show the shape of the distribution. Plotly adds interactivity on top of these summaries: viewers can hover for exact values, zoom into specific ranges, and compare multiple groups in one chart, which turns a static summary into an exploration tool.

Understanding Box Plots

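A box plot condenses a distribution into the five-number summary: the minimum, first quartile (25th percentile), median (50th percentile), third quartile (75th percentile), and maximum. The box spans the interquartile range (IQR, the third quartile minus the first), and the line inside it marks the median. Under the common Tukey convention, which Plotly follows by default, the whiskers extend to the most extreme data points that lie within 1.5 × IQR of the box, and any points beyond those limits are plotted individually as outliers. This compact summary makes comparing distributions across groups fast and intuitive.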
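To make the whisker rule concrete, here is a minimal sketch in plain Python that computes the five-number summary, the 1.5 × IQR limits, and the outliers for a small list of numbers. The helper name `box_summary` is hypothetical, and quartile conventions vary: this uses `statistics.quantiles(..., method="inclusive")`, which may not match Plotly's default quartile method exactly on every dataset.

```python
import statistics


def box_summary(values):
    """Five-number summary plus Tukey limits (1.5 x IQR) and outliers."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between order statistics
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points still inside the limits
    inside = [v for v in data if lower_limit <= v <= upper_limit]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_limit or v > upper_limit],
    }


summary = box_summary([3, 5, 7, 8, 9, 11, 13, 40])
print(summary)  # q1=6.5, median=8.5, q3=11.5, whiskers 3..13, outliers [40]
```

Here 40 lies above the upper limit of 11.5 + 1.5 × 5 = 19, so the upper whisker stops at 13 and 40 is drawn as a separate point.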


import plotly.express as px

df = px.data.tips()

# Simple box plot
fig = px.box(df, y="total_bill", title="Distribution of Total Bill")
fig.show()
    

With just a few lines of code, you have an interactive box plot. Hover over the box to see its summary statistics, including the quartiles and the median. The horizontal line inside the box is the median and the box edges are the first and third quartiles. By default, the whiskers end at the most extreme data points within 1.5 × IQR of the box, with anything beyond drawn as an outlier point; other trace settings can change this.

Comparing Groups With Box Plots

Box plots shine when comparing distributions across categories. Plotly’s x parameter lets you split by a categorical variable, creating side-by-side boxes that reveal how distributions differ between groups.


# Box plot comparing groups
fig = px.box(
    df,
    x="sex",
    y="total_bill",
    color="time",
    title="Bill Distribution by Sex and Meal Time",
    labels={"total_bill": "Total Bill ($)", "sex": "Customer Sex", "time": "Meal Time"}
)
fig.show()
    

In this dataset, male customers have higher median bills than female customers, and dinner bills are higher than lunch bills. Because Plotly Express places the Lunch and Dinner boxes side by side within each sex category, viewers can compare multiple distributions without jumping between separate charts. Clicking and dragging to zoom into a range reveals more detail in compressed areas.

Creating Violin Plots

A violin plot combines the box plot summary with a kernel density estimate (KDE), showing the full shape of the distribution. The width of the violin at each value is proportional to the estimated density there, so wide sections indicate where data is concentrated. Violin plots are more informative than box plots when a distribution has a shape that quartiles cannot capture, such as multiple peaks: a bimodal and a unimodal distribution can share the same quartiles and therefore draw the same box.
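The violin outline is a density estimate. As an illustration only (not Plotly's internal code), here is a plain-Python Gaussian KDE. Plotly's documentation names Silverman's rule of thumb as its default bandwidth; the formula below is one common form of that rule, and Plotly's exact computation may differ. The function name `gaussian_kde` is hypothetical.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of `samples` at any x."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
        sd = statistics.stdev(samples)
        q1, _, q3 = statistics.quantiles(samples, n=4, method="inclusive")
        bandwidth = 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centered on each sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Two clusters of values: the density dips between them, which a box cannot show
samples = [1.0, 1.2, 1.5, 2.0, 2.2, 8.0, 8.3, 8.5, 9.0, 9.1]
density = gaussian_kde(samples)
for x in (1.6, 5.0, 8.6):
    print(f"density at {x}: {density(x):.3f}")
```

The two peaks near 1.6 and 8.6 correspond to the wide parts of a violin, and the dip near 5 to its narrow waist.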


# Violin plot
fig = px.violin(
    df,
    x="sex",
    y="total_bill",
    color="time",
    box=True,
    points=False,
    title="Bill Distribution: Violin Plot with Box"
)
fig.show()
    

The box=True parameter overlays a small box plot inside each violin, combining both summaries. The points=False parameter hides individual data points for a cleaner look. With a smaller dataset, set points="all" to draw every observation as a point, revealing clusters and gaps.

Side-By-Side Violin And Box Comparison

Comparing a violin plot and box plot side-by-side shows how each visualization reveals different aspects of the data. The violin plot shows the full distribution shape while the box plot highlights outliers and quartiles.


import plotly.express as px
from plotly.subplots import make_subplots

df = px.data.tips()

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=("Box Plot", "Violin Plot")
)

# Box plot
fig.add_trace(
    px.box(df, y="total_bill").data[0],
    row=1, col=1
)

# Violin plot
fig.add_trace(
    px.violin(df, y="total_bill").data[0],
    row=1, col=2
)

fig.update_yaxes(title_text="Total Bill ($)", row=1, col=1)
fig.update_layout(showlegend=False, height=500)
fig.show()
    

Side-by-side comparison reveals complementary insights. The box plot clearly shows the median and quartiles, while the violin plot shows that the distribution is right-skewed: most bills are concentrated at lower values, with a long tail of larger bills. Together, they give a fuller picture of the distribution than either plot alone.

Adding Annotations And Custom Styling

Plotly’s box and violin plots support extensive customization through hover templates and styling. You can highlight specific distributions, add custom colors, and include additional metrics in the hover information.


fig = px.box(
    df,
    x="day",
    y="total_bill",
    color="sex",
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},  # must match the data's spelling
    title="Total Bill Distribution by Day and Sex",
    height=600
)

fig.update_layout(
    xaxis_title="Day of Week",
    yaxis_title="Total Bill ($)",
    font=dict(size=12),
    plot_bgcolor="rgba(240, 240, 240, 0.9)",
    paper_bgcolor="white"
)

fig.show()
    

The category_orders parameter puts the days in calendar order (Thursday through Sunday) instead of Plotly Express's default, which is the order in which values first appear in the data. The listed values must match the values in the column exactly, or the requested order will not take effect. The axis titles, font size, and light gray plot background make the chart suitable for presentations, drawing focus to the data while remaining subtle.