Scaling Bokeh applications to support many concurrent users and large data workloads requires robust architectural patterns: multi-process servers, load balancing for WebSockets, container orchestration, and autoscaling. This guide consolidates production strategies and best practices to ensure your Bokeh apps remain responsive and reliable under heavy traffic.
1. Multi-Process Bokeh Server
Run multiple server worker processes to utilize all CPU cores and isolate long-running sessions:
```bash
bokeh serve app.py --num-procs 4 --allow-websocket-origin=yourdomain.com
```
Explanation: `--num-procs` forks that many worker processes (via Tornado), all sharing the same listening port; the operating system distributes incoming connections among them. Each process handles its own sessions independently, improving concurrency. Note that `--num-procs` relies on forking and is not available on Windows.
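Because `--num-procs` forks on a single machine, scaling across machines (or feeding the load balancers in the next section) typically means running independent single-process instances on separate ports instead. A minimal sketch; the ports are illustrative:

```bash
# Two independent instances for an external load balancer to target
bokeh serve app.py --port 5100 --allow-websocket-origin=yourdomain.com &
bokeh serve app.py --port 5101 --allow-websocket-origin=yourdomain.com &
```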
2. Load Balancing WebSockets
Bokeh server uses long-lived WebSocket connections for interactivity, and a session's state lives in the process that created it. A standard HTTP load balancer must therefore proxy the WebSocket upgrade and maintain session affinity (sticky sessions).
| Load Balancer | Config Pattern | Notes |
|---|---|---|
| Nginx | `proxy_http_version 1.1` plus `Upgrade`/`Connection` headers on the proxied location | Enables WebSocket proxying and round-robin. Sticky sessions (`ip_hash`) optional but recommended. |
| HAProxy | `balance source` plus a generous `timeout tunnel` | HAProxy tunnels WebSockets natively after the upgrade handshake; `timeout tunnel` keeps long-lived connections from being cut off. |
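A minimal Nginx sketch, assuming two Bokeh instances on localhost ports 5100 and 5101 (addresses and domain are placeholders):

```nginx
upstream bokeh_backend {
    ip_hash;                    # sticky sessions by client IP
    server 127.0.0.1:5100;
    server 127.0.0.1:5101;
}

server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://bokeh_backend;
        proxy_http_version 1.1;                  # required for WebSockets
        proxy_set_header Upgrade $http_upgrade;  # pass the upgrade handshake
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_buffering off;
    }
}
```

The equivalent HAProxy sketch, with the same assumed backends:

```haproxy
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    timeout tunnel  1h   # keep long-lived WebSocket connections open

frontend fe_bokeh
    bind *:80
    default_backend be_bokeh

backend be_bokeh
    balance source       # sticky sessions by source IP
    server bokeh1 127.0.0.1:5100 check
    server bokeh2 127.0.0.1:5101 check
```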
3. Container Orchestration with Docker & Kubernetes
Containerize Bokeh servers and manage with Kubernetes for high availability and autoscaling:
Dockerfile
```dockerfile
FROM python:3.11-slim
RUN pip install bokeh
COPY . /app
# Session lifetime values are in milliseconds
CMD ["bokeh", "serve", "/app", "--num-procs", "2", \
     "--allow-websocket-origin=*", \
     "--unused-session-lifetime", "600000", \
     "--check-unused-sessions", "120000"]
```
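Build and run locally to verify before deploying (the tag matches the manifest below):

```bash
docker build -t yourrepo/bokeh-app:latest .
docker run -p 5006:5006 yourrepo/bokeh-app:latest
```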
Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bokeh-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bokeh
  template:
    metadata:
      labels:
        app: bokeh
    spec:
      containers:
      - name: bokeh
        image: yourrepo/bokeh-app:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1"
        ports:
        - containerPort: 5006
---
apiVersion: v1
kind: Service
metadata:
  name: bokeh-service
spec:
  selector:
    app: bokeh
  ports:
  - port: 80
    targetPort: 5006
  type: LoadBalancer
```
Enable CPU-based autoscaling with a Horizontal Pod Autoscaler:

```bash
kubectl autoscale deployment bokeh-app --cpu-percent=70 --min=3 --max=10
```
4. Autoscaling Patterns
Configure autoscaling based on CPU, memory, or custom metrics (e.g., active sessions):
- CPU-based: Kubernetes HPA as above.
- Memory-based: Use Kubernetes custom metrics or Prometheus Adapter to scale when memory usage exceeds thresholds.
- Session-based: Expose session count metrics via a Prometheus exporter and scale based on active connections.
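A sketch of the session-based approach, assuming a per-pod `bokeh_active_sessions` metric has been made available through the Prometheus Adapter (metric name and target value are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bokeh-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bokeh-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: bokeh_active_sessions   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "50"            # scale out above ~50 sessions per pod
```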
5. High-Availability & Zero-Downtime Deployments
Ensure rolling updates without dropping WebSocket connections:
- Enable a `readinessProbe` on port 5006 so traffic is routed only to healthy pods (see the probe sketch after this list).
- Configure the rolling update strategy:
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1
```
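A probe sketch for the container spec above, assuming the Bokeh server answers on port 5006; since Bokeh has no dedicated health endpoint, a TCP check is the safe default:

```yaml
readinessProbe:
  tcpSocket:
    port: 5006
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 5006
  initialDelaySeconds: 15
  periodSeconds: 20
```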
6. Data Sharding & Server-Side Filtering
For very large datasets (10M+ points), shard data across multiple Bokeh servers or use Datashader:
- Server-side data APIs: Implement Flask/FastAPI microservices to serve pre-filtered subsets to Bokeh sessions.
- Datashader integration: Pre-render large data on the server and send aggregated images to clients to reduce memory pressure.
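A minimal sketch of server-side aggregation with Datashader; the DataFrame here is synthetic stand-in data:

```python
import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

# 10M synthetic points; in production this would be your real dataset
df = pd.DataFrame({
    "x": np.random.standard_normal(10_000_000),
    "y": np.random.standard_normal(10_000_000),
})

# Rasterize to a fixed-size grid: payload size no longer depends on row count
canvas = ds.Canvas(plot_width=800, plot_height=600)
agg = canvas.points(df, "x", "y")   # per-pixel point counts
img = tf.shade(agg, how="log")      # RGBA image, log-scaled intensity

# img.to_pil() or the underlying array (img.data) can then be sent to the
# client, e.g. via figure.image_rgba() in a Bokeh app
```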
7. Monitoring & Metrics
Instrument metrics for proactive scaling and debugging:
- Bokeh server: There is no built-in Prometheus endpoint; log session and memory statistics with `--stats-log-frequency` and `--mem-log-frequency`, or export session counts yourself (see the sketch below).
- Container: Monitor CPU, memory, and network I/O with Prometheus/Grafana.
- Application: Log active sessions, request latencies, and WebSocket errors.
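As a sketch of the session-export approach, a directory-format Bokeh app can publish a session gauge from its lifecycle hooks using the `prometheus_client` package; the metric name and port are illustrative:

```python
# app_hooks.py -- lives alongside main.py in a directory-format Bokeh app
from prometheus_client import Gauge, start_http_server

# Illustrative metric name; match it to your Prometheus Adapter config
ACTIVE_SESSIONS = Gauge("bokeh_active_sessions",
                        "Currently active Bokeh sessions")

def on_server_loaded(server_context):
    # Expose /metrics on a side port (9100 is an arbitrary choice)
    start_http_server(9100)

def on_session_created(session_context):
    ACTIVE_SESSIONS.inc()

def on_session_destroyed(session_context):
    ACTIVE_SESSIONS.dec()
```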
Quick Reference: Scaling Checklist
- Enable `--num-procs` to utilize all cores.
- Use Nginx/HAProxy for WebSocket load balancing.
- Containerize with Docker & Kubernetes, set resource limits.
- Configure HPA for CPU/memory or custom metrics autoscaling.
- Implement readiness/liveness probes for zero-downtime updates.
- Shard large datasets or integrate Datashader.
- Monitor metrics and set alerts for memory/CPU spikes.