Scaling Bokeh applications to support many concurrent users and large data workloads requires robust architectural patterns: multi-process servers, load balancing for WebSockets, container orchestration, and autoscaling. This guide consolidates production strategies and best practices to ensure your Bokeh apps remain responsive and reliable under heavy traffic.
1. Multi-Process Bokeh Server
Run multiple server worker processes to utilize all CPU cores and isolate long-running sessions:
```bash
bokeh serve app.py --num-procs 4 --allow-websocket-origin=yourdomain.com
```
Explanation: `--num-procs` forks that many worker processes (via Tornado), all sharing the same listening port; the operating system distributes incoming connections among them. Each process handles its own sessions independently, improving concurrency. Note that `--num-procs` relies on forking and is not available on Windows.
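Because `--num-procs` forks on a single machine, scaling across machines (or feeding the load balancers in the next section) typically means running independent single-process instances on separate ports instead. A minimal sketch; the ports are illustrative:

```bash
# Two independent instances for an external load balancer to target
bokeh serve app.py --port 5100 --allow-websocket-origin=yourdomain.com &
bokeh serve app.py --port 5101 --allow-websocket-origin=yourdomain.com &
```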
2. Load Balancing WebSockets
Bokeh server uses long-lived WebSocket connections for interactivity, and a session's state lives in the process that created it. A standard HTTP load balancer must therefore proxy the WebSocket upgrade and maintain session affinity (sticky sessions).
| Load Balancer | Config Pattern | Notes |
|---|---|---|
| Nginx | `proxy_http_version 1.1` plus `Upgrade`/`Connection` headers on the proxied location | Enables WebSocket proxying and round-robin. Sticky sessions (`ip_hash`) optional but recommended. |
| HAProxy | `balance source` plus a generous `timeout tunnel` | HAProxy tunnels WebSockets natively after the upgrade handshake; `timeout tunnel` keeps long-lived connections from being cut off. |
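A minimal Nginx sketch, assuming two Bokeh instances on localhost ports 5100 and 5101 (addresses and domain are placeholders):

```nginx
upstream bokeh_backend {
    ip_hash;                    # sticky sessions by client IP
    server 127.0.0.1:5100;
    server 127.0.0.1:5101;
}

server {
    listen 80;
    server_name yourdomain.com;

    location / {
        proxy_pass http://bokeh_backend;
        proxy_http_version 1.1;                  # required for WebSockets
        proxy_set_header Upgrade $http_upgrade;  # pass the upgrade handshake
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_buffering off;
    }
}
```

The equivalent HAProxy sketch, with the same assumed backends:

```haproxy
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    timeout tunnel  1h   # keep long-lived WebSocket connections open

frontend fe_bokeh
    bind *:80
    default_backend be_bokeh

backend be_bokeh
    balance source       # sticky sessions by source IP
    server bokeh1 127.0.0.1:5100 check
    server bokeh2 127.0.0.1:5101 check
```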
3. Container Orchestration with Docker & Kubernetes
Containerize Bokeh servers and manage with Kubernetes for high availability and autoscaling:
Dockerfile
```dockerfile
FROM python:3.11-slim
RUN pip install bokeh
COPY . /app
# Session lifetime values are in milliseconds
CMD ["bokeh", "serve", "/app", "--num-procs", "2", \
     "--allow-websocket-origin=*", \
     "--unused-session-lifetime", "600000", \
     "--check-unused-sessions", "120000"]
```
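Build and run locally to verify before deploying (the tag matches the manifest below):

```bash
docker build -t yourrepo/bokeh-app:latest .
docker run -p 5006:5006 yourrepo/bokeh-app:latest
```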
Kubernetes Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bokeh-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bokeh
  template:
    metadata:
      labels:
        app: bokeh
    spec:
      containers:
      - name: bokeh
        image: yourrepo/bokeh-app:latest
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1"
        ports:
        - containerPort: 5006
---
apiVersion: v1
kind: Service
metadata:
  name: bokeh-service
spec:
  selector:
    app: bokeh
  ports:
  - port: 80
    targetPort: 5006
  type: LoadBalancer
```
Enable CPU-based autoscaling with a Horizontal Pod Autoscaler:

```bash
kubectl autoscale deployment bokeh-app --cpu-percent=70 --min=3 --max=10
```
4. Autoscaling Patterns
Configure autoscaling based on CPU, memory, or custom metrics (e.g., active sessions):
- CPU-based: Kubernetes HPA as above.
- Memory-based: Use Kubernetes custom metrics or Prometheus Adapter to scale when memory usage exceeds thresholds.
- Session-based: Expose session count metrics via a Prometheus exporter and scale based on active connections.
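A sketch of the session-based approach, assuming a per-pod `bokeh_active_sessions` metric has been made available through the Prometheus Adapter (metric name and target value are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bokeh-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bokeh-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: bokeh_active_sessions   # hypothetical custom metric
      target:
        type: AverageValue
        averageValue: "50"            # scale out above ~50 sessions per pod
```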
5. High-Availability & Zero-Downtime Deployments
Ensure rolling updates without dropping WebSocket connections:
- Enable a `readinessProbe` on port 5006 so traffic is routed only to healthy pods (see the probe sketch after this list).
- Configure the rolling update strategy:
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1
```
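A probe sketch for the container spec above, assuming the Bokeh server answers on port 5006; since Bokeh has no dedicated health endpoint, a TCP check is the safe default:

```yaml
readinessProbe:
  tcpSocket:
    port: 5006
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  tcpSocket:
    port: 5006
  initialDelaySeconds: 15
  periodSeconds: 20
```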
6. Data Sharding & Server-Side Filtering
For very large datasets (10M+ points), shard data across multiple Bokeh servers or use Datashader:
- Server-side data APIs: Implement Flask/FastAPI microservices to serve pre-filtered subsets to Bokeh sessions.
- Datashader integration: Pre-render large data on the server and send aggregated images to clients to reduce memory pressure.
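A minimal sketch of server-side aggregation with Datashader; the DataFrame here is synthetic stand-in data:

```python
import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

# 10M synthetic points; in production this would be your real dataset
df = pd.DataFrame({
    "x": np.random.standard_normal(10_000_000),
    "y": np.random.standard_normal(10_000_000),
})

# Rasterize to a fixed-size grid: payload size no longer depends on row count
canvas = ds.Canvas(plot_width=800, plot_height=600)
agg = canvas.points(df, "x", "y")   # per-pixel point counts
img = tf.shade(agg, how="log")      # RGBA image, log-scaled intensity

# img.to_pil() or the underlying array (img.data) can then be sent to the
# client, e.g. via figure.image_rgba() in a Bokeh app
```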
7. Monitoring & Metrics
Instrument metrics for proactive scaling and debugging:
- Bokeh server: There is no built-in Prometheus endpoint; log session and memory statistics with `--stats-log-frequency` and `--mem-log-frequency`, or export session counts yourself (see the sketch below).
- Container: Monitor CPU, memory, and network I/O with Prometheus/Grafana.
- Application: Log active sessions, request latencies, and WebSocket errors.
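As a sketch of the session-export approach, a directory-format Bokeh app can publish a session gauge from its lifecycle hooks using the `prometheus_client` package; the metric name and port are illustrative:

```python
# app_hooks.py -- lives alongside main.py in a directory-format Bokeh app
from prometheus_client import Gauge, start_http_server

# Illustrative metric name; match it to your Prometheus Adapter config
ACTIVE_SESSIONS = Gauge("bokeh_active_sessions",
                        "Currently active Bokeh sessions")

def on_server_loaded(server_context):
    # Expose /metrics on a side port (9100 is an arbitrary choice)
    start_http_server(9100)

def on_session_created(session_context):
    ACTIVE_SESSIONS.inc()

def on_session_destroyed(session_context):
    ACTIVE_SESSIONS.dec()
```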
Quick Reference: Scaling Checklist
- Enable `--num-procs` to utilize all cores.
- Use Nginx/HAProxy for WebSocket load balancing.
- Containerize with Docker & Kubernetes, set resource limits.
- Configure HPA for CPU/memory or custom metrics autoscaling.
- Implement readiness/liveness probes for zero-downtime updates.
- Shard large datasets or integrate Datashader.
- Monitor metrics and set alerts for memory/CPU spikes.