# Production Deployment
Stepflow's architecture enables flexible production deployments that scale component execution independently from workflow orchestration. This section covers key concepts and components for deploying Stepflow in production environments.
## Overview
In production, Stepflow separates concerns between:
- **Workflow Orchestrator**: Manages workflow execution, data flow, and state persistence
- **Workers**: Provide business logic and can be scaled independently
- **Task Queues**: Route tasks to workers via named gRPC queues
This separation allows you to:
- Scale different types of components independently based on resource requirements
- Deploy components on specialized hardware (GPUs, high-memory nodes, etc.)
- Maintain simple orchestration while distributing compute-intensive work
- Handle high-throughput batch processing efficiently
## Architecture Patterns

### Resource-Based Component Segregation
Different workers can be deployed with different resource profiles:
```yaml
# Configuration routing to different worker pools using per-route queueName
plugins:
  builtin:
    type: builtin
  workers:
    type: grpc
    queueName: default  # Default queue; overridden per-route below

routes:
  "/ml/{*component}":
    - plugin: workers
      params:
        queueName: gpu     # GPU worker pool
  "/data/{*component}":
    - plugin: workers
      params:
        queueName: memory  # High-memory worker pool
  "/python/{*component}":
    - plugin: workers
      params:
        queueName: cpu     # CPU worker pool
  "/{*component}":
    - plugin: builtin
```
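A step only names a component path; the router decides which queue serves it. The fragment below is illustrative only: the component names are made up, and the `$from` reference syntax is an assumption to be checked against the flow schema.

```yaml
# Illustrative flow fragment; component names are hypothetical
steps:
  - id: embed
    component: /ml/embed_text     # matches "/ml/{*component}" -> gpu queue
    input:
      text: { $from: { workflow: input }, path: "text" }
  - id: aggregate
    component: /data/aggregate    # matches "/data/{*component}" -> memory queue
    input:
      rows: { $from: { step: embed } }
```

Any path not matched by a more specific rule falls through to the final `/{*component}` route and executes in-process via the builtin plugin.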
### Deployment Topology
Each worker pool:
- Pulls tasks from its own dedicated named queue and returns results to the orchestrator via gRPC
- Scales independently based on workload
- Runs on appropriate hardware (CPU, GPU, or high-memory nodes)
## Key Components

### 1. Task Routing
Workers pull tasks from named queues and return results to the orchestrator via gRPC. The pull-based protocol provides:
- Named queue-based task routing
- Heartbeat-based crash detection
- Automatic retry on transport failures
- Horizontal scaling across multiple worker instances
Use cases:
- Distributing tasks across multiple worker pods
- Scaling each worker pool independently of the others
### 2. Worker Pools
Deploy different workers for different workloads (a sample Deployment for the CPU pool follows these lists):
**CPU-Optimized:**
- General-purpose Python components
- Data transformation and validation
- API integrations
- Deployment: Standard compute nodes, high replica count
**GPU-Accelerated:**
- ML model inference
- Image/video processing
- Large language models
- Deployment: GPU nodes, fewer replicas, higher cost
**Memory-Intensive:**
- Large dataset processing
- In-memory caching
- Data aggregation
- Deployment: High-memory nodes, moderate replica count
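In Kubernetes, each class maps naturally to its own Deployment with matching resource requests. A minimal sketch for the CPU pool follows; the image name and resource figures are placeholders, since the real worker entrypoint depends on your SDK and packaging.

```yaml
# Sketch of a CPU worker pool Deployment; image and resources are placeholders
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-components
spec:
  replicas: 10
  selector:
    matchLabels:
      app: cpu-components
  template:
    metadata:
      labels:
        app: cpu-components
    spec:
      containers:
        - name: worker
          image: example.com/stepflow-cpu-worker:latest  # placeholder image
          resources:
            requests:
              cpu: "500m"
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
```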
### 3. Configuration Management
Use Configuration and Variables to manage environment-specific settings:
**Configuration** controls infrastructure:
- Define plugin routes to different worker pools
- Configure state storage backends
- Set worker connection details
**Variables** parameterize workflows:
- API endpoints and credentials that differ between environments
- Feature flags and configuration options
- Resource limits and timeouts
- Environment identifiers (dev, staging, production)
This separation lets the same workflow definition run in every environment by changing only the configuration and variables, not the workflow itself.
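As a concrete illustration, the same flow can point at different backends per environment. The file layout below is hypothetical, not a prescribed Stepflow format; it only shows the kind of values that belong in variables rather than in the flow.

```yaml
# variables.staging.yaml (hypothetical layout)
api_endpoint: https://api.staging.example.com
environment: staging
request_timeout_seconds: 30
```

```yaml
# variables.production.yaml (hypothetical layout)
api_endpoint: https://api.example.com
environment: production
request_timeout_seconds: 10
```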
## Example: Kubernetes Deployment
The Kubernetes Batch Demo provides a complete working example of:
- Stepflow orchestrator deployed in Kubernetes
- Multiple worker replicas pulling from named queues
- gRPC-based task dispatch and completion
- Batch execution with distributed compute
- Heartbeat-based health monitoring and automatic failover
Key features demonstrated:
- Workers scale from 3 to 20+ replicas
- Tasks distributed across worker pool via named queues
- Bidirectional communication (workers submitting sub-runs back to the orchestrator)
- Batch workflows process 1000+ items efficiently
## Scaling Strategies

### Horizontal Scaling
Scale workers based on load:
```yaml
# Kubernetes HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-components
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-components
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
### Resource-Based Routing
Route components to appropriate hardware:
```yaml
# GPU worker Deployment (the image is a placeholder)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-components
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-components
  template:
    metadata:
      labels:
        app: gpu-components
    spec:
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
        - name: gpu-components
          image: example.com/stepflow-gpu-worker:latest  # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1
```
## State Management

### Development
- In-memory state store
- Single orchestrator instance
- Fast, simple, ephemeral
### Production
- SQLite or PostgreSQL state store
- Persistent workflow state
- Multiple orchestrator instances (with PostgreSQL)
- Durable execution with fault tolerance
See Persistence and Recovery for the full architecture, and Configuration - State Store for configuration options.
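As a rough sketch, a persistent state store is selected in the Stepflow configuration. The key names below are assumptions; confirm them against Configuration - State Store before use.

```yaml
# Hypothetical sketch; verify key names against the state store docs
stateStore:
  type: sqlite
  databaseUrl: "sqlite:stepflow-state.db?mode=rwc"
  # For multiple orchestrator instances, use a PostgreSQL URL instead
```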
## Best Practices

### 1. Separate Component Classes
Group components by resource requirements:
- Light: API calls, simple transformations → CPU workers
- Medium: Data processing, batch operations → Memory workers
- Heavy: ML inference, GPU workloads → GPU workers
### 2. Use Named Queues
Configure a separate named queue for each component class. This:
- Enables horizontal scaling per worker pool
- Provides heartbeat-based crash detection and automatic retry
- Keeps the routing configuration simple
### 3. Monitor and Scale

Track key metrics (a queue-depth autoscaler sketch follows this list):
- Worker CPU/memory usage
- Request latency and throughput
- Error rates and health status
- Queue depths and backpressure
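CPU utilization alone can miss backpressure, so queue depth is often the more direct scaling signal. The sketch below uses the Kubernetes external-metrics API; the metric name `stepflow_queue_depth` is hypothetical and assumes a metrics adapter (such as Prometheus Adapter) exposes it.

```yaml
# Hypothetical queue-depth autoscaler; assumes a metrics adapter exposes
# a "stepflow_queue_depth" external metric labeled by queue name
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-components-queue
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-components
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: stepflow_queue_depth
          selector:
            matchLabels:
              queue: cpu
        target:
          type: AverageValue
          averageValue: "10"  # aim for ~10 queued tasks per replica
```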
### 4. Plan for Failures
Design for resilience:
- Health checks with automatic failover
- Heartbeat-based crash detection and retry
- Configurable per-step error handling in workflows (sketched below)
- Persistent state storage for crash recovery
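For per-step error handling, a flow can declare what should happen when a step fails. The sketch below is an assumption about the schema: the `onError` field name and its actions may differ, so consult the workflow reference for the actual syntax.

```yaml
# Hypothetical sketch; the onError field and its actions are assumptions
steps:
  - id: enrich
    component: /python/enrich_record
    input:
      record: { $from: { workflow: input } }
    onError:
      action: useDefault                 # fall back instead of failing the run
      defaultValue: { enriched: false }
```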
## Next Steps
- Persistence and Recovery - Durable execution and crash recovery
- Configuration - Configure routing and plugins
- Variables - Environment-specific workflow parameters
- Batch Execution - High-throughput processing patterns
- Kubernetes Example - Complete working example
## Learn More
- Read the FAQ for comparisons with other orchestration and workflow technologies
- Learn about the Stepflow Protocol that enables this architecture