Production Deployment

Stepflow's architecture enables flexible production deployments that scale component execution independently from workflow orchestration. This section covers key concepts and components for deploying Stepflow in production environments.

Overview

In production, Stepflow separates concerns between:

  • Workflow Orchestrator: Manages workflow execution, data flow, and state persistence
  • Workers: Provide business logic and can be scaled independently
  • Task Queues: Route tasks to workers via named gRPC queues

This separation allows you to:

  • Scale different types of components independently based on resource requirements
  • Deploy components on specialized hardware (GPUs, high-memory nodes, etc.)
  • Maintain simple orchestration while distributing compute-intensive work
  • Handle high-throughput batch processing efficiently

Architecture Patterns

Resource-Based Component Segregation

Different workers can be deployed with different resource profiles:

# Configuration routing to different worker pools using per-route queueName
plugins:
  builtin:
    type: builtin

  workers:
    type: grpc
    queueName: default # Default queue; overridden per-route below

routes:
  "/ml/{*component}":
    - plugin: workers
      params:
        queueName: gpu # GPU worker pool
  "/data/{*component}":
    - plugin: workers
      params:
        queueName: memory # High-memory worker pool
  "/python/{*component}":
    - plugin: workers
      params:
        queueName: cpu # CPU worker pool
  "/{*component}":
    - plugin: builtin

Deployment Topology

Each worker pool:

  • Pulls tasks from a dedicated named queue on the orchestrator and returns results via gRPC
  • Scales independently based on workload
  • Runs on appropriate hardware (CPU, GPU, high-memory nodes)
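
Concretely, each pool can be a Kubernetes Deployment whose workers subscribe to a single named queue. The image name, container args, and orchestrator address below are illustrative assumptions, not Stepflow-defined values:

```yaml
# Hypothetical Deployment for the GPU worker pool.
# Image, args, and orchestrator endpoint are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-components
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-components
  template:
    metadata:
      labels:
        app: gpu-components
    spec:
      containers:
        - name: worker
          image: example.com/stepflow-gpu-worker:latest # placeholder image
          args:
            - --orchestrator=grpc://stepflow-orchestrator:7837 # assumed flag and port
            - --queue=gpu # pull tasks from the "gpu" named queue
```

Each pool (cpu, memory, gpu) would get its own Deployment of this shape, differing only in queue name, node placement, and resources.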

Key Components

1. Task Routing

Workers pull tasks from named queues and return results to the orchestrator via gRPC. The pull-based protocol provides:

  • Named queue-based task routing
  • Heartbeat-based crash detection
  • Automatic retry on transport failures
  • Horizontal scaling across multiple worker instances

Use cases:

  • Distributing tasks across multiple worker pods
  • Scaling each worker pool independently of the orchestrator and of other pools
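
To make the pull-based protocol concrete, a worker needs to know roughly the following; the key names here are hypothetical, chosen only to illustrate the moving parts, and are not Stepflow's actual schema:

```yaml
# Hypothetical worker-side configuration (key names are illustrative)
orchestrator: grpc://stepflow-orchestrator:7837 # endpoint to pull tasks from (assumed address)
queue: gpu               # named queue this worker instance serves
heartbeatInterval: 10s   # liveness signal; missed heartbeats let the orchestrator detect a crash and retry
maxConcurrentTasks: 4    # tasks pulled and executed in parallel per instance
```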

2. Worker Pools

Deploy different workers for different workloads:

CPU-Optimized:

  • General-purpose Python components
  • Data transformation and validation
  • API integrations
  • Deployment: Standard compute nodes, high replica count

GPU-Accelerated:

  • ML model inference
  • Image/video processing
  • Large language models
  • Deployment: GPU nodes, fewer replicas, higher cost

Memory-Intensive:

  • Large dataset processing
  • In-memory caching
  • Data aggregation
  • Deployment: High-memory nodes, moderate replica count
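
These three profiles map directly onto Kubernetes container resource requests. The figures below are placeholder starting points for illustration, not sizing recommendations:

```yaml
# Illustrative container resources per worker class (all values are placeholders)

# CPU-optimized: small footprint, high replica count
resources:
  requests: { cpu: 500m, memory: 512Mi }
  limits: { cpu: "1", memory: 1Gi }

# Memory-intensive: large memory, moderate replica count
resources:
  requests: { cpu: "1", memory: 8Gi }
  limits: { memory: 16Gi }

# GPU-accelerated: one GPU per replica, few replicas
resources:
  limits: { nvidia.com/gpu: 1 }
```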

3. Configuration Management

Use Configuration and Variables to manage environment-specific settings:

Configuration controls infrastructure:

  • Define plugin routes to different worker pools
  • Configure state storage backends
  • Set worker connection details

Variables parameterize workflows:

  • API endpoints and credentials that differ between environments
  • Feature flags and configuration options
  • Resource limits and timeouts
  • Environment identifiers (dev, staging, production)

This separation allows the same workflow definition to run across environments by changing only configuration and variables, not the workflow itself.
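
The same idea in file form: one workflow definition, per-environment variables. The file names and keys below are illustrative examples, not a Stepflow-defined schema:

```yaml
# variables.dev.yaml (hypothetical file; key names are examples)
apiEndpoint: https://api.dev.example.com
environment: dev
requestTimeout: 60s
featureFlags:
  enableCaching: false

# variables.production.yaml would carry the same keys with production
# values (e.g. apiEndpoint: https://api.example.com, a tighter timeout,
# enableCaching: true), while the workflow definition stays identical.
```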

Example: Kubernetes Deployment

The Kubernetes Batch Demo provides a complete working example of:

  • Stepflow orchestrator deployed in Kubernetes
  • Multiple worker replicas pulling from named queues
  • gRPC-based task dispatch and completion
  • Batch execution with distributed compute
  • Heartbeat-based health monitoring and automatic failover

Key features demonstrated:

  • Workers scale from 3 to 20+ replicas
  • Tasks distributed across worker pool via named queues
  • Bidirectional communication (sub-run submission) works correctly
  • Batch workflows process 1000+ items efficiently

Scaling Strategies

Horizontal Scaling

Scale workers based on load:

# Kubernetes HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-components
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-components
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Resource-Based Routing

Route components to appropriate hardware:

# GPU worker deployment
spec:
  template:
    spec:
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
        - name: gpu-components
          resources:
            limits:
              nvidia.com/gpu: 1

State Management

Development

  • In-memory state store
  • Single orchestrator instance
  • Fast, simple, ephemeral

Production

  • SQLite or PostgreSQL state store
  • Persistent workflow state
  • Multiple orchestrator instances (with PostgreSQL)
  • Durable execution with fault tolerance

See Persistence and Recovery for the full architecture, and Configuration - State Store for configuration options.
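
As a rough sketch of the two setups; consult Configuration - State Store for the actual key names, which may differ from the illustrative shape below:

```yaml
# Development: in-memory state (fast, ephemeral)
stateStore:
  type: inMemory

# Production: PostgreSQL-backed state (durable; supports multiple orchestrators)
# stateStore:
#   type: postgres
#   connectionString: postgresql://stepflow:${DB_PASSWORD}@postgres:5432/stepflow
```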

Best Practices

1. Separate Component Classes

Group components by resource requirements:

  • Light: API calls, simple transformations → CPU workers
  • Medium: Data processing, batch operations → Memory workers
  • Heavy: ML inference, GPU workloads → GPU workers

2. Use Named Queues

Configure separate named queues for each component class:

  • Enables horizontal scaling per worker pool
  • Provides heartbeat-based health monitoring
  • Adds automatic crash detection and retry
  • Simplifies configuration

3. Monitor and Scale

Track key metrics:

  • Worker CPU/memory usage
  • Request latency and throughput
  • Error rates and health status
  • Queue depths and backpressure

4. Plan for Failures

Design for resilience:

  • Health checks with automatic failover
  • Heartbeat-based crash detection and retry
  • Configurable per-step error handling in workflows
  • Persistent state storage for crash recovery

Next Steps

Learn More

  • Read the FAQ for comparisons with other orchestration and workflow technologies
  • Learn about the Stepflow Protocol that enables this architecture