Production Deployment
Stepflow's architecture enables flexible production deployments that scale component execution independently from workflow orchestration. This section covers key concepts and components for deploying Stepflow in production environments.
Overview
In production, Stepflow separates concerns between:
- Workflow Orchestrator: Manages workflow execution, data flow, and state persistence
- Component Servers: Provide business logic and can be scaled independently
- Load Balancer: Routes requests to component servers with SSE support and instance affinity
This separation allows you to:
- Scale different types of components independently based on resource requirements
- Deploy components on specialized hardware (GPUs, high-memory nodes, etc.)
- Maintain simple orchestration while distributing compute-intensive work
- Handle high-throughput batch processing efficiently
Architecture Patterns
Resource-Based Component Segregation
Different component servers can be deployed with different resource profiles:
```yaml
# Configuration routing to different component server classes
plugins:
  builtin:
    type: builtin
  # CPU-intensive components
  cpu_components:
    type: stepflow
    url: "http://cpu-components-lb.stepflow.svc.cluster.local:8080"
  # GPU-accelerated components (ML models)
  gpu_components:
    type: stepflow
    url: "http://gpu-components-lb.stepflow.svc.cluster.local:8080"
  # Memory-intensive components (large data processing)
  memory_components:
    type: stepflow
    url: "http://memory-components-lb.stepflow.svc.cluster.local:8080"

routes:
  "/ml/{*component}":
    - plugin: gpu_components
  "/data/{*component}":
    - plugin: memory_components
  "/python/{*component}":
    - plugin: cpu_components
  "/{*component}":
    - plugin: builtin
```
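With this routing, components under `/ml/` are handled by the GPU class, `/data/` by the memory class, `/python/` by the CPU class, and anything else falls back to the builtin plugin.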
Deployment Topology
Each component server class:
- Has its own load balancer for intelligent routing
- Scales independently based on workload
- Runs on appropriate hardware (CPU, GPU, high-memory nodes)
- Maintains instance affinity for bidirectional communication
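As an illustrative sketch (the name, namespace, image, and replica count below are assumptions chosen to line up with the service URLs in the configuration above), each class can be a separate Kubernetes Deployment that scales on its own:

```yaml
# One Deployment per component server class (illustrative values).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-components
  namespace: stepflow
spec:
  replicas: 5                    # scaled independently of the GPU and memory classes
  selector:
    matchLabels:
      app: cpu-components
  template:
    metadata:
      labels:
        app: cpu-components
    spec:
      containers:
        - name: cpu-components
          image: example/stepflow-cpu-components:latest   # hypothetical image name
          ports:
            - containerPort: 8080
```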
Key Components
1. Load Balancer
The Stepflow Load Balancer provides:
- SSE-aware load balancing for streaming responses
- Instance affinity routing for bidirectional communication
- Automatic backend discovery via DNS
- Health checking and failover
Use cases:
- Distributing requests across multiple component server pods
- Maintaining connection affinity for stateful operations
- Enabling horizontal scaling of component servers
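One common way to provide DNS-based backend discovery in Kubernetes is a headless Service in front of the component server pods; the sketch below assumes the load balancer resolves the service name to individual pod IPs (names are illustrative):

```yaml
# Headless Service: DNS returns the individual pod IPs rather than a single
# virtual IP, so the load balancer can track each backend instance (illustrative).
apiVersion: v1
kind: Service
metadata:
  name: cpu-components-backends
  namespace: stepflow
spec:
  clusterIP: None
  selector:
    app: cpu-components
  ports:
    - port: 8080
      targetPort: 8080
```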
2. Component Server Classes
Deploy different component servers for different workloads:
CPU-Optimized:
- General-purpose Python components
- Data transformation and validation
- API integrations
- Deployment: Standard compute nodes, high replica count
GPU-Accelerated:
- ML model inference
- Image/video processing
- Large language models
- Deployment: GPU nodes, fewer replicas, higher cost
Memory-Intensive:
- Large dataset processing
- In-memory caching
- Data aggregation
- Deployment: High-memory nodes, moderate replica count
3. Configuration Management
Use Configuration and Variables to manage environment-specific settings:
Configuration controls infrastructure:
- Define plugin routes to different component server classes
- Configure state storage backends
- Set component server connection details
Variables parameterize workflows:
- API endpoints and credentials that differ between environments
- Feature flags and configuration options
- Resource limits and timeouts
- Environment identifiers (dev, staging, production)
This separation lets the same workflow definition run unchanged across environments: only the configuration and variables differ, never the workflow itself.
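As a sketch (the variable names and file layout are illustrative, not a prescribed schema), per-environment variables might look like this, with the same workflow consuming either file:

```yaml
# variables.staging.yaml (illustrative)
api_base_url: "https://api.staging.example.com"
environment: "staging"
request_timeout_seconds: 30
---
# variables.production.yaml (illustrative)
api_base_url: "https://api.example.com"
environment: "production"
request_timeout_seconds: 10
```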
Example: Kubernetes Deployment
The Kubernetes Batch Demo provides a complete working example of:
- Stepflow orchestrator deployed in Kubernetes
- Multiple component server replicas with load balancing
- SSE-aware load balancer with instance affinity
- Batch execution with distributed compute
- Health checking and automatic failover
Key features demonstrated:
- Component servers scale from 3 to 20+ replicas
- Load balancer distributes requests evenly
- Bidirectional communication (blob storage) works correctly
- Batch workflows process 1000+ items efficiently
Scaling Strategies
Horizontal Scaling
Scale component servers based on load:
```yaml
# Kubernetes HorizontalPodAutoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cpu-components
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cpu-components
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
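Utilization-based scaling only behaves predictably if the component server containers declare resource requests, since the HPA computes utilization relative to the requested amount; illustrative values:

```yaml
# Container resources for the cpu-components Deployment (illustrative values).
# The averageUtilization target above is measured against these requests.
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "4Gi"
```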
Resource-Based Routing
Route components to appropriate hardware:
```yaml
# GPU component server deployment
spec:
  template:
    spec:
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
        - name: gpu-components
          resources:
            limits:
              nvidia.com/gpu: 1
```
Load Balancer Scaling
Scale load balancers independently:
```yaml
# Load balancer deployment
spec:
  replicas: 2  # Start with 2 for HA
  # Scale up to 10 based on connection count
```
State Management
Development
- In-memory state store
- Single orchestrator instance
- Fast, simple, ephemeral
Production
- SQLite or PostgreSQL state store
- Persistent workflow state
- Multiple orchestrator instances (with PostgreSQL)
- Durable execution with fault tolerance
See Configuration - State Store for details.
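As a rough sketch only (the key names below are assumptions, not the authoritative schema; defer to the Configuration reference above for the real options), a production config might swap the default in-memory store for PostgreSQL:

```yaml
# Hypothetical state store settings; consult Configuration - State Store
# for the actual keys and supported backends.
stateStore:
  type: postgres
  connectionUrl: "postgresql://stepflow:<password>@postgres.stepflow.svc.cluster.local:5432/stepflow"
```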
Best Practices
1. Separate Component Classes
Group components by resource requirements:
- Light: API calls, simple transformations → CPU servers
- Medium: Data processing, batch operations → Memory servers
- Heavy: ML inference, GPU workloads → GPU servers
2. Use Load Balancers
Deploy load balancers for each component class:
- Enables horizontal scaling
- Provides health checking
- Maintains instance affinity
- Simplifies configuration
3. Monitor and Scale
Track key metrics:
- Component server CPU/memory usage
- Request latency and throughput
- Error rates and health status
- Queue depths and backpressure
4. Plan for Failures
Design for resilience:
- Multiple load balancer replicas
- Health checks with automatic failover
- Retry logic in workflows
- Persistent state storage
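In Kubernetes terms, "health checks with automatic failover" usually maps to readiness and liveness probes on the component server containers; the /health path below is an assumed endpoint, not something Stepflow prescribes:

```yaml
# Probes for a component server container (illustrative; the /health path is assumed).
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5          # unready pods stop receiving traffic
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  failureThreshold: 3       # restart the container after repeated failures
```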
Next Steps
- Load Balancer - Detailed load balancer documentation
- Configuration - Configure routing and plugins
- Variables - Environment-specific workflow parameters
- Batch Execution - High-throughput processing patterns
- Kubernetes Example - Complete working example
Future Topics
This section will be expanded with:
- Multi-region deployments
- Service mesh integration
- Observability and monitoring
- Security and authentication
- CI/CD pipelines
- Cost optimization strategies