Stepflow Introduction

Stepflow is a workflow orchestrator for AI applications. You define workflows declaratively in YAML, and Stepflow coordinates execution across workers — handling data flow, parallelism, fault tolerance, and scaling.
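A declarative workflow might look like the sketch below. This is illustrative only: the step structure follows the description above, but the specific field names (`steps`, `component`, `$from`, and so on) are assumptions for exposition, not Stepflow's actual schema.

```yaml
# Hypothetical workflow sketch -- field names are illustrative assumptions.
name: summarize-article
steps:
  - id: fetch
    component: /python/fetch_article      # runs in a Python worker
    input:
      url: { $from: workflow_input, path: url }
  - id: summarize
    component: /llm/chat                  # calls an LLM component
    input:
      prompt: { $from: fetch, path: text }
output:
  summary: { $from: summarize, path: response }
```

The orchestrator reads the data-flow references between steps to determine ordering and parallelism; steps with no dependency on each other can run concurrently on different workers.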

The orchestrator and workers are separate processes connected by an open, protobuf-based protocol. This separation is the foundation of Stepflow's architecture: workers can be written in any language, scaled independently, and deployed on hardware matched to their workload — all without changing the workflow definition.

Key Features

  • Combine anything in one workflow. A single workflow can call an LLM, run a Python function, invoke an MCP tool, and execute a component from a different framework — each in its own worker process with no shared runtime or dependency conflicts.

  • Open protocol, any language. Workers communicate over pull-based task queues and gRPC. Build workers with the Python SDK, implement the protocol directly in any language, or use MCP servers as components with zero wrapping code.

  • Dev to production, no workflow changes. Locally, Stepflow runs as a single binary with embedded storage and subprocess workers. In production, the same workflows run on a distributed cluster with dedicated worker pools, persistent storage, and message brokers like NATS — you change the infrastructure, not the workflow.

  • Production-grade by default. Stepflow journals every step result, so workflows resume from the last successful step after a crash. Workers run in isolated processes with independent scaling and resource routing to appropriate hardware.

  • Batch execution. Process thousands of inputs in parallel with configurable concurrency, progress tracking, and fault isolation — locally or on remote servers.

  • Dynamic, composable flows. Workflows can spawn sub-workflows at runtime. The declarative format is simple enough for LLMs to author flows dynamically using a whitelisted set of components — enabling safe agentic patterns.
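The batch execution model described above, bounded concurrency with per-input fault isolation, can be sketched in plain Python. This shows the pattern, not Stepflow's actual API; `run_batch` and its arguments are hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_batch(component, inputs, max_concurrency=4):
    """Run `component` over many inputs with bounded concurrency.

    Each input fails independently: one bad input produces an error
    entry instead of aborting the whole batch (fault isolation).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(component, item): i
                   for i, item in enumerate(inputs)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = {"ok": fut.result()}
            except Exception as exc:
                results[i] = {"error": str(exc)}
    # Return results in input order, preserving per-item status.
    return [results[i] for i in range(len(inputs))]

out = run_batch(lambda x: 10 // x, [1, 2, 0, 5])
# out[2] is an error entry (division by zero); the other inputs still succeed.
```

In Stepflow itself this bookkeeping (concurrency limits, progress tracking, error isolation) is handled by the orchestrator rather than user code.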

Architecture

During development, Stepflow manages workers as subprocesses. Everything runs on a single machine.

The orchestrator executes workflows, manages data flow between steps, persists state, and routes tasks to worker pools. Workers pull tasks, execute components, and return results — controlling their own concurrency by choosing when to pull the next task.
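The pull-based pattern described here can be sketched with a plain in-process queue: each worker pulls a task, executes it, reports the result, and only then pulls the next task, so each worker bounds its own concurrency. This is a minimal sketch of the pattern, not Stepflow's protocol or SDK.

```python
import queue
import threading

def worker_loop(tasks: queue.Queue, results: queue.Queue):
    """Pull a task, execute it, report the result, then pull again."""
    while True:
        task = tasks.get()
        if task is None:            # sentinel: shutdown signal
            break
        step_id, fn, arg = task
        results.put((step_id, fn(arg)))

# The "orchestrator" enqueues tasks; two workers pull at their own pace.
tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker_loop, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()
for i in range(4):
    tasks.put((i, lambda x: x * x, i))
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()
print(sorted(results.queue))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```

In production the in-process queue is replaced by a named queue over gRPC or NATS, but the worker's control flow is the same: the worker, not the orchestrator, decides when it is ready for more work.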

Dev to Production

Stepflow scales from a single binary to a distributed cluster — no workflow changes required.

                Development              Production
  Orchestrator  Single local binary      Cluster with persistent storage
  Workers       Subprocesses             Separate containers/nodes
  Task routing  In-process task queues   Named queues (gRPC or NATS)
  Scaling       Single machine           Independent worker pool scaling
  Workflows     Same YAML                Same YAML
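The swap from the left column to the right is a configuration change, not a workflow change. The fragments below are hypothetical orchestrator configs: every key shown is an illustrative assumption, not Stepflow's real schema, and is included only to show where the dev/prod difference lives.

```yaml
# Hypothetical config sketch -- keys are illustrative assumptions.
# Development: subprocess worker managed by the single binary.
workers:
  python:
    type: subprocess
    command: ["python", "worker.py"]
---
# Production: the same worker pool, reached via a NATS-backed queue.
workers:
  python:
    type: remote
    queue: python-tasks
    broker: nats://nats.internal:4222
```

The workflow YAML references components by name in both cases, so it stays byte-for-byte identical across environments.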

See Production Deployment for architecture patterns and scaling strategies.

Next Steps

  • Get Started — install Stepflow and run your first workflow
  • Flows — learn the workflow definition language
  • Components — explore built-in and custom components
  • Deployment — production architecture and scaling
  • FAQ — comparisons with other workflow and orchestration technologies