Stepflow Introduction
Stepflow is a workflow orchestrator for AI applications. You define workflows declaratively in YAML, and Stepflow coordinates execution across workers — handling data flow, parallelism, fault tolerance, and scaling.
The orchestrator and workers are separate processes connected by an open protobuf protocol. This separation is the foundation of Stepflow's architecture: workers can be written in any language, scaled independently, and deployed on hardware matched to their workload — all without changing the workflow definition.
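To make the declarative model concrete, here is a minimal sketch in plain Python — not Stepflow's actual format or API. It assumes a flow is just a list of steps whose inputs reference earlier step outputs by id, and shows the orchestrator's core job: resolving data flow between steps and dispatching each step to a component.

```python
# Illustrative sketch only -- not the Stepflow API or YAML schema.
# Components are stand-ins for workers; "$name" references a prior result.
components = {
    "uppercase": lambda text: text.upper(),
    "exclaim": lambda text: text + "!",
}

flow = [
    {"id": "shout", "component": "uppercase", "input": {"text": "$input"}},
    {"id": "punctuate", "component": "exclaim", "input": {"text": "$shout"}},
]

def run_flow(flow, flow_input):
    # The orchestrator's job in miniature: resolve each step's inputs from
    # earlier results, dispatch to the component, record the output.
    results = {"$input": flow_input}
    for step in flow:
        kwargs = {k: results[v] for k, v in step["input"].items()}
        results["$" + step["id"]] = components[step["component"]](**kwargs)
    return results["$" + flow[-1]["id"]]

print(run_flow(flow, "hello"))  # -> HELLO!
```

In Stepflow the components run in separate worker processes rather than in-process lambdas, but the data-flow resolution works on the same principle.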
Key Features
- Combine anything in one workflow. A single workflow can call an LLM, run a Python function, invoke an MCP tool, and execute a component from a different framework — each in its own worker process with no shared runtime or dependency conflicts.
- Open protocol, any language. Workers communicate over pull-based task queues and gRPC. Build workers with the Python SDK, implement the protocol directly in any language, or use MCP servers as components with no wrapper code.
- Dev to production, no workflow changes. Locally, Stepflow runs as a single binary with embedded storage and subprocess workers. In production, the same workflows run on a distributed cluster with dedicated worker pools, persistent storage, and message brokers like NATS — you change the infrastructure, not the workflow.
- Production-grade by default. Stepflow journals every step result, so workflows resume from the last successful step after a crash. Workers run in isolated processes with independent scaling and resource routing to appropriate hardware.
- Batch execution. Process thousands of inputs in parallel with configurable concurrency, progress tracking, and fault isolation — locally or on remote servers.
- Dynamic, composable flows. Workflows can spawn sub-workflows at runtime. The declarative format is simple enough for LLMs to author flows dynamically from a whitelisted set of components — enabling safe agentic patterns.
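The crash-recovery behavior is worth spelling out. The following is a minimal sketch, assuming the journal is simply a persisted map from step name to result — not Stepflow's actual storage format. A re-run consults the journal and skips every step that already completed, so execution resumes from the first step whose result was lost.

```python
# Illustrative step-result journaling (not Stepflow's real storage layer).
def run_with_journal(steps, journal):
    """Run steps in order, skipping any step already in the journal."""
    executed = []
    for name, fn in steps:
        if name in journal:      # completed before the crash -- skip it
            continue
        journal[name] = fn()     # journal the result on success
        executed.append(name)
    return executed

steps = [("fetch", lambda: 1), ("transform", lambda: 2), ("load", lambda: 3)]

# Simulate recovery: "fetch" was journaled before a crash mid-run.
journal = {"fetch": 1}
print(run_with_journal(steps, journal))  # -> ['transform', 'load']
```

Because the journal records results, not just completion flags, resumed steps can still consume the outputs of steps that ran before the crash.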
Architecture
- Local development: Stepflow manages workers as subprocesses, and everything runs on a single machine.
- Production: workers run in separate containers or on separate nodes. The orchestrator routes tasks to worker pools via named queues, enabling independent scaling and resource isolation.
The orchestrator executes workflows, manages data flow between steps, persists state, and routes tasks to worker pools. Workers pull tasks, execute components, and return results — controlling their own concurrency by choosing when to pull the next task.
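The pull-based model above can be sketched with a plain in-process queue — an illustration of the pattern, not the Stepflow protocol. The key property is that the orchestrator only enqueues; each worker decides when to pull the next task, so concurrency is controlled worker-side.

```python
# Pull-based task queue sketch: workers pull when they have capacity.
import queue
import threading

tasks = queue.Queue()
results = {}

def worker():
    while True:
        task = tasks.get()       # pull: blocks until a task is available
        if task is None:         # sentinel -> shut down
            break
        task_id, payload = task
        results[task_id] = payload * 2   # "execute the component"
        tasks.task_done()        # return the result, ack the task

# Orchestrator side: enqueue tasks on the (named) queue.
for i in range(5):
    tasks.put((i, i))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
tasks.join()                     # wait until every task is acknowledged
for _ in threads:
    tasks.put(None)              # one sentinel per worker
for t in threads:
    t.join()

print(results)  # each payload doubled
```

Adding capacity is just starting more workers against the same queue — no change to the enqueue side, which mirrors how Stepflow scales worker pools independently of the orchestrator.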
Dev to Production
Stepflow scales from a single binary to a distributed cluster — no workflow changes required.
| | Development | Production |
|---|---|---|
| Orchestrator | Single local binary | Cluster with persistent storage |
| Workers | Subprocesses | Separate containers/nodes |
| Task routing | In-process task queues | Named queues (gRPC or NATS) |
| Scaling | Single machine | Independent worker pool scaling |
| Workflows | Same YAML | Same YAML |
See Production Deployment for architecture patterns and scaling strategies.
Next Steps
- Get Started — install Stepflow and run your first workflow
- Flows — learn the workflow definition language
- Components — explore built-in and custom components
- Deployment — production architecture and scaling
- FAQ — comparisons with other workflow and orchestration technologies