Protocol Overview

Stepflow uses a queue-based task protocol to coordinate work between the orchestrator and workers. The orchestrator dispatches tasks onto named queues; workers pull tasks from those queues, execute them, and report results back. This design decouples the orchestrator from workers, enabling independent scaling, multiple transport backends, and fault-tolerant execution.

Architecture

  1. The orchestrator executes a workflow and determines which component to invoke for each step.
  2. It applies routing rules to map the component path to a plugin, which in turn determines the queue the task is published on.
  3. A worker listening on that queue picks up the task, executes the component, and reports the result back to the orchestrator.
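The three steps above can be sketched end-to-end with plain Python data structures. This is an illustrative model only — the queue, the component registry, and the function names here are not Stepflow's actual API:

```python
# Minimal sketch of the dispatch flow: the orchestrator publishes a task,
# a worker pulls it, executes the component, and reports the result back.
# All names here are illustrative, not Stepflow's real types.
task_queue: list[dict] = []      # stands in for a named queue
results: dict[str, object] = {}  # stands in for results reported back

def orchestrator_dispatch(step_id: str, component: str, payload: dict) -> None:
    """Steps 1-2: choose a component for the step and publish a task for it."""
    task_queue.append({"id": step_id, "component": component, "input": payload})

def worker_poll(components: dict) -> None:
    """Step 3: pull one task, execute its component, report the result."""
    task = task_queue.pop(0)
    results[task["id"]] = components[task["component"]](task["input"])
```

Because the worker only ever sees tasks through the queue, the orchestrator and worker never call each other directly, which is what allows them to scale independently.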

Queue Backends

Stepflow supports multiple queue backends through its plugin system. All backends share the same task lifecycle — the only difference is how tasks are transported between the orchestrator and workers.

| Backend | Plugin Type | Transport | Best For |
|---|---|---|---|
| gRPC | `type: grpc` | `PullTasks` streaming RPC | Single-orchestrator deployments, local development |
| NATS | `type: nats` | JetStream consumers | Multi-orchestrator, horizontal scaling, durable queues |

Future backends can be added by implementing the queue plugin interface.

gRPC Queues

Workers connect to the orchestrator's TasksService and call PullTasks to open a long-lived stream. The orchestrator pushes task assignments through the stream as they become available.

```yaml
plugins:
  python:
    type: grpc
    queueName: python
    command: uv
    args: ["run", "stepflow_py", "--grpc"]
```

NATS Queues

Tasks are published to NATS JetStream subjects. Workers run as durable consumers, enabling fan-out across multiple orchestrator instances and surviving restarts without task loss.

```yaml
plugins:
  python:
    type: nats
    natsUrl: "nats://localhost:4222"
    stream: STEPFLOW_TASKS
```

Named Queues and Routing

Each plugin defines a default queue name (via queueName). Routes can override this per-path, allowing a single plugin to serve multiple worker pools:

```yaml
plugins:
  workers:
    type: grpc
    queueName: default

routes:
  "/ml/{*component}":
    - plugin: workers
      params:
        queueName: gpu # Override: route to GPU worker pool
  "/python/{*component}":
    - plugin: workers # Uses default queue name
  "/{*component}":
    - plugin: builtin
```

Workers subscribe to a specific queue name and only receive tasks routed to that queue.

Worker Configuration

Workers discover the orchestrator and queue configuration through environment variables. When the orchestrator launches a worker as a subprocess, it sets these automatically:

| Variable | Description |
|---|---|
| `STEPFLOW_TRANSPORT` | Queue backend: `grpc` or `nats` |
| `STEPFLOW_QUEUE_NAME` | Queue name to pull tasks from |
| `STEPFLOW_TASKS_URL` | Orchestrator gRPC address (for gRPC transport) |
| `STEPFLOW_BLOB_URL` | BlobService gRPC address |

For remote workers (deployed independently, e.g., in Kubernetes), these variables are configured in the worker's deployment manifest.

Task Lifecycle

Every task follows the same lifecycle regardless of queue backend: dispatch, execution, and completion work identically, and only the transport differs.

See Task Lifecycle for the full details on task types, heartbeating, completion, retries, and bidirectional callbacks.
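The lifecycle can be pictured as a small state machine. The states below (queued, running, completed, failed) are a plausible illustration only — the real state names and transitions are defined on the Task Lifecycle page:

```python
# Hypothetical task state machine; state names are illustrative.
ALLOWED = {
    "queued": {"running"},               # a worker picks the task up
    "running": {"completed", "failed"},  # result or error reported back
    "failed": {"queued"},                # a retry re-enqueues the task
    "completed": set(),                  # terminal
}

def transition(state: str, new_state: str) -> str:
    """Move a task to new_state, rejecting illegal transitions."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```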

Proto Definitions

The protocol is defined in Protocol Buffer files in stepflow-rs/crates/stepflow-proto/proto/stepflow/v1/:

  • tasks.proto — TasksService: PullTasks, GetOrchestratorForRun
  • orchestrator.proto — OrchestratorService: CompleteTask, TaskHeartbeat, SubmitRun, GetRun
  • components.proto — ComponentsService: ListRegisteredComponents (public API)
  • blobs.proto — BlobService: PutBlob, GetBlob
  • common.proto — Shared types: TaskErrorCode, ObservabilityContext

Next Steps

  • Task Lifecycle — How tasks are dispatched, executed, and completed
  • Error Handling — Error codes, retry behavior, and failure handling
  • Blob Storage — Content-addressable storage for data sharing