# Serverless Workers

> **Pre-release**
> To request access during Pre-release, create a [support ticket](/cloud/support#support-ticket) or contact your account team. APIs are experimental and may be subject to backwards-incompatible changes. [Sign up for updates](https://temporal.io/pages/serverless-workers-updates) to be notified when Serverless Workers reach Public Preview.

This page covers the following:

- [What is a Serverless Worker?](#what-is-a-serverless-worker)
- [How Serverless invocation works](#how-serverless-invocation-works)
- [Autoscaling](#autoscaling)
- [Scaling with long-lived Workers](#scaling-with-long-lived-workers)
- [Worker lifecycle](#worker-lifecycle)
- [Failure handling](#failure-handling)
- [Constraints](#constraints)
- [Worker Versioning with Serverless Workers](#worker-versioning-with-serverless-workers)
- [Compute providers](#compute-providers)

## What is a Serverless Worker? 

A Serverless Worker is a Temporal Worker that runs on serverless compute instead of a long-lived process. There is no
always-on infrastructure to provision or scale. Temporal invokes the Worker when Tasks arrive on a Task Queue, and the
Worker shuts down when the work is done.

A Serverless Worker uses the same Temporal SDKs as a traditional long-lived Worker and registers Workflows and
Activities the same way. The difference is the lifecycle: instead of starting once and polling continuously, a
Serverless Worker is invoked by Temporal on demand. It starts, processes the available Tasks, and then shuts down.

Serverless Workers require [Worker Versioning](/worker-versioning). Each Serverless Worker must be associated with a
[Worker Deployment Version](/worker-versioning#deployment-versions) that has a compute provider configured.

To deploy a Serverless Worker, see
[Deploy a Serverless Worker](/production-deployment/worker-deployments/serverless-workers).

## How Serverless invocation works 

With long-lived Workers, you start the Worker process, which connects to Temporal and polls a Task Queue for work.
Temporal does not need to know anything about the Worker's infrastructure.

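With Serverless Workers, Temporal starts the Worker. To do that, Temporal invokes the
[compute provider](#compute-providers) configured on the Worker Deployment Version.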

### Worker Controller Instance 

The Worker Controller Instance (WCI) is a system Workflow that scales Serverless Workers based on Task Queue conditions.
One WCI Workflow runs per Worker Deployment Version that has a compute provider configured. The WCI runs in the same
Namespace as your Worker Deployment.

The WCI responds to two triggers: [sync match failures](#sync-match-failure) and
[Task Queue backlog](#task-queue-backlog). When either trigger fires, the WCI produces a scaling action, such as
invoking the configured compute provider (for example, calling AWS Lambda's `InvokeFunction` API) to start new Workers.
For details on how scaling works, see [Autoscaling](#autoscaling).

You can list WCI Workflows in your Namespace:

```bash
temporal workflow list \
  --namespace <NAMESPACE> \
  --query 'TemporalNamespaceDivision = "TemporalWorkerControllerInstance"'
```

WCI Workflow IDs follow the pattern `temporal-sys-worker-controller-instance:<deployment-name>:<build-id>`. You can
inspect a WCI Workflow's history to see its recent Activity results:

```bash
temporal workflow show \
  --namespace <NAMESPACE> \
  --workflow-id 'temporal-sys-worker-controller-instance:<DEPLOYMENT_NAME>:<BUILD_ID>'
```
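
If you script against WCI Workflows, a small helper can assemble the Workflow ID from the pattern above. This is a
minimal sketch: `wci_workflow_id` is a hypothetical shell function, not part of the Temporal CLI, and it assumes you pass
the deployment name and Build ID exactly as they appear in your Worker Deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a WCI Workflow ID from the documented pattern
#   temporal-sys-worker-controller-instance:<deployment-name>:<build-id>
wci_workflow_id() {
  local deployment_name=$1 build_id=$2
  printf 'temporal-sys-worker-controller-instance:%s:%s\n' "$deployment_name" "$build_id"
}

# Example: the WCI for Build ID v3 of the "my-app" Worker Deployment.
wci_workflow_id my-app v3   # prints: temporal-sys-worker-controller-instance:my-app:v3
```

You can then pass the result to the CLI, for example
`temporal workflow show --namespace <NAMESPACE> --workflow-id "$(wci_workflow_id my-app v3)"`.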

The following diagram illustrates the invocation flow of a Serverless Worker.

![Serverless invocation flow](/diagrams/serverless-worker-flow.svg)

The invocation flow works as follows:

1. A Task is submitted (for example, `StartWorkflow` or `ScheduleActivity`).
2. The [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route the Task directly to an
   available Worker (a sync match).
3. If a Worker is available, the Task is routed to that Worker.
4. If no Worker is available (sync match fails), the Matching Service pushes a signal to the WCI, and the WCI invokes
   the configured compute provider.
5. The Serverless Worker starts, creates a Temporal Client, and begins polling the Task Queue.
6. The Worker processes available Tasks until it exits (see [Worker lifecycle](#worker-lifecycle)).

Each invocation is independent. The Worker creates a fresh client connection on every invocation. There is no connection
reuse or shared state across invocations.

## Autoscaling 

The [WCI](#worker-controller-instance) automatically scales Serverless Workers based on Task Queue signals. When Tasks
arrive and no Worker is available, the WCI invokes new Workers. When the work is done, the Workers exit and the number of
running Workers scales back to zero.

The WCI uses two signals to decide when to invoke new Workers:

### Sync match failure 

When a Task is submitted, the [Matching Service](/temporal-service/temporal-server#matching-service) attempts to route
it directly to an available Worker. If no Worker is available, the sync match fails, and the Matching Service pushes a
signal to the WCI. The WCI then invokes a new Worker. This is the primary scaling path. Because the Matching Service
pushes match failures to the WCI as they happen, rather than the WCI polling on a timer, the WCI reacts to new work with
low latency.

### Task Queue backlog 

The WCI also monitors Task Queue metadata to detect a backlog, meaning pending Tasks without enough running Workers to
process them. When it detects a backlog, the WCI invokes additional Workers.

## Scaling with long-lived Workers 

Serverless Workers can share a Task Queue with long-lived Workers. Because Serverless Workers are invoked only when a
[sync match fails](#sync-match-failure) or a [backlog](#task-queue-backlog) builds up, they pick up only Tasks that no
long-lived Worker was available to handle. In practice, the Serverless Workers act as spillover capacity for the
long-lived fleet.
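
The spillover behavior can be pictured with a toy model. This sketch is not how the Matching Service or the WCI are
implemented. It assumes a burst of Tasks arrives at once, that each free long-lived Worker takes exactly one Task by sync
match, and that every failed sync match leads to one Serverless Worker invocation. `simulate_burst` is a hypothetical
name.

```bash
#!/usr/bin/env bash
# Toy model, not Temporal's implementation: a burst of Tasks arrives on a Task Queue
# shared by long-lived Workers and Serverless Workers.
simulate_burst() {
  local free_long_lived=$1 tasks=$2
  local matched=0 invoked=0 i
  for (( i = 0; i < tasks; i++ )); do
    if (( free_long_lived > 0 )); then
      # Sync match succeeds: a free long-lived Worker takes the Task.
      free_long_lived=$(( free_long_lived - 1 ))
      matched=$(( matched + 1 ))
    else
      # Sync match fails: the WCI invokes a Serverless Worker (one per failure in this model).
      invoked=$(( invoked + 1 ))
    fi
  done
  echo "long-lived=${matched} serverless=${invoked}"
}

# Five free long-lived Workers, eight Tasks in the burst.
simulate_burst 5 8   # prints: long-lived=5 serverless=3
```

Only the three Tasks beyond the long-lived fleet's free capacity cause Serverless invocations.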

> **⚠️ Caution:**
>
> If you configure Serverless and long-lived Workers on the same Task Queue, do not enable dynamic scaling on the
> long-lived Workers. The two groups cannot coordinate their scaling behavior. If both scale dynamically, the long-lived
> Workers may scale up to handle the same Tasks that Temporal is simultaneously invoking Serverless Workers for, leading
> to unnecessary invocations and unpredictable scaling.
>

## Worker lifecycle 

A single Serverless Worker invocation has three phases: init, work, and shutdown.

![Serverless Worker lifecycle](/diagrams/serverless-worker-lifecycle.svg)

During the **init** phase, the Worker process starts, registers its Workflows and Activities, and establishes a client
connection to Temporal.

During the **work** phase, the Worker polls the Task Queue and processes Tasks.

During the **shutdown** phase, the Worker stops polling, waits for in-flight Tasks to finish, and runs any shutdown
hooks (for example, flushing OpenTelemetry data). Shutdown begins before the invocation deadline so the Worker can
exit cleanly before the compute provider forcibly terminates the execution environment.

### Tuning for long-running Activities

If your Worker handles long-running Activities, set these three values together:

- **Worker stop timeout > longest Activity runtime.** Gives in-flight Activities enough time to finish after polling
  stops.
- **Shutdown deadline buffer > Worker stop timeout + shutdown hook time.** Ensures the drain and any shutdown hooks
  complete before the compute provider terminates the environment.
- **Invocation deadline > longest Activity runtime + shutdown deadline buffer.** Set on the compute provider to give
  each invocation enough total runtime.

> **💡 Tip:**
>
> If your longest-running Activity runs longer than half the maximum invocation deadline, these constraints may be
> difficult or impossible to meet: the invocation deadline must exceed the longest Activity runtime plus a shutdown
> deadline buffer that is itself longer than that runtime. In this case, use
> [Activity Heartbeats](/encyclopedia/detecting-activity-failures#activity-heartbeat) to record the state of the
> Activity execution so that the next retry can pick up where it left off.
>

For example, if your longest Activity runtime is 5 minutes and your shutdown hooks take 3 seconds, set the Worker stop
timeout to more than 5 minutes, and the shutdown deadline buffer to more than the stop timeout plus 3 seconds (so more
than 303 seconds). The invocation deadline must then exceed 5 minutes plus the buffer, so set it to more than 10 minutes
and 3 seconds (5 minutes + 303 seconds).
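
The three rules can be turned into a quick calculator. This is a minimal sketch that assumes whole seconds and models
each strict "more than" as one extra second of slack. `shutdown_budget` is a hypothetical helper, and its 900-second
default ceiling is the AWS Lambda 15-minute maximum listed under [Constraints](#constraints).

```bash
#!/usr/bin/env bash
# Hypothetical helper: smallest whole-second settings that satisfy
#   stop timeout        > longest Activity runtime
#   shutdown buffer     > stop timeout + shutdown hook time
#   invocation deadline > longest Activity runtime + shutdown buffer
# Each strict ">" is modeled as +1 second of slack (an assumption; allow more in practice).
shutdown_budget() {
  local longest_activity=$1 hook_time=$2 max_deadline=${3:-900}  # 900 s = AWS Lambda maximum
  local stop_timeout=$(( longest_activity + 1 ))
  local buffer=$(( stop_timeout + hook_time + 1 ))
  local deadline=$(( longest_activity + buffer + 1 ))
  if (( deadline > max_deadline )); then
    echo "needs a ${deadline}s invocation deadline, above the ${max_deadline}s maximum" >&2
    return 1
  fi
  echo "stop_timeout=${stop_timeout}s buffer=${buffer}s invocation_deadline=${deadline}s"
}

# 5-minute longest Activity, 3 seconds of shutdown hooks:
shutdown_budget 300 3   # prints: stop_timeout=301s buffer=305s invocation_deadline=606s
```

For a 450-second Activity with the same hooks, the function fails because the required invocation deadline (906 seconds)
exceeds the Lambda maximum, which is the situation the tip above recommends handling with Activity Heartbeats.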

The Worker stop timeout controls how long the Worker waits for in-flight Tasks to finish after it stops polling. The
shutdown deadline buffer controls how long before the invocation deadline the Worker stops polling.

Raising only the shutdown deadline buffer makes the Worker stop polling earlier, but does not give in-flight Tasks any
more time to complete.

Raising only the Worker stop timeout does not make the Worker stop polling earlier, which means the compute provider
might terminate the Worker before the full stop timeout completes. In-flight Activities then do not get the full stop
timeout to finish, and the shutdown hooks may not run.
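
To see why the two settings must move together, the following sketch lays out the shutdown timeline of one invocation,
in seconds from its start. It is a simplified model under stated assumptions: an in-flight Task uses the full Worker stop
timeout, and the compute provider terminates the environment exactly at the invocation deadline. `check_shutdown` is a
hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: walk one invocation's shutdown timeline (seconds from start).
# Assumes an in-flight Task needs the full stop timeout and the provider terminates
# the environment exactly at the invocation deadline.
check_shutdown() {
  local deadline=$1 buffer=$2 stop_timeout=$3 hook_time=$4
  local stop_polling=$(( deadline - buffer ))         # Worker stops polling here
  local drain_end=$(( stop_polling + stop_timeout ))  # in-flight Tasks may run until here
  local hooks_end=$(( drain_end + hook_time ))        # shutdown hooks finish here
  if (( drain_end >= deadline )); then
    echo "terminated during drain: deadline ${deadline}s, drain would end at ${drain_end}s"
  elif (( hooks_end >= deadline )); then
    echo "terminated during shutdown hooks: deadline ${deadline}s, hooks would end at ${hooks_end}s"
  else
    echo "clean exit: polling stops at ${stop_polling}s, hooks finish at ${hooks_end}s"
  fi
}

check_shutdown 606 305 301 3   # balanced settings: clean exit
check_shutdown 606 305 400 3   # stop timeout raised alone: drain cut off at the deadline
```

With a 606-second deadline, a 305-second buffer, a 301-second stop timeout, and 3 seconds of hooks, the Worker exits
cleanly. Raising only the stop timeout to 400 seconds pushes the drain past the deadline, so the provider terminates the
Worker before in-flight Activities or hooks finish.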

## Failure handling 

Serverless Workers rely on Temporal's standard retry and timeout semantics to recover from failures. The following
sections describe common failure scenarios and how they are handled.

### Worker crash 

If a Worker invocation crashes (out of memory, unhandled exception, etc.), the behavior follows standard Temporal retry
semantics:

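- The Activity's Start-To-Close Timeout (or its Heartbeat Timeout, if one is set) fires, because the crashed Worker
  never reports a result.
- Temporal retries the Activity according to its Retry Policy, on a different Worker invocation.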
- No manual intervention is required.
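
After the timeout fires, Temporal schedules the retry after a backoff interval. The following sketch computes that
interval, assuming the Activity uses Temporal's default Retry Policy (1-second initial interval, backoff coefficient 2.0,
maximum interval of 100 times the initial interval). `retry_delay` is a hypothetical helper; Temporal computes the real
schedule on the server.

```bash
#!/usr/bin/env bash
# Hypothetical helper: backoff (in seconds) before retry N, where retry 1 is the second
# attempt. Assumes the default Activity Retry Policy: initial interval 1 s,
# backoff coefficient 2.0, maximum interval 100 x initial interval.
retry_delay() {
  local retry=$1 initial=${2:-1}
  local max_interval=$(( initial * 100 ))
  local delay=$initial i
  # Double the interval for each earlier retry, stopping once it reaches the cap.
  for (( i = 1; i < retry && delay < max_interval; i++ )); do
    delay=$(( delay * 2 ))
  done
  if (( delay > max_interval )); then
    delay=$max_interval
  fi
  echo "$delay"
}

# Backoff before each of the first eight retries: 1 2 4 8 16 32 64 100
for n in 1 2 3 4 5 6 7 8; do
  printf '%s ' "$(retry_delay "$n")"
done
echo
```

The total time before a crashed Activity runs again is roughly the timeout that detected the crash plus this backoff.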

### Provider concurrency limit 

If the compute provider's concurrency limit is reached (for example, AWS Lambda account concurrency):

- Further invocations from the WCI fail.
- Tasks remain in the Task Queue backlog. No data loss occurs.
- Processing slows until concurrency frees up.

### Resource exhaustion across Activity slots 

By default, a single Worker invocation may run several Activities concurrently, one per Activity slot. A crash or
resource exhaustion in one Activity (for example, out-of-memory from a memory-intensive operation) can affect other
Activities running in the same invocation.

To isolate Activities from each other:

- Split Workflow and Activity Workers into separate compute functions.
- Set the number of Activity slots to 1 per invocation, so each invocation runs one Activity at a time.

With single-slot configuration, each Activity gets a dedicated execution environment.

## Constraints 

| Constraint        | Detail                                                                                                                                                             |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Activity duration | Must complete within the compute provider's invocation limit (minus shutdown deadline buffer). For AWS Lambda, the maximum is 15 minutes.                          |
| Workflow duration | No limit. Workflows of any duration work, regardless of the invocation timeout. A Workflow runs across as many invocations as needed.                              |
| Worker code       | Same Temporal SDK Worker code, using the serverless Worker package for your SDK.                                                                                   |
| Versioning        | [Worker Versioning](/worker-versioning) is required. Each Workflow must have an `AutoUpgrade` or `Pinned` behavior, set per-Workflow or as a Worker-level default. See [Worker Versioning with Serverless Workers](#worker-versioning-with-serverless-workers). |

## Worker Versioning with Serverless Workers 

Serverless Workers require [Worker Versioning](/worker-versioning), and the compute provider must invoke a stable,
immutable build for each Worker Deployment Version. With AWS Lambda, this means aligning two versioning systems:

- **Temporal Worker Deployment Versions** — identified by deployment name and Build ID. Each Workflow runs against a
  specific Worker Deployment Version (Pinned) or moves between them on routing changes (Auto-Upgrade).
- **AWS Lambda function versions** — immutable numbered snapshots of your Lambda function code (`1`, `2`, `3`, ...).

Each Worker Deployment Version identifies one immutable build. For production workloads, keep the Lambda function code
it invokes immutable as well: map each Worker Deployment Version to exactly one Lambda function version, and configure
the compute provider with the qualified
[versioned ARN](https://docs.aws.amazon.com/lambda/latest/dg/configuration-versions.html) for that Lambda version (for
example, `arn:aws:lambda:us-east-1:123:function:my-worker:5`).

```mermaid
flowchart LR
    subgraph Temporal["Temporal · Worker Deployment 'my-app'"]
        direction TB
        b1["Build ID: v1"]
        b2["Build ID: v2"]
        b3["Build ID: v3<br/>(Current)"]
    end
    subgraph AWS["AWS Lambda · my-temporal-worker"]
        direction TB
        l1["Lambda version 1<br/><code>function:my-temporal-worker:1</code>"]
        l2["Lambda version 2<br/><code>function:my-temporal-worker:2</code>"]
        l3["Lambda version 3<br/><code>function:my-temporal-worker:3</code>"]
    end
    b1 -- qualified ARN --> l1
    b2 -- qualified ARN --> l2
    b3 -- qualified ARN --> l3
```

> **💡 Tip:**
> Use a qualified versioned ARN in production
>
> An unqualified ARN (no version suffix, such as `arn:aws:lambda:us-east-1:123:function:my-worker`) points at `$LATEST`,
> which changes whenever you redeploy the Lambda. Workflows created against an older Build ID are then invoked against
> newer code, so every code change must remain replay-compatible under standard [patching](/patching) rules — even for
> Workflows you annotated as Pinned. In effect, an unqualified ARN gives up the "Pinned doesn't need patching" guarantee
> and pushes the replay-safety burden onto your team for every Lambda redeploy.
>
> For development, testing, or non-critical workloads where this discipline is acceptable, an unqualified ARN is fine and
> lets you iterate without publishing a new Lambda function version each time. For production, use a qualified versioned
> ARN so that Pinned Workflows remain truly pinned to the code they started against.
>
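
Before configuring a compute provider, you might check which kind of Lambda ARN you are about to use. This minimal
sketch assumes the standard function ARN layout `arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]` and
treats only a numeric qualifier as a published version. `lambda_arn_kind` is a hypothetical helper, not an AWS CLI
command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a Lambda function ARN of the form
#   arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
lambda_arn_kind() {
  local arn=$1
  local IFS=':'          # split on colons for the read below
  local -a parts
  read -r -a parts <<< "$arn"
  if (( ${#parts[@]} < 7 || ${#parts[@]} > 8 )) ||
     [[ ${parts[2]} != lambda || ${parts[5]} != function ]]; then
    echo "invalid"
  elif (( ${#parts[@]} == 7 )); then
    echo "unqualified"       # resolves to $LATEST, which changes on every redeploy
  elif [[ ${parts[7]} =~ ^[0-9]+$ ]]; then
    echo "version"           # published version: immutable code
  else
    echo "latest-or-alias"   # $LATEST or an alias, either of which can change over time
  fi
}

lambda_arn_kind 'arn:aws:lambda:us-east-1:123:function:my-worker:5'  # version
lambda_arn_kind 'arn:aws:lambda:us-east-1:123:function:my-worker'    # unqualified
```

Aliases are grouped with `$LATEST` because an alias can be repointed at a different version after Workflows have pinned
to the Worker Deployment Version that uses it.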

### How the Versioning Behavior changes rollout 

The choice of Pinned or Auto-Upgrade controls how *Workflows* move between Worker Deployment Versions in Temporal. It
does not change how a Worker Deployment Version targets Lambda — both behaviors use a versioned ARN that points at one
immutable Lambda function version.

| Versioning Behavior | What changes when you roll out new code                                                                                                                                                                                                |
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Pinned**          | Publish a new Lambda function version, create a new Worker Deployment Version that points at it, then set it as the Current Version. New Workflows start on the new version; existing Workflows keep running on their original Worker Deployment Version, and its Lambda function version, until they complete. |
| **Auto-Upgrade**    | Publish a new Lambda function version, create a new Worker Deployment Version that points at it, then set it as the Current Version. Existing Workflows move to the new Worker Deployment Version (and its new Lambda function version) at Workflow Task boundaries. |
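
The first step in both rows, publishing a new Lambda function version, can be scripted with the AWS CLI
`publish-version` command, whose output includes the qualified `FunctionArn`. This sketch assumes the AWS CLI is
installed and authorized for `lambda:PublishVersion`. `publish_worker_version` is a hypothetical wrapper, and configuring
the resulting ARN on the Worker Deployment Version is left to the deployment guide.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: publish the function's current code as a new immutable Lambda
# version and print the qualified ARN to configure on the new Worker Deployment Version.
# Requires the AWS CLI with permission to call lambda:PublishVersion.
publish_worker_version() {
  local function_name=$1 arn
  arn=$(aws lambda publish-version \
    --function-name "$function_name" \
    --query FunctionArn \
    --output text) || return 1
  # A published version's ARN ends in its version number, e.g. ...:function:my-worker:5
  if [[ ! $arn =~ :[0-9]+$ ]]; then
    echo "unexpected ARN from publish-version: $arn" >&2
    return 1
  fi
  echo "$arn"
}
```

For example, `arn=$(publish_worker_version my-temporal-worker)` stores the qualified ARN of the newly published version,
which then becomes the invocation target for the new Worker Deployment Version.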

For step-by-step instructions on publishing Lambda versions and configuring the compute provider with a versioned ARN,
see [Publish a Lambda function version](/production-deployment/worker-deployments/serverless-workers/aws-lambda#publish-lambda-version).

## Compute providers 

A compute provider is the configuration that tells Temporal how to invoke a Serverless Worker. The compute provider is
set on a [Worker Deployment Version](/worker-versioning#deployment-versions) and specifies the provider type, the
invocation target, and the credentials Temporal needs to trigger the invocation.

For example, an AWS Lambda compute provider includes the Lambda function ARN and the IAM role that Temporal assumes to
invoke the function.

Compute providers are only needed for Serverless Workers. Traditional long-lived Workers do not require a compute
provider because the Worker process lifecycle is not managed by the Temporal server. 

### Supported providers

| Provider   | Description                                                                   |
| ---------- | ----------------------------------------------------------------------------- |
| AWS Lambda | Temporal assumes an IAM role in your AWS account to invoke a Lambda function. |
