---
url: /docs/sync/guides/upgrading.md
description: How to upgrade the Electric sync service with minimal disruption.
---

# Upgrading

How to upgrade the [Electric sync engine](/sync/) with minimal disruption using rolling deployments. This guide covers two deployment scenarios: [shared storage](#shared-storage-recommended) (recommended) and [separate storage](#separate-storage-ephemeral) for ephemeral environments.

Before reading this guide, make sure you're familiar with the [Deployment guide](/docs/sync/guides/deployment) for general setup.

## Overview

Electric is designed to run as a **single active instance** per replication stream. It uses a PostgreSQL advisory lock — a cooperative lock used for application-level coordination that does not lock any tables or rows — to ensure only one instance actively replicates from Postgres at a time.

When you deploy a new version:

1. The **new instance** starts and loads shape metadata from storage
2. While the old instance holds the lock, the new instance enters **read-only mode** — it can serve requests for existing shapes but cannot create new ones
3. Once the old instance shuts down, its database connection drops and the lock is released
4. The **new instance** acquires the lock and becomes fully active

```
Time ────────────────────────────────────────────►

Old    [==== active (200) ====]--shutdown--X
                                lock released─┐
New       [starting][waiting (202)]───────────┴─[== active ==]
              │           │                         │
          loading    serves existing          fully operational
          metadata   shapes (read-only)
```

The read-only window is typically brief — a few seconds to under a minute, depending on how quickly your orchestrator terminates the old instance. During this window, existing shapes continue to be served. Requests for new shapes return `503` with a `Retry-After` header until the new instance becomes active. The official [TypeScript client](/docs/sync/api/clients/typescript) handles both of these automatically.

> \[!Tip] Version compatibility
> Shape handle stability across deploys depends on Electric's internal shape identity computation not changing between versions. If a new version changes how shapes are identified or changes the storage schema, even shared-storage upgrades may trigger `409` (must-refetch) responses. Check the release notes for any such breaking changes before upgrading.

### Choosing a strategy

| | Shared storage | Separate storage |
|---|---|---|
| Client disruption | Minimal (new shapes briefly delayed) | 409s (clients must refetch shapes) |
| Sticky sessions required | No | Yes |
| Postgres overhead | Single slot | One slot per instance |
| Best for | [Most deployments](#shared-storage-recommended) | [Ephemeral environments](#separate-storage-ephemeral) |

## How the advisory lock works

The advisory lock is tied to the replication slot name:

```sql
SELECT pg_advisory_lock(hashtext('electric_slot_{stream_id}'))
```

This lock is scoped to Electric's replication slot name and does not conflict with any other advisory locks or table-level locks in your database.

* Only one instance can hold the lock per [`ELECTRIC_REPLICATION_STREAM_ID`](/docs/sync/api/config#electric-replication-stream-id)
* The lock is held on the replication database connection — if the connection drops (e.g., instance shutdown), the lock is automatically released
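
You can inspect the current lock holder directly in Postgres. This is a diagnostic sketch using `psql` against the standard `pg_locks` and `pg_stat_activity` views; exact output will vary with your setup:

```shell
# List granted advisory locks along with the backend holding them.
# Useful to confirm which Electric instance is currently active.
psql "$DATABASE_URL" -c "
  SELECT l.pid, a.application_name, a.client_addr, a.state
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.locktype = 'advisory' AND l.granted;
"
```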

> \[!Tip] Lock breaker
> Electric includes a lock breaker mechanism that checks every 10 seconds whether the replication slot associated with the lock is inactive in Postgres. If the slot is inactive but a backend still holds the advisory lock, Electric terminates that backend. This only affects connections where the replication stream has already stopped, so it will not interfere with a healthy instance during a normal rolling deploy.

## Health check behavior during upgrades

The [`/v1/health`](/docs/sync/guides/deployment#health-checks) endpoint reflects the instance's current state:

| HTTP Status | Response | Meaning |
|-------------|----------|---------|
| `200` | `{"status": "active"}` | The instance is active — it holds the advisory lock and is fully operational |
| `202` | `{"status": "waiting"}` | The instance is ready — it can serve existing shapes in read-only mode but is not yet active |
| `202` | `{"status": "starting"}` | The instance is starting up and not yet ready to serve any requests |

During the `waiting` state:

* Requests for **existing shapes** are served normally (read-only mode)
* Requests that require **creating new shapes** return `503` with a `Retry-After: 5` header
* **Shape deletion** also requires active mode and returns `503` while waiting

For orchestrator probe configuration, see the [health check section](#health-checks-must-accept-http-202) below.
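
The status values above can drive deploy scripts. For example, a small helper (the function name is illustrative, not part of Electric) that maps the health endpoint's HTTP status code to a coarse deploy-time state:

```shell
#!/bin/sh
# classify_health: map an HTTP status code from /v1/health to a coarse state.
classify_health() {
  case "$1" in
    200) echo "active" ;;          # holds the lock, fully operational
    202) echo "not-yet-active" ;;  # starting, or waiting in read-only mode
    *)   echo "unhealthy" ;;
  esac
}

# In a post-deploy hook you might poll until the instance is active:
#   until [ "$(classify_health "$(curl -s -o /dev/null -w '%{http_code}' \
#       http://localhost:3000/v1/health)")" = "active" ]; do sleep 5; done
```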

## Shared storage (recommended)

When instances share the same filesystem (e.g., a persistent volume), they share shape data and metadata. This is the recommended approach because shape handles remain stable across deploys — clients don't need sticky sessions and experience minimal disruption.

### When to use

* Kubernetes with [ReadWriteMany](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes) PersistentVolumeClaims
* AWS ECS on EC2 with shared host volumes (use [placement constraints](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html) to keep tasks on the same host)
* Any platform where both instances can access the same filesystem

> \[!Warning] Network filesystems and performance
> Electric is IO-intensive — it reads and writes shape logs and metadata frequently. Network filesystems like [EFS](https://aws.amazon.com/efs/) or NFS add significant latency compared to local storage and may not perform well for large deployments. Prefer local volumes (e.g., NVMe SSDs on EC2 with host bind mounts) where possible. If you must use a network filesystem, see the [troubleshooting guide](/docs/sync/guides/troubleshooting#sqlite-corruption-mdash-why-is-my-shape-metadata-database-corrupt-on-nfs-efs) for important SQLite configuration.

### Configuration

Both instances use identical configuration. The key requirement is that `ELECTRIC_STORAGE_DIR` points to a shared filesystem:

```shell
DATABASE_URL=postgresql://user:password@host:5432/mydb
ELECTRIC_STORAGE_DIR=/shared/electric/data
ELECTRIC_SECRET=your-secret
```

> \[!Warning] `ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE` for shared storage
> When using a **network filesystem** (NFS, EFS) for shared storage, you **must** set [`ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE=true`](/docs/sync/api/config#electric-shape-db-exclusive-mode). This configures SQLite to use a single read-write connection, preventing corruption from concurrent access — SQLite's default WAL mode relies on shared-memory locking that does not work correctly on network filesystems. For **local shared volumes** (e.g., a K8s PVC backed by local SSD), this setting is not strictly required but is recommended as a safe default. It is included in all shared-storage examples below.

### Docker Compose example

This example demonstrates the shared-storage setup. In practice, your orchestrator handles starting and stopping instances during an upgrade.

```yaml
services:
  electric:
    image: electricsql/electric:0.9  # pin to a specific version
    environment:
      DATABASE_URL: ${DATABASE_URL}
      ELECTRIC_STORAGE_DIR: /var/lib/electric/data
      ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE: "true"
      ELECTRIC_SECRET: ${ELECTRIC_SECRET}
    volumes:
      - electric_data:/var/lib/electric/data
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:3000/v1/health"]
      interval: 10s
      timeout: 2s
      retries: 3
    # ...ports, networks, etc.

volumes:
  electric_data:
```

> \[!Tip] Simulating a rolling deploy
> To test the lock handover locally, start a second container pointing at the same volume, then stop the first. Note that `docker compose --scale` requires removing static port mappings or using a port range to avoid conflicts.
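
One possible command sequence for this test, assuming the compose file above with static port mappings removed:

```shell
# Start the first instance, then a second one sharing the same volume.
docker compose up -d electric
docker compose up -d --scale electric=2 --no-recreate

# The second container should report 202 "waiting" while the first holds the lock.
docker compose ps

# Stop the first container; the lock is released and the second becomes active.
docker stop "$(docker compose ps -q electric | head -n 1)"
```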

### Kubernetes example

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: electric
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    # ...labels, selectors
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: electric
          image: electricsql/electric:0.9  # pin to a specific version
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: electric-secrets
                  key: database-url
            - name: ELECTRIC_STORAGE_DIR
              value: "/var/lib/electric/data"
            - name: ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE
              value: "true"
            - name: ELECTRIC_SECRET
              valueFrom:
                secretKeyRef:
                  name: electric-secrets
                  key: electric-secret
          volumeMounts:
            - name: electric-storage
              mountPath: /var/lib/electric/data
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            # ...limits
          livenessProbe:
            httpGet:
              path: /v1/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 6
          readinessProbe:
            httpGet:
              path: /v1/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
      volumes:
        - name: electric-storage
          persistentVolumeClaim:
            claimName: electric-shared-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: electric-shared-pvc
spec:
  accessModes:
    - ReadWriteMany
  # storageClassName: efs-sc  # use a storage class that supports RWX
  resources:
    requests:
      storage: 10Gi
```

With `maxSurge: 1` and `maxUnavailable: 0`, the rollout proceeds as follows:

1. Start a new pod alongside the existing one
2. The new pod enters read-only mode (`202` "waiting") and passes the readiness probe (any 2xx)
3. Kubernetes terminates the old pod
4. The old pod shuts down, releasing the advisory lock
5. The new pod acquires the lock and becomes fully active (`200`)
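
You can watch this handover with `kubectl`, assuming the Deployment above and an `app: electric` label on the pod template (the `0.10` image tag is only an illustrative upgrade target):

```shell
# Trigger the rolling update by changing the image tag, then wait for it to complete.
kubectl set image deployment/electric electric=electricsql/electric:0.10
kubectl rollout status deployment/electric --timeout=180s

# Observe pod transitions during the handover (old Terminating, new Running).
kubectl get pods -l app=electric -w
```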

### AWS ECS example

This example uses EC2 launch type with a host bind mount for shared storage. Both old and new tasks share the same directory on the EC2 host.

> \[!Warning] Same-host placement
> ECS does not guarantee that the new task lands on the same host as the old one. To ensure both tasks share the same host volume, your ECS cluster must have exactly one EC2 instance matching your placement constraint, or use a custom instance attribute to pin tasks to a specific host.

```json
{
  "family": "electric",
  "networkMode": "awsvpc",
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::...:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "electric",
      "image": "electricsql/electric:0.9",
      "portMappings": [
        { "containerPort": 3000, "protocol": "tcp" }
      ],
      "environment": [
        { "name": "ELECTRIC_STORAGE_DIR", "value": "/var/lib/electric/data" },
        { "name": "ELECTRIC_SHAPE_DB_EXCLUSIVE_MODE", "value": "true" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:..."
        },
        {
          "name": "ELECTRIC_SECRET",
          "valueFrom": "arn:aws:secretsmanager:..."
        }
      ],
      "mountPoints": [
        { "sourceVolume": "electric-data", "containerPath": "/var/lib/electric/data" }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -sf http://localhost:3000/v1/health || exit 1"],
        "interval": 10,
        "timeout": 2,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ],
  "volumes": [
    {
      "name": "electric-data",
      "host": { "sourcePath": "/var/lib/electric/data" }
    }
  ]
}
```

Configure your ECS service for rolling upgrades:

```json
{
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200
  }
}
```

This ensures ECS starts the new task before stopping the old one, allowing the advisory lock handover to occur. Set the health check grace period on your ECS service to 60–90 seconds to allow time for the new task to acquire the advisory lock.
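
If you deploy with the AWS CLI, a rolling upgrade with the grace period applied might look like this (the cluster and service names are placeholders):

```shell
# Roll the service to a new deployment with a 90s grace period so the
# new task has time to acquire the advisory lock before health checks count.
aws ecs update-service \
  --cluster my-cluster \
  --service electric \
  --task-definition electric \
  --health-check-grace-period-seconds 90 \
  --force-new-deployment
```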

### Health checks must accept HTTP 202

Your orchestrator's health or readiness check must accept `202` responses during upgrades. If it only considers `200` as healthy, the new instance can never become ready while the old instance holds the lock — creating a deadlock where the orchestrator waits for the new instance before terminating the old one.

Both Kubernetes `httpGet` probes and ECS health checks using `curl -sf` accept any 2xx by default, which is the correct behavior for rolling upgrades.

> \[!Warning] Single-instance readiness probes
> The [Deployment guide](/docs/sync/guides/deployment#kubernetes-probes) recommends an `exec` readiness probe that checks for exactly HTTP `200`. That approach is correct for single-instance deployments where you don't want a starting instance to receive traffic, but it will deadlock during rolling upgrades. If you are performing rolling upgrades, use `httpGet` readiness probes as shown in the examples above.

## Separate storage (ephemeral)

When shared storage is not available (e.g., ECS with ephemeral block storage, containers with local-only disks), each instance needs its own replication slot and maintains its own shape data independently. This means each instance has **different shape handles** for the same shape definitions, so clients **must** use sticky sessions and will receive `409` (must-refetch) responses when they switch between instances during a deploy.

The platform examples from the [shared storage](#shared-storage-recommended) section above apply — just remove the shared volume mount and use the configuration shown here.

There are two ways to manage the per-instance replication slots:

### Temporary replication slots

Use temporary replication slots that are automatically cleaned up when the connection closes. This is the simplest approach for ephemeral storage and avoids accumulating orphaned slots.

```shell
CLEANUP_REPLICATION_SLOTS_ON_SHUTDOWN=true
ELECTRIC_TEMPORARY_REPLICATION_SLOT_USE_RANDOM_NAME=true
ELECTRIC_STORAGE_DIR=/local/electric/data
```

The random name option avoids replication slot name conflicts when old and new instances briefly overlap during a rolling upgrade.

With this configuration:

* Electric creates a `TEMPORARY` replication slot on the database connection
* The slot is automatically dropped by Postgres when the connection closes (on clean shutdown or crash)
* The new instance creates a fresh temporary slot and starts replicating
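
To confirm that the slot is indeed temporary (and therefore dropped with the connection), you can query the `pg_replication_slots` view, which exposes `temporary` and `active` columns in Postgres 10+:

```shell
# Inspect replication slots; a temporary slot shows temporary = t and is
# tied to the lifetime of Electric's replication connection.
psql "$DATABASE_URL" -c "
  SELECT slot_name, temporary, active, restart_lsn
  FROM pg_replication_slots;
"
```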

> \[!Warning] Network partitions cause shape rotations
> If Electric crashes or loses its database connection unexpectedly, the temporary slot is eventually cleaned up by Postgres once it detects the dead connection (which depends on TCP keepalive settings and may take minutes). When the new instance starts with a fresh slot, all existing shapes are invalidated and clients receive `409` (must-refetch) responses requiring a full resync. See [Replication slot recreation](/docs/sync/guides/troubleshooting#replication-slot-recreation-mdash-why-are-all-clients-resyncing-after-a-crash) in the troubleshooting guide for more details.

See the config reference for [`CLEANUP_REPLICATION_SLOTS_ON_SHUTDOWN`](/docs/sync/api/config#cleanup-replication-slots-on-shutdown) and [`ELECTRIC_TEMPORARY_REPLICATION_SLOT_USE_RANDOM_NAME`](/docs/sync/api/config#electric-temporary-replication-slot-use-random-name).

### Separate replication stream IDs

Alternatively, give each concurrent instance its own [`ELECTRIC_REPLICATION_STREAM_ID`](/docs/sync/api/config#electric-replication-stream-id). This creates named replication slots that persist, giving you more explicit control. This is different from [sharding](/docs/sync/guides/sharding), where separate stream IDs are used for instances connecting to different databases — here, both instances connect to the same database.

```shell
# Instance A (e.g., blue deployment)
ELECTRIC_REPLICATION_STREAM_ID=deploy-blue
ELECTRIC_STORAGE_DIR=/local/electric/data

# Instance B (e.g., green deployment)
ELECTRIC_REPLICATION_STREAM_ID=deploy-green
ELECTRIC_STORAGE_DIR=/local/electric/data
```

> \[!Warning] Postgres resource overhead
> Each replication stream ID creates its own replication slot and publication. Multiple replication slots increase WAL retention on Postgres since each slot independently prevents WAL from being cleaned up.
>
> Monitor your replication slots as described in the [Troubleshooting guide](/docs/sync/guides/troubleshooting#wal-growth-mdash-why-is-my-postgres-database-storage-filling-up). Clean up unused slots promptly when old instances are fully decommissioned.
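
To see how much WAL each slot is retaining, a query along these lines works on Postgres 10+ (`pg_wal_lsn_diff` compares the current WAL position against each slot's `restart_lsn`):

```shell
# Show retained WAL per replication slot; large values indicate a slot
# holding back WAL cleanup (e.g., one left behind by a decommissioned instance).
psql "$DATABASE_URL" -c "
  SELECT slot_name, active,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
  FROM pg_replication_slots;
"
```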

When the old deployment is fully stopped, clean up its replication slot and publication in Postgres. The names follow the pattern `electric_slot_{stream_id}` and `electric_publication_{stream_id}`:

```sql
SELECT pg_drop_replication_slot('electric_slot_deploy_blue');
DROP PUBLICATION IF EXISTS electric_publication_deploy_blue;
```

## Client behavior during deploys

The official [TypeScript client](/docs/sync/api/clients/typescript) handles deploy transitions automatically:

* **`503` with `Retry-After` header**: The client backs off and retries. This happens when requesting new shapes during the read-only window.
* **`409` (must-refetch)**: The client refetches the shape from scratch. This happens with separate-storage strategies or when shapes are rotated.
* **Long-poll connections**: Existing long-poll connections on active shapes continue working normally during the read-only window.

If you're using a custom client, ensure it handles these response codes. See the [HTTP API docs](/docs/sync/api/http) for details on the protocol.
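
The retry behavior for the read-only window can be sketched in shell with `curl`. This is illustrative only (`$ELECTRIC_URL` and the `items` table are placeholders); a real client would also handle `409` by clearing local shape state and refetching:

```shell
# Poll a shape endpoint, honoring Retry-After on 503 during the
# read-only window of a rolling deploy.
while :; do
  code=$(curl -s -D /tmp/headers -o /dev/null -w '%{http_code}' \
    "$ELECTRIC_URL/v1/shape?table=items&offset=-1")
  [ "$code" != "503" ] && break
  retry=$(awk -F': ' 'tolower($1) == "retry-after" { print $2 }' /tmp/headers | tr -d '\r')
  sleep "${retry:-5}"
done
```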

## Next steps

* [Deployment guide](/docs/sync/guides/deployment) for general deployment setup
* [Sharding guide](/docs/sync/guides/sharding) for multi-database deployment patterns
* [Config reference](/docs/sync/api/config) for all configuration options
* [Troubleshooting guide](/docs/sync/guides/troubleshooting#rolling-upgrades-mdash-why-is-my-second-instance-stuck-in-waiting-state) for common upgrade issues
