Going to Production

1. Choose the Right Backend

Picking the right storage backend is the most important production decision. Apalis supports several:

Backend        Crate            Best For
PostgreSQL     apalis-postgres  Durable jobs, existing Postgres infra
MySQL/MariaDB  apalis-mysql     Durable jobs, existing MySQL infra
SQLite         apalis-sqlite    Low-traffic or single-node deployments
Redis          apalis-redis     High-throughput, low-latency job queues
AMQP           apalis-amqp      Message broker-based architectures
PGMQ           apalis-pgmq      Postgres-native message queues with at-least-once delivery
NATS           apalis-nats      Distributed messaging, cloud-native and edge deployments
RSMQ           apalis-rsmq      Redis-backed simple message queues
Cron           apalis-cron      Schedule-driven jobs (use with pipe_to for persistence)

For most production systems, PostgreSQL or Redis is the recommended choice. Avoid SQLite for multi-node or high-concurrency deployments: its single-writer model cannot sustain concurrent writes from many workers.

PostgreSQL is preferred when job durability, transactional guarantees, and the ability to query job state directly from your main database are priorities. Redis is preferred when raw throughput and minimal latency matter most.

Setting Up PostgreSQL Storage

# Cargo.toml
[dependencies]
apalis = { version = "1.0.0-rc.4" }
apalis-postgres = { version = "1.0.0-rc.4" }

use apalis_postgres::PostgresStorage;
use sqlx::PgPool;

let pool = PgPool::connect(&database_url).await?;
// Run migrations to create the jobs table
PostgresStorage::setup(&pool).await?;
let storage = PostgresStorage::new(pool);

2. Build for Production

Always build in release mode for production. Debug builds are significantly slower due to the lack of optimizations.

cargo build --release

For smaller binary sizes, add the following to your Cargo.toml:

[profile.release]
opt-level = 3
lto = true           # Link-time optimization
codegen-units = 1    # Better optimization at cost of compile time
strip = true         # Strip debug symbols from the binary

These settings can reduce binary size by 30–60% and meaningfully improve runtime performance.


3. Configuration & Environment

Never hardcode credentials or connection strings. Use environment variables for all runtime configuration.

use std::env;

let database_url = env::var("DATABASE_URL")
    .expect("DATABASE_URL must be set");

let redis_url = env::var("REDIS_URL")
    .expect("REDIS_URL must be set");

A minimal .env file for reference (do not commit this to version control):

DATABASE_URL=postgres://user:password@host:5432/mydb
REDIS_URL=redis://:password@host:6379
RUST_LOG=info
WORKER_CONCURRENCY=10

Use a crate like dotenvy to load .env files in non-containerized environments:

# Cargo.toml
dotenvy = "0.15"

dotenvy::dotenv().ok(); // Load .env if present, silently skip if not

4. Concurrency & Worker Tuning

Concurrency controls how many jobs a single worker processes simultaneously. Setting it too low wastes resources; too high can overwhelm your database, downstream APIs, or hit memory limits.

WorkerBuilder::new("email-worker")
    .parallelize(tokio::spawn) // Process jobs in parallel with tokio::spawn
    .concurrency(10) // Process up to 10 jobs concurrently
    .backend(storage)
    .build_fn(send_email)

Recommended starting points:

  • CPU-bound jobs (image processing, encoding): set concurrency to the number of CPU cores (num_cpus::get())
  • I/O-bound jobs (HTTP calls, DB writes, emails): set concurrency to 10–50 or more, depending on downstream capacity
  • Rate-limited jobs (third-party APIs): use the RateLimitLayer (see section 8) rather than just limiting concurrency

You can make concurrency configurable via an environment variable:

let concurrency: usize = env::var("WORKER_CONCURRENCY")
    .unwrap_or_else(|_| "10".to_string())
    .parse()
    .expect("WORKER_CONCURRENCY must be a number");

5. Graceful Shutdown

Apalis's Monitor supports graceful shutdown out of the box. It waits for in-progress jobs to complete before exiting, preventing data loss or incomplete operations on SIGTERM.

use apalis::prelude::*;
use tokio::signal;

Monitor::new()
    .register(
        WorkerBuilder::new("email-worker")
            .backend(storage)
            .parallelize(tokio::spawn)
            .concurrency(10)
            .build_fn(send_email)
    )
    .on_event(|e| tracing::info!("{e}"))
    .shutdown_timeout(std::time::Duration::from_secs(30)) // Wait up to 30s for jobs to finish
    .run_with_signal(signal::ctrl_c()) // Gracefully stop on Ctrl+C / SIGINT
    .await?;

In containerized environments, also handle SIGTERM (what Kubernetes sends on pod termination):

use tokio::signal::unix::{signal, SignalKind};

async fn shutdown_signal() {
    let mut sigterm = signal(SignalKind::terminate()).unwrap();
    let mut sigint = signal(SignalKind::interrupt()).unwrap();
    tokio::select! {
        _ = sigterm.recv() => tracing::info!("Received SIGTERM"),
        _ = sigint.recv() => tracing::info!("Received SIGINT"),
    }
}

Monitor::new()
    // ...
    .run_with_signal(shutdown_signal())
    .await?;

Set your container's terminationGracePeriodSeconds in Kubernetes to be longer than your shutdown_timeout to allow jobs to finish cleanly.


6. Error Handling & Retries

Production jobs will fail. Design for it explicitly.

Return Errors from Job Handlers

use apalis::prelude::*;

// SmtpClient stands in for your email client type, injected via .data(...)
async fn send_email(job: Email, client: Data<SmtpClient>) -> Result<(), BoxDynError> {
    client.send(&job.to, &job.body).await?;
    Ok(())
}

Returning Err(...) marks the job as failed and triggers retry logic if configured.
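
Where only some failures are worth retrying, a hypothetical error type can make that distinction explicit (the variants here are illustrative, not an apalis API):

```rust
// Hypothetical error classification: transient failures are retry candidates,
// permanent ones should go straight to a dead letter queue.
#[derive(Debug)]
enum EmailError {
    Transient(String), // e.g. SMTP connection timeout
    Permanent(String), // e.g. invalid recipient address
}

impl EmailError {
    fn is_retryable(&self) -> bool {
        matches!(self, EmailError::Transient(_))
    }
}
```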

Add a Retry Layer

use apalis::layers::retry::{RetryLayer, RetryPolicy};

WorkerBuilder::new("email-worker")
    .layer(RetryLayer::new(RetryPolicy::retries(3))) // Retry up to 3 times
    .backend(storage)
    .build_fn(send_email)

Implement Custom Retry Logic

For more control (e.g., exponential backoff, only retry on specific errors):

use tower::retry::Policy;

// Note: this sketch caps attempts at 5 but does not actually delay between
// retries; a real exponential backoff would wait (e.g. 2^attempts seconds)
// before the returned future resolves.
#[derive(Clone)]
struct ExponentialBackoff { attempts: usize }

impl<Req: Clone, Res, E> Policy<Req, Res, E> for ExponentialBackoff {
    type Future = std::future::Ready<Self>;
    fn retry(&self, _req: &Req, result: Result<&Res, &E>) -> Option<Self::Future> {
        if result.is_err() && self.attempts < 5 {
            Some(std::future::ready(ExponentialBackoff { attempts: self.attempts + 1 }))
        } else {
            None
        }
    }
    fn clone_request(&self, req: &Req) -> Option<Req> {
        Some(req.clone())
    }
}

Catch Panics

use apalis::prelude::*;

WorkerBuilder::new("email-worker")
    .backend(storage)
    .catch_panic()
    .build_fn(send_email)

Dead Letter Queues

Consider using a separate storage namespace or queue to move permanently failed jobs to for later inspection, rather than discarding them:

// With Redis, use a dedicated namespace for the DLQ
use apalis_redis::RedisStorage;

let dlq_config = apalis_redis::Config::default()
    .set_namespace("my-app::dead-letter");
let dlq_storage = RedisStorage::new_with_config(conn.clone(), dlq_config);

7. Observability: Logging, Tracing & Metrics

Structured Logging with tracing

Apalis integrates natively with the tracing ecosystem. Enable tracing in your builder and configure a subscriber at startup:

use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

tracing_subscriber::registry()
    .with(tracing_subscriber::EnvFilter::new(
        std::env::var("RUST_LOG").unwrap_or_else(|_| "info".to_string()),
    ))
    .with(tracing_subscriber::fmt::layer().json()) // JSON logs for production
    .init();

Then enable tracing in your worker:

WorkerBuilder::new("email-worker")
    .enable_tracing() // Automatically traces each job execution
    .backend(storage)
    .build_fn(send_email)

Prometheus Metrics

With the prometheus feature, apalis can expose job metrics (job counts, durations, failures):

# Cargo.toml
apalis = { version = "1.0.0-rc.4", features = ["prometheus"] }

use apalis::layers::prometheus::PrometheusLayer;

WorkerBuilder::new("email-worker")
    .layer(PrometheusLayer::new())
    .backend(storage)
    .build_fn(send_email)

Expose a /metrics endpoint using your HTTP server (e.g., Axum or Actix-web) that serves the Prometheus registry output.
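
A minimal sketch of such an endpoint with Axum and the prometheus crate's default registry (the registry wiring is an assumption, not an apalis-specific API):

```rust
use axum::{routing::get, Router};
use prometheus::{Encoder, TextEncoder};

// Serve the default prometheus registry in the text exposition format.
async fn metrics() -> String {
    let encoder = TextEncoder::new();
    let mut buf = Vec::new();
    encoder
        .encode(&prometheus::gather(), &mut buf)
        .expect("failed to encode metrics");
    String::from_utf8(buf).expect("metrics output is valid UTF-8")
}

let app: Router = Router::new().route("/metrics", get(metrics));
// Serve `app` alongside your Monitor, e.g. with axum::serve(...).
```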

Sentry Integration

# Cargo.toml
apalis = { version = "1.0.0-rc.4", features = ["sentry"] }

use sentry_tower::NewSentryLayer;
use apalis::{layers::sentry::SentryLayer, prelude::*};

WorkerBuilder::new("email-worker")
    .layer(NewSentryLayer::new_from_top())
    .layer(SentryLayer::new())
    .backend(storage)
    .build_fn(send_email)

8. Rate Limiting & Backpressure

Protect downstream services and comply with third-party API rate limits using the RateLimitLayer:

# Cargo.toml
apalis = { version = "1.0.0-rc.4", features = ["limit"] }
tower = { version = "0.4", features = ["limit"] }

use std::time::Duration;

WorkerBuilder::new("sendgrid-worker")
    .rate_limit(100, Duration::from_secs(1)) // At most 100 jobs per second
    .backend(storage)
    .build_fn(send_email)

You can also cap total concurrent executions with the concurrency setting:

WorkerBuilder::new("sendgrid-worker")
    .concurrency(10)
    .backend(storage)
    .build_fn(send_email)

9. Monitoring with apalis Board

Apalis Board is an optional web UI for monitoring and managing jobs. It provides a real-time view of queued, running, failed, and completed jobs.

# Cargo.toml
apalis-board = { version = "1.0.0-rc.4" }
apalis-board-api = { version = "1.0.0-rc.4" }

Integrate it with an Axum-based HTTP server:

use axum::{Extension, Router};
use futures::FutureExt;
use tokio::signal::ctrl_c;

// ApiBuilder and ServeUI are provided by the apalis-board crates;
// `email_store` and `broadcaster` are your job storage and event broadcaster.
let api = ApiBuilder::new(Router::new())
    .register(email_store.clone())
    .build();
let router = Router::new()
    .nest("/api/v1", api)
    .fallback_service(ServeUI::new())
    .layer(Extension(broadcaster.clone()));

let listener = tokio::net::TcpListener::bind(&args.api_host).await?;

axum::serve(listener, router)
    .with_graceful_shutdown(ctrl_c().map(|_| ()))
    .await?;

Security note: In production, protect the apalis Board behind authentication middleware. It exposes job data and allows manual job management, so it should never be publicly accessible without authorization.
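
One way to sketch such middleware is with tower-http's validate-request feature; the route layout, `board_router`, and `password` below are assumptions, not part of apalis Board:

```rust
use axum::Router;
use tower_http::validate_request::ValidateRequestHeaderLayer;

// Require HTTP Basic Auth on every Board route.
// `board_router` is the Board's router; load `password` from configuration,
// never hardcode credentials.
let protected: Router = Router::new()
    .nest("/board", board_router)
    .layer(ValidateRequestHeaderLayer::basic("admin", &password));
```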


10. Scaling Workers

Horizontal Scaling

Because apalis backends (Postgres, Redis, AMQP) are distributed by design, you can run multiple worker processes or pods without any special coordination. Each worker independently polls for and claims jobs using atomic operations, so there is no double-processing.

# Run multiple worker instances pointing at the same backend
./my-worker &
./my-worker &
./my-worker &

Multiple Workers in One Process

You can also register multiple workers with a single Monitor to process different job types in one process:

Monitor::new()
    .register(
        WorkerBuilder::new("email-worker")
            .concurrency(20)
            .backend(email_storage)
            .build_fn(send_email)
    )
    .register(
        WorkerBuilder::new("report-worker")
            .concurrency(5)
            .backend(report_storage)
            .build_fn(generate_report)
    )
    .run_with_signal(shutdown_signal())
    .await?;

Worker Naming

Give each worker a unique, descriptive name. This name is used in monitoring and logs, so a name like "email-worker-us-east" is more useful than "worker-1".


11. Deployment Patterns

Docker

A minimal production Dockerfile:

FROM rust:1.80 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release
 
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/my-worker /usr/local/bin/my-worker
CMD ["my-worker"]

Kubernetes

A basic deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: email-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: email-worker
  template:
    metadata:
      labels:
        app: email-worker
    spec:
      terminationGracePeriodSeconds: 60  # Must exceed shutdown_timeout
      containers:
        - name: email-worker
          image: my-registry/my-worker:latest
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
            - name: RUST_LOG
              value: "info"
            - name: WORKER_CONCURRENCY
              value: "10"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Systemd (bare metal / VMs)

[Unit]
Description=My Apalis Worker
After=network.target postgresql.service
 
[Service]
Type=simple
User=myapp
EnvironmentFile=/etc/myapp/worker.env
ExecStart=/usr/local/bin/my-worker
Restart=always
RestartSec=5
KillMode=process      # Send SIGTERM to main process only
TimeoutStopSec=60     # Allow up to 60s for graceful shutdown
 
[Install]
WantedBy=multi-user.target

12. Security Checklist

  • Never commit secrets — use environment variables, Kubernetes Secrets, or a secrets manager (Vault, AWS Secrets Manager, etc.)
  • Use TLS for all backend connections — ensure DATABASE_URL and REDIS_URL use TLS (sslmode=require for Postgres, rediss:// for Redis)
  • Restrict backend access — workers should connect to the database/Redis from a private network, not a public endpoint
  • Protect apalis Board — put it behind authentication (e.g., HTTP Basic Auth, OAuth, or an internal-only network)
  • Validate job payloads — treat incoming job data as untrusted input; use serde validation and reject malformed payloads
  • Set resource limits — apply memory and CPU limits in Docker/Kubernetes to prevent a runaway job from taking down the host

13. Production Checklist

Before going live, verify all of the following:

  • Release build compiled with --release
  • Production backend chosen (Postgres or Redis recommended)
  • Database migrations run (PostgresStorage::setup(&pool).await?)
  • All configuration via environment variables (no hardcoded secrets)
  • Graceful shutdown configured with appropriate timeout
  • Error handling: all job handlers return Result
  • Retry policy configured for transient failures
  • Dead letter queue or failed job visibility strategy defined
  • tracing / structured logging initialized with JSON format
  • Metrics exposed (Prometheus) and dashboards created
  • Rate limiting applied for jobs calling external APIs
  • Worker concurrency tuned for your workload type
  • apalis Board deployed (if used) and protected behind auth
  • Container image built with a minimal base image
  • terminationGracePeriodSeconds exceeds shutdown_timeout in Kubernetes
  • Load tested: backend can handle expected job throughput
  • Alerting set up on job failure rate and queue depth