Going to Production

1. Choose the Right Backend

Picking the right storage backend is the most important production decision. Apalis supports several:

Backend        Crate            Best For
PostgreSQL     apalis-postgres  Durable jobs, existing Postgres infra
MySQL/MariaDB  apalis-mysql     Durable jobs, existing MySQL infra
SQLite         apalis-sqlite    Low-traffic or single-node deployments
Redis          apalis-redis     High-throughput, low-latency job queues
AMQP           apalis-amqp      Message broker-based architectures
PGMQ           apalis-pgmq      Postgres-native message queues with at-least-once delivery
NATS           apalis-nats      Distributed messaging, cloud-native and edge deployments
RSMQ           apalis-rsmq      Redis-backed simple message queues
Cron           apalis-cron      Schedule-driven jobs (use with pipe_to for persistence)

For most production systems, PostgreSQL or Redis is the recommended choice. Avoid SQLite for multi-node or high-concurrency deployments: its single-writer model cannot sustain concurrent writes from many workers.

PostgreSQL is preferred when job durability, transactional guarantees, and the ability to query job state directly from your main database are priorities. Redis is preferred when raw throughput and minimal latency matter most.

Setting Up PostgreSQL Storage

# Cargo.toml
[dependencies]
apalis = { version = "1.0.0-rc.4" }
apalis-postgres = { version = "1.0.0-rc.4" }

use apalis_postgres::PostgresStorage;
use sqlx::PgPool;

let pool = PgPool::connect(&database_url).await?;
// Run migrations to create the jobs table
PostgresStorage::setup(&pool).await?;
let storage = PostgresStorage::new(pool);

2. Build for Production

Always build in release mode for production. Debug builds are significantly slower due to the lack of optimizations.

cargo build --release

For smaller binary sizes, add the following to your Cargo.toml:

[profile.release]
opt-level = 3
lto = true           # Link-time optimization
codegen-units = 1    # Better optimization at cost of compile time
strip = true         # Strip debug symbols from the binary

These settings can reduce binary size by 30–60% and meaningfully improve runtime performance.


3. Configuration & Environment

Never hardcode credentials or connection strings. Use environment variables for all runtime configuration.

use std::env;

let database_url = env::var("DATABASE_URL")
    .expect("DATABASE_URL must be set");

let redis_url = env::var("REDIS_URL")
    .expect("REDIS_URL must be set");

A minimal .env file for reference (do not commit this to version control):

DATABASE_URL=postgres://user:password@host:5432/mydb
REDIS_URL=redis://:password@host:6379
RUST_LOG=info
WORKER_CONCURRENCY=10

Use a crate like dotenvy to load .env files in non-containerized environments:

# Cargo.toml
dotenvy = "0.15"

dotenvy::dotenv().ok(); // Load .env if present, silently skip if not

4. Concurrency & Worker Tuning

Concurrency controls how many jobs a single worker processes simultaneously. Setting it too low wastes resources; too high can overwhelm your database, downstream APIs, or hit memory limits.

WorkerBuilder::new("email-worker")
    .parallelize(tokio::spawn) // Process jobs in parallel with tokio::spawn
    .concurrency(10) // Process up to 10 jobs concurrently
    .backend(storage)
    .build_fn(send_email)

Recommended starting points:

  • CPU-bound jobs (image processing, encoding): set concurrency to the number of CPU cores (num_cpus::get())
  • I/O-bound jobs (HTTP calls, DB writes, emails): set concurrency to 10–50 or more, depending on downstream capacity
  • Rate-limited jobs (third-party APIs): use the RateLimitLayer (see section 8) rather than just limiting concurrency

You can make concurrency configurable via an environment variable:

let concurrency: usize = env::var("WORKER_CONCURRENCY")
    .unwrap_or_else(|_| "10".to_string())
    .parse()
    .expect("WORKER_CONCURRENCY must be a number");

5. Graceful Shutdown

Apalis's Monitor supports graceful shutdown out of the box. It waits for in-progress jobs to complete before exiting, preventing data loss or incomplete operations on SIGTERM.

use apalis::prelude::*;
use tokio::signal;

Monitor::new()
    .register(
        WorkerBuilder::new("email-worker")
            .backend(storage)
            .parallelize(tokio::spawn)
            .concurrency(10)
            .build_fn(send_email)
    )
    .on_event(|e| tracing::info!("{e}"))
    .shutdown_timeout(std::time::Duration::from_secs(30)) // Wait up to 30s for jobs to finish
    .run_with_signal(signal::ctrl_c()) // Gracefully stop on Ctrl+C / SIGINT
    .await?;

In containerized environments, also handle SIGTERM (what Kubernetes sends on pod termination):

use tokio::signal::unix::{signal, SignalKind};

async fn shutdown_signal() {
    let mut sigterm = signal(SignalKind::terminate()).unwrap();
    let mut sigint = signal(SignalKind::interrupt()).unwrap();
    tokio::select! {
        _ = sigterm.recv() => tracing::info!("Received SIGTERM"),
        _ = sigint.recv() => tracing::info!("Received SIGINT"),
    }
}

Monitor::new()
    // ...
    .run_with_signal(shutdown_signal())
    .await?;

Set your container's terminationGracePeriodSeconds in Kubernetes to be longer than your shutdown_timeout to allow jobs to finish cleanly.


6. Error Handling & Retries

Production jobs will fail. Design for it explicitly.

Return Errors from Job Handlers

use apalis::prelude::*;

// SmtpClient stands in for your email client type, injected via .data(...)
async fn send_email(job: Email, client: Data<SmtpClient>) -> Result<(), BoxDynError> {
    client.send(&job.to, &job.body).await?;
    Ok(())
}

Returning Err(...) marks the job as failed and triggers retry logic if configured.
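
Where only some failures are worth retrying, a hypothetical error type can make that distinction explicit (the variants here are illustrative, not an apalis API):

```rust
// Hypothetical error classification: transient failures are retry candidates,
// permanent ones should go straight to a dead letter queue.
#[derive(Debug)]
enum EmailError {
    Transient(String), // e.g. SMTP connection timeout
    Permanent(String), // e.g. invalid recipient address
}

impl EmailError {
    fn is_retryable(&self) -> bool {
        matches!(self, EmailError::Transient(_))
    }
}
```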

Add a Retry Layer

use apalis::layers::retry::{RetryLayer, RetryPolicy};

WorkerBuilder::new("email-worker")
    .layer(RetryLayer::new(RetryPolicy::retries(3))) // Retry up to 3 times
    .backend(storage)
    .build_fn(send_email)

Implement Custom Retry Logic

For more control (e.g., exponential backoff, only retry on specific errors):

use tower::retry::Policy;

// Note: this sketch caps attempts at 5 but does not actually delay between
// retries; a real exponential backoff would wait (e.g. 2^attempts seconds)
// before the returned future resolves.
#[derive(Clone)]
struct ExponentialBackoff { attempts: usize }

impl<Req: Clone, Res, E> Policy<Req, Res, E> for ExponentialBackoff {
    type Future = std::future::Ready<Self>;
    fn retry(&self, _req: &Req, result: Result<&Res, &E>) -> Option<Self::Future> {
        if result.is_err() && self.attempts < 5 {
            Some(std::future::ready(ExponentialBackoff { attempts: self.attempts + 1 }))
        } else {
            None
        }
    }
    fn clone_request(&self, req: &Req) -> Option<Req> {
        Some(req.clone())
    }
}

Catch Panics

use apalis::prelude::*;

WorkerBuilder::new("email-worker")
    .backend(storage)
    .catch_panic()
    .build_fn(send_email)

Dead Letter Queues

Consider using a separate storage namespace or queue to move permanently failed jobs to for later inspection, rather than discarding them:

// With Redis, use a dedicated namespace for the DLQ
use apalis_redis::RedisStorage;

let dlq_config = apalis_redis::Config::default()
    .set_namespace("my-app::dead-letter");
let dlq_storage = RedisStorage::new_with_config(conn.clone(), dlq_config);

7. Observability: Logging, Tracing & Metrics

Structured Logging with tracing

Apalis integrates natively with the tracing ecosystem. Enable tracing in your builder and configure a subscriber at startup:

use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

tracing_subscriber::registry()
    .with(tracing_subscriber::EnvFilter::new(
        std::env::var("RUST_LOG").unwrap_or_else(|_| "info".to_string()),
    ))
    .with(tracing_subscriber::fmt::layer().json()) // JSON logs for production
    .init();

Then enable tracing in your worker:

WorkerBuilder::new("email-worker")
    .enable_tracing() // Automatically traces each job execution
    .backend(storage)
    .build_fn(send_email)

Prometheus Metrics

With the prometheus feature, apalis can expose job metrics (job counts, durations, failures):

# Cargo.toml
apalis = { version = "1.0.0-rc.4", features = ["prometheus"] }

use apalis::layers::prometheus::PrometheusLayer;

WorkerBuilder::new("email-worker")
    .layer(PrometheusLayer::new())
    .backend(storage)
    .build_fn(send_email)

Expose a /metrics endpoint using your HTTP server (e.g., Axum or Actix-web) that serves the Prometheus registry output.
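
A minimal sketch of such an endpoint with Axum and the prometheus crate's default registry (the registry wiring is an assumption, not an apalis-specific API):

```rust
use axum::{routing::get, Router};
use prometheus::{Encoder, TextEncoder};

// Serve the default prometheus registry in the text exposition format.
async fn metrics() -> String {
    let encoder = TextEncoder::new();
    let mut buf = Vec::new();
    encoder
        .encode(&prometheus::gather(), &mut buf)
        .expect("failed to encode metrics");
    String::from_utf8(buf).expect("metrics output is valid UTF-8")
}

let app: Router = Router::new().route("/metrics", get(metrics));
// Serve `app` alongside your Monitor, e.g. with axum::serve(...).
```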

Sentry Integration

# Cargo.toml
apalis = { version = "1.0.0-rc.4", features = ["sentry"] }

use sentry_tower::NewSentryLayer;
use apalis::{layers::sentry::SentryLayer, prelude::*};

WorkerBuilder::new("email-worker")
    .layer(NewSentryLayer::new_from_top())
    .layer(SentryLayer::new())
    .backend(storage)
    .build_fn(send_email)

8. Rate Limiting & Backpressure

Protect downstream services and comply with third-party API rate limits using the RateLimitLayer:

# Cargo.toml
apalis = { version = "1.0.0-rc.4", features = ["limit"] }
tower = { version = "0.4", features = ["limit"] }

use std::time::Duration;

WorkerBuilder::new("sendgrid-worker")
    .rate_limit(100, Duration::from_secs(1)) // At most 100 jobs per second
    .backend(storage)
    .build_fn(send_email)

You can also cap total concurrent executions with the concurrency setting:

WorkerBuilder::new("sendgrid-worker")
    .concurrency(10)
    .backend(storage)
    .build_fn(send_email)

9. Monitoring with apalis Board

Apalis Board is an optional web UI for monitoring and managing jobs. It provides a real-time view of queued, running, failed, and completed jobs.

# Cargo.toml
apalis-board = { version = "1.0.0-rc.4" }
apalis-board-api = { version = "1.0.0-rc.4" }

Integrate it with an Axum-based HTTP server:

use axum::{Extension, Router};
use futures::FutureExt;
use tokio::signal::ctrl_c;

// ApiBuilder and ServeUI are provided by the apalis-board crates;
// `email_store` and `broadcaster` are your job storage and event broadcaster.
let api = ApiBuilder::new(Router::new())
    .register(email_store.clone())
    .build();
let router = Router::new()
    .nest("/api/v1", api)
    .fallback_service(ServeUI::new())
    .layer(Extension(broadcaster.clone()));

let listener = tokio::net::TcpListener::bind(&args.api_host).await?;

axum::serve(listener, router)
    .with_graceful_shutdown(ctrl_c().map(|_| ()))
    .await?;

Security note: In production, protect the apalis Board behind authentication middleware. It exposes job data and allows manual job management, so it should never be publicly accessible without authorization.
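
One way to sketch such middleware is with tower-http's validate-request feature; the route layout, `board_router`, and `password` below are assumptions, not part of apalis Board:

```rust
use axum::Router;
use tower_http::validate_request::ValidateRequestHeaderLayer;

// Require HTTP Basic Auth on every Board route.
// `board_router` is the Board's router; load `password` from configuration,
// never hardcode credentials.
let protected: Router = Router::new()
    .nest("/board", board_router)
    .layer(ValidateRequestHeaderLayer::basic("admin", &password));
```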


10. Scaling Workers

Horizontal Scaling

Because apalis backends (Postgres, Redis, AMQP) are distributed by design, you can run multiple worker processes or pods without any special coordination. Each worker independently polls for and claims jobs using atomic operations, so there is no double-processing.

# Run multiple worker instances pointing at the same backend
./my-worker &
./my-worker &
./my-worker &

Multiple Workers in One Process

You can also register multiple workers with a single Monitor to process different job types in one process:

Monitor::new()
    .register(
        WorkerBuilder::new("email-worker")
            .concurrency(20)
            .backend(email_storage)
            .build_fn(send_email)
    )
    .register(
        WorkerBuilder::new("report-worker")
            .concurrency(5)
            .backend(report_storage)
            .build_fn(generate_report)
    )
    .run_with_signal(shutdown_signal())
    .await?;

Worker Naming

Give each worker a unique, descriptive name. This name is used in monitoring and logs, so a name like "email-worker-us-east" is more useful than "worker-1".


11. Deployment Patterns

Docker

A minimal production Dockerfile:

FROM rust:1.80 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release
 
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates libssl3 && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/my-worker /usr/local/bin/my-worker
CMD ["my-worker"]

Kubernetes

A basic deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: email-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: email-worker
  template:
    metadata:
      labels:
        app: email-worker
    spec:
      terminationGracePeriodSeconds: 60  # Must exceed shutdown_timeout
      containers:
        - name: email-worker
          image: my-registry/my-worker:latest
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
            - name: RUST_LOG
              value: "info"
            - name: WORKER_CONCURRENCY
              value: "10"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Systemd (bare metal / VMs)

[Unit]
Description=My Apalis Worker
After=network.target postgresql.service
 
[Service]
Type=simple
User=myapp
EnvironmentFile=/etc/myapp/worker.env
ExecStart=/usr/local/bin/my-worker
Restart=always
RestartSec=5
KillMode=process      # Send SIGTERM to main process only
TimeoutStopSec=60     # Allow up to 60s for graceful shutdown
 
[Install]
WantedBy=multi-user.target

12. Security Checklist

  • Never commit secrets — use environment variables, Kubernetes Secrets, or a secrets manager (Vault, AWS Secrets Manager, etc.)
  • Use TLS for all backend connections — ensure DATABASE_URL and REDIS_URL use TLS (sslmode=require for Postgres, rediss:// for Redis)
  • Restrict backend access — workers should connect to the database/Redis from a private network, not a public endpoint
  • Protect apalis Board — put it behind authentication (e.g., HTTP Basic Auth, OAuth, or an internal-only network)
  • Validate job payloads — treat incoming job data as untrusted input; use serde validation and reject malformed payloads
  • Set resource limits — apply memory and CPU limits in Docker/Kubernetes to prevent a runaway job from taking down the host

13. Production Checklist

Before going live, verify all of the following:

  • Release build compiled with --release
  • Production backend chosen (Postgres or Redis recommended)
  • Database migrations run (PostgresStorage::setup(&pool).await?)
  • All configuration via environment variables (no hardcoded secrets)
  • Graceful shutdown configured with appropriate timeout
  • Error handling: all job handlers return Result
  • Retry policy configured for transient failures
  • Dead letter queue or failed job visibility strategy defined
  • tracing / structured logging initialized with JSON format
  • Metrics exposed (Prometheus) and dashboards created
  • Rate limiting applied for jobs calling external APIs
  • Worker concurrency tuned for your workload type
  • apalis Board deployed (if used) and protected behind auth
  • Container image built with a minimal base image
  • terminationGracePeriodSeconds exceeds shutdown_timeout in Kubernetes
  • Load tested: backend can handle expected job throughput
  • Alerting set up on job failure rate and queue depth