Keycloak Backup and Restore: A Practical Guide

Last updated: June 2026

Keycloak stores all its state (realms, users, clients, roles, credentials, and session metadata) in its database. A reliable Keycloak backup is fundamentally a database backup (using pg_dump, pg_basebackup, or managed PostgreSQL snapshots with PITR), not a realm export. Realm export is a useful tool for configuration-as-code, migration, and environment seeding, but it is not a complete backup: depending on how it is produced it can omit users and secrets, it is not transactionally consistent with live data, and it cannot restore a production system on its own.

If you are running Keycloak in production and you have not confirmed a working restore procedure, you do not have a backup. You have a copy of data whose recoverability is untested. This guide closes that gap, and it ends with a failover runbook for when a backup restore is not fast enough.

Why the Database Is the Only Complete Source of Truth

Keycloak is a stateful application. Every action taken by users, admins, or applications is persisted in the database. This includes:

Realm configuration: authentication flows, client definitions, identity provider settings, password policies, token settings
Users and credentials: user records, hashed passwords, OTP secrets, WebAuthn credentials
Roles and groups: realm roles, client roles, group memberships, role mappings
Realm signing keys: the keys used to sign JWTs. Keys generated by Keycloak live in the database, not in the filesystem
Sessions: offline sessions that back offline tokens and, depending on your Keycloak version and configuration, regular user sessions (the latter can usually be treated as ephemeral)
Events: admin events and user login events if event logging is enabled

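The critical implication: if your database is lost and all you have is a realm export, you can recover only what that export happened to capture, as of the moment each part of it was read. A partial export from the admin console contains no users and masks secrets. A kc.sh export can include user records (controlled by the --users option, for example --users realm_file), but it carries no sessions, is not transactionally consistent with the live database, and loses every password change, OTP enrollment, and WebAuthn registration made after it ran. More on this below.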

The PostgreSQL instance backing Keycloak is the single source of truth. Treat it accordingly.

Backup Methods at a Glance

Method	What It Covers	Consistency	Best For
`pg_dump`	Full logical snapshot of DB	Consistent point-in-time	Scheduled backups, offsite archival
`pg_basebackup`	Physical copy of data directory	Consistent, supports WAL shipping	Streaming replication base, PITR
Managed snapshot (RDS, Cloud SQL, Supabase, etc.)	Full DB at snapshot time	Consistent	Automated daily backups on managed DBs
PITR (WAL archiving)	Any point since last base backup	Exact point-in-time	RPO near zero, recovering from logical corruption
Realm export (`kc.sh export`)	Realm config, optionally users	NOT transactionally consistent with live DB	Config-as-code, migration seeding, environment cloning

PostgreSQL Backup Methods in Depth

pg_dump: Logical Backups

pg_dump produces a logical, SQL-level dump of your Keycloak database. It is consistent: pg_dump uses a transaction snapshot so the output reflects the database at a single point in time, even if writes continue during the dump.

The PostgreSQL documentation on pg_dump is the authoritative reference for all flags and format options.

# Basic pg_dump, compressed custom format
pg_dump \
  --host=db-host \
  --port=5432 \
  --username=keycloak \
  --dbname=keycloak \
  --format=custom \
  --compress=9 \
  --file=/backups/keycloak-$(date +%Y%m%d-%H%M%S).dump

# Verify the dump is readable before relying on it
pg_restore --list /backups/keycloak-20260721-090000.dump | head -20

Use the --format=custom flag rather than plain SQL. The custom format is compressed, supports parallel restore with pg_restore --jobs, and is faster to restore from than a SQL file.

Encryption: Before shipping backups offsite, encrypt them. A common pattern is to pipe through gpg or use your cloud provider’s server-side encryption (SSE) when uploading to S3, GCS, or Azure Blob.

# Encrypt with GPG before uploading to object storage
pg_dump \
  --host=db-host \
  --username=keycloak \
  --dbname=keycloak \
  --format=custom \
  --compress=9 | gpg --encrypt --recipient ops-backup-key > /tmp/keycloak-backup.dump.gpg

# Upload to S3
aws s3 cp /tmp/keycloak-backup.dump.gpg \
  s3://your-backup-bucket/keycloak/$(date +%Y/%m/%d)/keycloak-backup.dump.gpg
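Encrypting and uploading add two more steps where a file can be silently truncated or corrupted. Below is a minimal sketch, assuming GNU coreutils’ sha256sum, that records a checksum next to the backup before upload and verifies it after download. The function names are illustrative, not part of any tool.

```shell
# Sketch: record and verify a SHA-256 checksum next to a backup file.
# Assumes GNU coreutils (sha256sum). Function names are illustrative.

# Writes <file>.sha256 next to the backup. Only the base name is stored,
# so the checksum file still works after both files are copied elsewhere.
write_checksum() {
  local file="$1"
  ( cd "$(dirname "$file")" && sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

# Returns 0 if the file still matches its recorded checksum, non-zero otherwise.
verify_checksum() {
  local file="$1"
  ( cd "$(dirname "$file")" && sha256sum --check --status "$(basename "$file").sha256" )
}

# Example:
#   write_checksum /tmp/keycloak-backup.dump.gpg
#   ...upload both files, later download both files...
#   verify_checksum /tmp/keycloak-backup.dump.gpg && echo "checksum OK"
```

Because the checksum covers the encrypted file, a restore drill can confirm the download is intact before anyone needs the decryption key.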

pg_basebackup and WAL Archiving: Physical Backups with PITR

For production systems with strict RPO (Recovery Point Objective) requirements, physical backups combined with WAL (Write-Ahead Log) archiving allow you to restore to any point in time since the last base backup. This means that even if something corrupts your database at 14:47, you can restore to 14:46 without losing more than a minute of data.

Setting up WAL archiving is covered in the PostgreSQL documentation on continuous archiving. The key configuration parameters in postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'aws s3 cp %p s3://your-wal-archive/wal/%f'
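The PostgreSQL documentation recommends that an archive command never overwrite an existing archived segment and exit non-zero on any failure, because PostgreSQL treats a zero exit as proof that the segment is safe. To illustrate that rule, here is a sketch for a hypothetical local or NFS archive directory (not S3): it accepts a retry whose contents are identical, refuses a conflicting file, and copies through a temporary name so an interrupted copy never appears under the final segment name.

```shell
# Sketch: archive one WAL segment without ever overwriting an archived file.
# Args: source path (%p), segment file name (%f), archive directory (hypothetical).
archive_wal() {
  local src="$1" name="$2" dest_dir="$3"
  local dest="$dest_dir/$name"

  if [ -e "$dest" ]; then
    # PostgreSQL may retry a segment it already archived; identical content is fine.
    if cmp -s "$src" "$dest"; then
      return 0
    fi
    echo "archive_wal: refusing to overwrite $dest" >&2
    return 1
  fi

  # Copy under a temporary name, then rename within the same directory,
  # so an interrupted copy never shows up as a complete segment.
  cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}
```

To use it, save it as a script that ends with archive_wal "$@" and set, for example, archive_command = '/usr/local/bin/archive-wal.sh %p %f /mnt/wal-archive' (both paths are hypothetical).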

The base backup:

pg_basebackup \
  --host=db-host \
  --username=replication_user \
  --pgdata=/backups/base \
  --format=tar \
  --compress=9 \
  --checkpoint=fast \
  --wal-method=stream

This approach is the foundation for production-grade Keycloak deployments. Our Keycloak production-ready checklist covers the broader set of configuration steps required before going live.

Managed PostgreSQL Snapshots and PITR

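If you are running Keycloak on a managed database service (Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL, Supabase), automated snapshots and PITR are typically built in, but check what is actually switched on: defaults vary by provider and tier, and some providers (Supabase, for example) offer PITR as a paid add-on.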

For RDS:

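Automated daily snapshots are retained for the period you configure (1-35 days); a retention period of 0 disables automated backups
PITR is available automatically when automated backups are on, allowing restore to any second within the retention window up to the latest restorable time, which typically trails the present by about five minutes
Cross-Region automated backup replication, if you enable it, provides geographic redundancy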

Verify that automated backups are actually enabled and that the retention period matches your recovery requirements. It is common to discover that backups were disabled to reduce costs on a dev instance that later became production.

Scheduling and Retention

A reasonable production schedule for pg_dump backups (as a complement to managed snapshots or WAL archiving):

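Continuous: WAL archiving for PITR (if self-managing PostgreSQL); set archive_timeout to bound how long a partly filled WAL segment can wait before it is archived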
Daily: Full pg_dump, retained for 14 days
Weekly: Full pg_dump, retained for 90 days
Monthly: Full pg_dump, retained for 1 year (or per your compliance requirements)
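To turn this schedule into automatic pruning, something has to decide which dated dumps are still needed. A minimal sketch, assuming one dump per day, that the weekly copies are the Sunday dumps and the monthly copies are the dumps from the 1st of the month (both conventions are assumptions, not requirements), and GNU date:

```shell
# Sketch: decide whether a dated pg_dump is still needed under the schedule above.
# Assumptions (hypothetical, adjust to your policy):
#   - weekly backups are the ones taken on Sunday
#   - monthly backups are the ones taken on the 1st of the month
#   - dates are YYYY-MM-DD; requires GNU date
# Prints "keep" or "delete".
retention_decision() {
  local backup_date="$1" today="$2"
  local age_days dow dom
  age_days=$(( ( $(date -u -d "$today" +%s) - $(date -u -d "$backup_date" +%s) ) / 86400 ))
  dow=$(date -u -d "$backup_date" +%u)   # 1 = Monday ... 7 = Sunday
  dom=$(date -u -d "$backup_date" +%d)   # 01 .. 31

  if [ "$age_days" -le 14 ]; then echo keep; return; fi                          # daily tier
  if [ "$dow" -eq 7 ] && [ "$age_days" -le 90 ]; then echo keep; return; fi      # weekly tier
  if [ "$dom" = "01" ] && [ "$age_days" -le 365 ]; then echo keep; return; fi    # monthly tier
  echo delete
}

# Example: retention_decision 2026-05-24 2026-07-21   (a Sunday dump, 58 days old)
```

Run any pruning job built on this in a dry-run mode first, printing what it would delete, before letting it remove files.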

Store at least one copy offsite or in a different cloud region. Same-region storage does not protect against regional outages or account-level incidents.

What to Document Alongside Your Backups

Backup files alone are not enough. Document and store securely:

The database host, port, database name, and credentials needed to restore
Any encryption keys used (GPG key IDs, KMS key ARNs). Without the decryption key, an encrypted backup is useless
The Keycloak version running at the time of the backup (schema migrations may make it impossible to restore a newer dump into an older version)
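The PostgreSQL version. Physical backups (pg_basebackup plus WAL) restore only into the same major version; a pg_dump archive can be restored into the same or a newer major version, but not reliably into an older one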
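One low-effort way to keep this information with each backup is a small manifest file written next to it. A sketch, assuming you pass in the values yourself; the function name and the key=value layout are illustrative. It records identifiers only, never credentials or key material.

```shell
# Sketch: write <backup>.manifest with the facts needed to restore it later.
# Args: backup file, Keycloak version, PostgreSQL version, encryption key ID.
write_backup_manifest() {
  local backup_file="$1" keycloak_version="$2" pg_version="$3" key_id="$4"
  {
    echo "backup_file=$(basename "$backup_file")"
    echo "created_utc=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "keycloak_version=$keycloak_version"
    echo "postgres_version=$pg_version"
    echo "encryption_key_id=$key_id"
  } > "${backup_file}.manifest"
}

# Example (KEYCLOAK_VERSION is a hypothetical variable from your deployment):
#   write_backup_manifest /backups/keycloak-20260721-090000.dump "$KEYCLOAK_VERSION" \
#     "$(psql --host=db-host --username=keycloak --dbname=keycloak -Atc 'SHOW server_version')" \
#     ops-backup-key
```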

Realm Export: What It Covers and What It Does Not

The Keycloak export/import documentation explains the kc.sh export command in detail. Understanding its scope prevents dangerous misconceptions.

What kc.sh export captures

# Export a single realm to a file
kc.sh export --dir /tmp/export --realm my-realm

# Export all realms
kc.sh export --dir /tmp/export

# Export a realm with its users written into the realm file
kc.sh export \
  --dir /tmp/export \
  --realm my-realm \
  --users realm_file

A realm export captures:

Realm-level settings (token lifespans, branding, security policies)
Client definitions (client IDs, redirect URIs, client scopes, mappers)
Identity provider configurations (IdP secrets may be masked, as they are in the admin console’s partial export)
Authentication flow configurations
Roles and groups (definitions, including group role mappings; user role mappings and group memberships come with the user records)
User definitions, depending on the --users strategy (realm_file, same_file, or different_files; skip omits them)

What kc.sh export does NOT capture

This is where teams get into trouble:

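Users may be missing. The admin console’s partial export never includes them, and kc.sh export omits them with --users skip. Even when user records are present, check that credential data (hashed passwords, OTP secrets, WebAuthn registrations) came through for a sample user before relying on it.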
The export is not transactionally consistent with a live database. If the export runs over several minutes while users are logging in, the export represents different points in time for different tables.
Realm signing keys should not be assumed to survive. Partial exports strip key material, and if keys are regenerated on import, existing tokens signed with the old key will become invalid.
Sessions are not included. All active sessions will be lost after a restore-from-export.
Secrets (client, IdP, and broker secrets) are masked in admin console partial exports; confirm that your export path preserves them before relying on it.
Events (admin and user login history) are not included.
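Because what lands in an export depends on how it was produced, it is worth checking each export before treating it as a source of user data. A rough sketch, assuming kc.sh’s file naming of <realm>-realm.json for the realm and <realm>-users-<n>.json for separate user files (verify the names against your own export). It only detects that user records are present; it does not prove that credentials are intact.

```shell
# Sketch: succeed if a kc.sh export directory appears to contain user records.
# Args: export directory, realm name. File naming is an assumption (see above).
export_has_users() {
  local dir="$1" realm="$2"
  # different_files strategy: one or more separate user files
  if ls "$dir/$realm"-users-*.json > /dev/null 2>&1; then
    return 0
  fi
  # realm_file strategy: a "users" key inside the realm file itself
  # (an empty array still matches, so spot-check a real user as well)
  grep -q '"users"[[:space:]]*:' "$dir/$realm-realm.json" 2>/dev/null
}

# Example: export_has_users /tmp/export my-realm || echo "no users in this export" >&2
```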

The right uses for realm export

Realm export is genuinely valuable for:

Configuration-as-code: Committing realm configuration to Git so changes are tracked and reviewable. See our detailed guide on Keycloak realm export and import strategy for patterns on managing this workflow.
Environment seeding: Creating a staging or dev realm that mirrors production configuration (minus live user data).
Partial migration: Moving realm configuration from one Keycloak instance to another, for example during a version upgrade where you want a clean schema.
Auditing configuration drift: Exporting periodically and diffing against a known-good baseline.

Use realm export as a configuration management tool. Use database backups for disaster recovery.

A Tested Restore Procedure

A backup that has never been tested is a backup you cannot rely on. The procedure below should be run as a drill at least quarterly, and always before a major Keycloak upgrade or infrastructure change.

Step 1: Identify the restore target

Before starting, confirm:

Which backup you are restoring from (timestamp, location, format)
The Keycloak version that was running when the backup was taken
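Whether the target PostgreSQL major version is compatible with the backup (the same major version for physical backups; the same or a newer one for pg_dump archives)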
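The PostgreSQL check can be scripted for pg_dump archives. pg_restore --list prints a header that includes a "Dumped from database version" line, and SHOW server_version_num reports the target server’s version as a number such as 160004. A sketch, assuming PostgreSQL 10 or later on both sides, where the first number is the major version; physical backups still need an exact major-version match, which this does not check.

```shell
# Sketch: can this pg_dump archive be restored into the target server's major version?
# Assumes PostgreSQL 10+ on both sides.

# Reads `pg_restore --list` output on stdin; prints the source server's major version.
dump_source_major() {
  sed -n 's/^;[[:space:]]*Dumped from database version:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when a logical dump from major version $1 can go into major version $2:
# the same or a newer target is fine, an older one is not supported.
logical_restore_ok() {
  [ "$2" -ge "$1" ]
}

# Example:
#   src=$(pg_restore --list /backups/keycloak-20260721-090000.dump | dump_source_major)
#   dst=$(( $(psql --host=db-host --username=postgres -Atc 'SHOW server_version_num') / 10000 ))
#   logical_restore_ok "$src" "$dst" || echo "target PostgreSQL $dst is older than source $src" >&2
```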

Step 2: Stop or standby Keycloak

Restoring a database while Keycloak is actively reading and writing will cause data corruption or a failed restore. Stop Keycloak first.

# Kubernetes
kubectl scale deployment keycloak --replicas=0 -n auth

# Docker Compose
docker compose stop keycloak

# systemd
systemctl stop keycloak

If you are running a hot-standby for high availability, take the active nodes out of the load balancer before stopping them to drain in-flight requests cleanly.

Step 3: Restore the database

For a pg_dump custom-format backup:

# Drop and recreate the database (on the target PostgreSQL server)
psql --host=db-host --username=postgres <<EOF
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'keycloak' AND pid <> pg_backend_pid();
DROP DATABASE IF EXISTS keycloak;
CREATE DATABASE keycloak OWNER keycloak;
EOF

# Restore from the dump
pg_restore \
  --host=db-host \
  --username=keycloak \
  --dbname=keycloak \
  --no-owner \
  --no-privileges \
  --jobs=4 \
  /backups/keycloak-20260721-090000.dump

For a managed snapshot (RDS, Cloud SQL), use the cloud console or CLI restore workflow. The restored instance will be a new database endpoint, so update your Keycloak KC_DB_URL environment variable before starting Keycloak.

For PITR, specify the target time in your recovery configuration. On RDS:

aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier keycloak-prod \
  --target-db-instance-identifier keycloak-restored \
  --restore-time 2026-07-21T08:55:00Z
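On self-managed PostgreSQL 12 or later, the equivalent is to restore the base backup into an empty data directory, add recovery settings, and create a recovery.signal file before starting the server. A sketch of the last two steps, assuming PostgreSQL is stopped and the WAL archive uses the S3 layout from the archive_command shown earlier; the function name is illustrative.

```shell
# Sketch: prepare a restored data directory for point-in-time recovery (PostgreSQL 12+).
# Args: data directory, target time (e.g. '2026-07-21 08:55:00+00').
configure_pitr() {
  local pgdata="$1" target_time="$2"

  # restore_command mirrors the archive_command: fetch segment %f into path %p.
  # recovery_target_action = 'promote' opens the database for writes at the target.
  cat >> "$pgdata/postgresql.auto.conf" <<EOF
restore_command = 'aws s3 cp s3://your-wal-archive/wal/%f %p'
recovery_target_time = '$target_time'
recovery_target_action = 'promote'
EOF

  # This empty file tells PostgreSQL to start in targeted recovery mode.
  touch "$pgdata/recovery.signal"
}

# Example: configure_pitr /var/lib/postgresql/data '2026-07-21 08:55:00+00'
```

After starting PostgreSQL, watch its log until recovery reaches the target; with the promote action it then leaves recovery and accepts connections for writes.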

Step 4: Start Keycloak and verify

# Start Keycloak
kubectl scale deployment keycloak --replicas=2 -n auth

# Or
docker compose start keycloak

Verify:

Keycloak starts without errors (the startup log should contain a line with "started in", reporting the startup time)
Admin console is accessible and realms are present
A test user can authenticate successfully (use a staging test account)
Client configurations are intact (spot-check a critical client)
Check the Keycloak logs for schema-related errors, which would indicate a version mismatch

# Check startup logs for errors
kubectl logs -l app=keycloak -n auth --since=5m | grep -E "ERROR|WARN|started in"

Step 5: Run smoke tests and re-enable traffic

Before directing production traffic back to Keycloak, run your standard authentication smoke tests. Only then re-add Keycloak to the load balancer or re-enable DNS.

Document the restore. Record the backup timestamp, the wall clock time for each step, any issues encountered, and the total time from incident to recovery. This data drives improvements to your RPO and RTO targets.
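Recording timestamps by hand under pressure is error-prone. A minimal sketch of a drill log: mark the start of each step as you begin it, add a final mark when service is restored, then print per-step durations and the total. The function names and the log format are illustrative assumptions.

```shell
# Sketch: time a restore drill step by step.
# Appends "<epoch seconds> <step name>" to a log. The optional third argument
# lets you backfill a timestamp you noted by hand.
drill_mark() {
  local log="$1" step="$2" ts="${3:-$(date -u +%s)}"
  printf '%s %s\n' "$ts" "$step" >> "$log"
}

# Prints each step with the time until the next mark, then the total.
# The last mark (e.g. "done") only closes the previous step.
drill_report() {
  awk '
    NR > 1  { printf "%-20s %6ds\n", prev_step, $1 - prev_ts }
    NR == 1 { first = $1 }
            { prev_ts = $1; $1 = ""; prev_step = substr($0, 2) }
    END     { if (NR > 0) printf "%-20s %6ds\n", "TOTAL", prev_ts - first }
  ' "$1"
}

# Example:
#   drill_mark drill.log stop-keycloak
#   drill_mark drill.log restore-db
#   drill_mark drill.log verify
#   drill_mark drill.log done
#   drill_report drill.log
```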

Signing Keys and Secrets

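Keycloak’s generated realm signing keys (RSA, EC, HMAC) are stored in the database, in the COMPONENT and COMPONENT_CONFIG tables. When you restore the database, these keys are restored along with everything else, so no separate backup is needed for them if you are doing database-level backups. The exception is a keystore-based key provider (java-keystore), whose key material lives in a keystore file that you must back up separately.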

What you do need to consider:

If you are migrating to a new realm (using realm import rather than DB restore) and the import does not carry over the existing key material, new signing keys are generated. All existing JWTs signed with the old key will then fail validation until they expire. Plan a brief downtime window or implement a key rotation strategy to handle this.
If your PostgreSQL data at rest is encrypted (via storage encryption backed by a cloud KMS key on managed services, or LUKS on self-managed hosts), document the encryption key location and rotation schedule. A restored database that cannot be decrypted is worthless.
Admin credentials (the bootstrap admin account password) are stored in the database as Keycloak credentials. They are included in a database restore. Separate credential stores (Vault, AWS Secrets Manager) should back up their own data independently of the Keycloak DB backup.

For more on how Keycloak handles database connections and the implications for failover, see our guide on Keycloak DB connection pool exhaustion.

RPO and RTO: Setting Your Targets

Recovery Point Objective (RPO) is the maximum acceptable data loss measured in time. If your RPO is 1 hour, you need backups at least every hour.

Recovery Time Objective (RTO) is the maximum acceptable downtime from incident to recovery. If your RTO is 30 minutes, your restore procedure (including detection time) must complete within 30 minutes.
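An RPO is only met if backups keep landing on schedule, so it is worth alerting on backup age rather than trusting the schedule. A minimal sketch of that check; the inputs are epoch seconds, and where the newest-backup time comes from is up to you (the example sources in the comments are assumptions about your setup).

```shell
# Sketch: succeed when the newest backup is recent enough for the RPO.
# Args: newest backup time, current time (epoch seconds), RPO in seconds.
within_rpo() {
  local newest="$1" now="$2" rpo="$3"
  local age=$(( now - newest ))
  if [ "$age" -gt "$rpo" ]; then
    echo "RPO breach: newest backup is ${age}s old (limit ${rpo}s)" >&2
    return 1
  fi
}

# Example sources for the newest-backup time:
#   local dump:     stat -c %Y "$(ls -t /backups/keycloak-*.dump | head -n 1)"
#   WAL archiving:  psql -Atc "SELECT extract(epoch FROM last_archived_time)::bigint FROM pg_stat_archiver"
# within_rpo "$newest" "$(date +%s)" 3600 || notify_oncall   # notify_oncall is hypothetical
```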

A realistic starting point for production Keycloak:

Scenario	Suggested RPO	Suggested RTO
Non-critical internal SSO	24 hours	4 hours
Customer-facing SaaS auth	1 hour	30 minutes
Financial services / healthcare	5-15 minutes	15 minutes
Zero-downtime requirement	Near-zero (PITR + standby)	< 5 minutes (failover)

Achieving low RTO requires testing your restore procedure. A restore that takes 2 hours to complete in practice will blow through a 30-minute RTO target no matter how good your backups are. When a cold restore cannot hit your RTO, you need a failover plan instead. The next section walks through one.

Disaster Recovery and Failover Runbook

A backup restore answers “how do I get my data back.” A failover runbook answers a tighter question: “the primary site is gone, how do I have authentication working somewhere else in minutes.” The two are related but not the same. Backup and restore is the foundation; the failover runbook below is what you reach for when a cold restore is too slow to meet your RTO.

First, a quick distinction that trips teams up. High availability (HA) keeps Keycloak running through expected failures like a node crash or a rolling upgrade, usually with multiple nodes behind a load balancer sharing state via Infinispan. HA does not protect you from losing a whole region or from a corrupted database. Disaster recovery (DR) is the rehearsed process for exactly those catastrophic cases. Your backups are the raw material; the runbook is how you turn them into a running service.

What the failover assumes

This runbook assumes you already have the building blocks covered earlier in this guide: a PostgreSQL standby kept current through streaming replication or WAL shipping (see the “pg_basebackup and WAL Archiving” section above), signing keys safe inside that replicated database (see “Signing Keys and Secrets” above), and a stateless Keycloak deployment you can stand up from a container image at the recovery site. The application layer holds nothing durable, so you can discard the running pods and redeploy from your registry.

One expectation to set with your product and support teams before an incident, not during one: whether active user sessions survive a failover depends on where they live. If sessions are held only in Infinispan’s memory, that cache starts empty at the recovery site. Tokens that are still valid keep working until they expire, but refresh requests fail because the server-side session record is gone, so users get prompted to log in again. Keycloak 26 persists user sessions to the database by default, in which case sessions that reached the standby database should come back with it; confirm which mode your version and configuration use. If session continuity is a hard requirement and your sessions are memory-only, you have to persist them or externalize Infinispan to a replicated remote cache, which adds its own operational and DR overhead.

The six-step failover

The clock on your RTO starts the moment you declare the incident, so the first step is deciding fast and the rest is execution.

Step 1: Detect and declare. Confirm the outage is real, not a transient blip. Check that Keycloak’s /health/ready endpoint has been unresponsive for more than a couple of minutes, that the primary database is unreachable (not just one node), and that the underlying region or data center is genuinely degraded. Then declare the DR event, page the on-call team, open an incident channel, and record the declaration timestamp.

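# Keycloak 25+ serves health checks on the management port (9000 by default),
# and only when health checks are enabled (health-enabled=true).
# Use http:// instead if the server is not configured for TLS.
curl -sf https://keycloak.example.com:9000/health/ready
# Returns: {"status":"UP","checks":[...]}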
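The "unresponsive for more than a couple of minutes" rule can be encoded so nobody has to eyeball it during an incident. A sketch that declares only after a run of consecutive failed checks and bails out on the first success. The function name, thresholds, and URL are assumptions, and it covers only the Keycloak check, not the database and region checks that still need human judgment.

```shell
# Sketch: succeed ("declare") only after $1 consecutive failures of a check
# command, polled every $2 seconds; fail as soon as the check passes once.
confirm_outage() {
  local needed="$1" interval="$2"
  shift 2
  local failures=0
  while [ "$failures" -lt "$needed" ]; do
    if "$@"; then
      return 1   # the check passed: treat it as a transient blip, do not declare
    fi
    failures=$(( failures + 1 ))
    if [ "$failures" -lt "$needed" ]; then
      sleep "$interval"
    fi
  done
  return 0       # every check in the run failed: declare
}

# Example: six failed checks 20 seconds apart, roughly two minutes in total.
#   confirm_outage 6 20 curl -sf --max-time 5 https://keycloak.example.com:9000/health/ready \
#     && echo "Keycloak readiness down for ~2 minutes: check DB and region, then declare"
```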

Step 2: Promote the standby database. Promote your PostgreSQL replica to primary. On a self-managed physical standby:

pg_ctl promote -D /var/lib/postgresql/data

# Confirm it is now primary (expect: f)
psql -c "SELECT pg_is_in_recovery();"

On a managed service, use the provider’s failover command and wait until the database accepts write connections before moving on.
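Waiting until the database accepts writes is worth scripting with a deadline, so a stuck promotion surfaces as a clear failure rather than an operator watching a prompt. A sketch: wait_until is a generic, hypothetical helper, and db_is_primary assumes psql can reach the new primary as the postgres user.

```shell
# Sketch: re-run a command every $2 seconds until it succeeds or $1 seconds pass.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "wait_until: gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds once the server at host $1 has left recovery, i.e. accepts writes.
db_is_primary() {
  [ "$(psql --host="$1" --username=postgres -Atc 'SELECT pg_is_in_recovery()')" = "f" ]
}

# Example: give the promotion up to five minutes.
#   wait_until 300 5 db_is_primary recovery-db-host
```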

Step 3: Point Keycloak at the recovered database. If you run a warm standby at the recovery site, update its database URL and restart so it picks up the new primary.

# Kubernetes: patch the DB URL secret, then roll the pods
kubectl patch secret keycloak-db-secret -n auth \
  -p '{"stringData":{"KC_DB_URL":"jdbc:postgresql://recovery-db-host:5432/keycloak"}}'
kubectl rollout restart deployment/keycloak -n auth

If you are deploying Keycloak fresh, apply your infrastructure-as-code (Helm, Terraform, or Compose). Our Keycloak production-ready checklist covers the deployment baseline you should already have scripted.

Step 4: Validate before you cut traffic. This is the step people skip under pressure, and it is the one that catches the silent failures. Verify readiness, that the realm is intact, and that signing keys are present.

# Readiness (served on the management port, 9000 by default in Keycloak 25+)
curl -sf https://recovery-keycloak.internal:9000/health/ready

# Realm is present and the issuer matches production exactly
curl -sf https://recovery-keycloak.internal/realms/your-realm/.well-known/openid-configuration \
  | jq '.issuer'

# At least one signing key is published
curl -sf https://recovery-keycloak.internal/realms/your-realm/protocol/openid-connect/certs \
  | jq '.keys | length'

The issuer URL has to match production character for character. If Keycloak came up at the recovery site with a different hostname, every token validation against your production issuer will fail even though Keycloak looks healthy.

Step 5: Cut traffic over. Route users to the recovery site, then watch your application’s authentication error rates for the next few minutes. Two common mechanisms:

DNS failover: keep your Keycloak domain’s TTL low (60 seconds) ahead of time, then repoint the record. With a low TTL most resolvers pick up the change within a minute or two.
Health-checked global load balancer: Route 53, Cloudflare Load Balancing, or Google Cloud Load Balancing can reroute automatically when a health check against /health/ready fails (point the check at the management port, 9000 by default in Keycloak 25+). This removes human latency, which under pressure is the slowest part of any runbook.

Step 6: Stabilize and review. Once authentication is flowing, document the timeline (declaration, promotion, restart, traffic cut, restored), find the root cause, file the work to rebuild the primary and re-establish replication, and schedule a blameless post-mortem within a couple of days.

Single region versus multi-region

Most teams should aim for active-passive: one region serves traffic, a second is a warm or hot standby with a consistent issuer URL. That is the model the runbook above describes, and a tested sub-30-minute RTO is a realistic target for it. Active-active, where both regions serve simultaneously, is meaningfully harder. It needs Infinispan cross-site replication, careful database conflict handling, and a shared issuer strategy, so reach for it only when a latency SLA genuinely forces your hand.

Test the runbook, or it is just a hypothesis

A runbook nobody has executed is a guess. Run DR game days where you actually promote the database, deploy Keycloak at the recovery site, and validate token issuance end to end. Quarterly is a sensible cadence, plus a re-run after any infrastructure change and after every key rotation (to confirm the replica carries the new key material). Track real RTO from each exercise: if your target is 30 minutes but every drill lands at 55, fix the tooling or move the target.

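# Game-day smoke test: get a token from the recovery site, check the issuer
TOKEN=$(curl -s -X POST \
  "https://keycloak.example.com/realms/your-realm/protocol/openid-connect/token" \
  -d "client_id=test-client&client_secret=secret&grant_type=client_credentials" \
  | jq -r '.access_token')

# JWT segments are unpadded base64url: map to standard base64 and pad before decoding
PAYLOAD=$(echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
echo "$PAYLOAD" | base64 -d | jq '.iss'
# Must match your production issuer URL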

For Infinispan clustering and key-management practices that underpin a healthy recovery, see the top 7 Keycloak cluster configuration best practices.

Database Performance and Backup Interaction

Running pg_dump against a busy database has a performance cost. The dump holds a transaction snapshot open for its entire duration, and while that snapshot is open autovacuum cannot remove dead rows that are newer than it, so high-write tables like sessions and events can bloat.

Best practices to minimize impact:

Run pg_dump backups during off-peak hours if your traffic has a clear daily pattern
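Run dumps against a read replica rather than the primary, especially for large databases. On a streaming replica, long dumps can be cancelled by recovery conflicts; raising max_standby_streaming_delay or enabling hot_standby_feedback avoids that, although hot_standby_feedback shifts the vacuum impact back onto the primary
Monitor dead tuples and autovacuum activity around the dump window. Autovacuum cannot reclaim rows still visible to the dump’s snapshot, so if bloat accumulates, make autovacuum more aggressive on the affected tables (for example, by lowering autovacuum_vacuum_cost_delay) so it catches up quickly once the dump finishes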

Our Keycloak PostgreSQL database tuning guide covers autovacuum configuration, session table management, and query performance in depth.

Frequently Asked Questions

How do I back up Keycloak?

Back up the PostgreSQL database that Keycloak runs against. Use pg_dump for scheduled logical backups, pg_basebackup with WAL archiving for physical backups with PITR capability, or your managed cloud database provider’s automated snapshot and PITR feature. The database contains all realm configuration, users, credentials, and signing keys. Supplement database backups with periodic realm exports for configuration-as-code purposes, but do not treat realm exports as your primary disaster recovery mechanism.

Is a realm export a full Keycloak backup?

No. A realm export (kc.sh export) captures realm configuration and, depending on the --users option, user records, but it is not transactionally consistent with the live database, excludes active sessions, and reflects only the moment it was taken. Exports made from the admin console omit users entirely and mask secrets. A database backup is the only complete, restorable representation of Keycloak’s state.

How do I restore Keycloak from a backup?

Stop all Keycloak instances to prevent write conflicts, then restore the PostgreSQL database using pg_restore (for custom-format dumps) or your managed provider’s restore workflow. After the restore, start Keycloak and verify that realms, users, and client configurations are intact, and that a test authentication succeeds. Always test this procedure before you need it in a real incident.

What happens to existing sessions after a database restore?

Session data stored in the database comes back as of the backup timestamp. Offline sessions, which back offline tokens, are always persisted there; Keycloak 26 also persists regular user sessions by default, while older versions keep them only in memory, where they do not survive a full restart anyway. Sessions and offline tokens created between the backup timestamp and the incident are lost, and that gap is your RPO. Affected users will be required to authenticate again, and offline tokens issued after the backup timestamp will be invalid.

Do I need to back up Keycloak’s signing keys separately?

Not if you use keys generated by Keycloak. Those are stored in the database (in the COMPONENT and COMPONENT_CONFIG tables) and are automatically included in any database backup. When you restore the database, the signing keys are restored with it, and tokens issued before the backup will continue to validate correctly after the restore. Keys loaded from a keystore file by a keystore-based provider must be backed up along with that file. The other scenario where you need to think about signing keys separately is seeding a new realm from a realm export (not a database restore): if new keys are generated, existing tokens will fail validation.

What is the difference between Keycloak HA and DR?

High availability keeps Keycloak running through expected, single-node failures without operator involvement, typically by running multiple pods behind a load balancer with Infinispan clustering for shared session state. Disaster recovery is the planned response to catastrophic failures, like a full region outage or database corruption, that HA cannot absorb. HA reduces how often you have outages; DR reduces how long they last when HA is not enough. You want both.

Are Keycloak sessions preserved during a DR failover?

It depends on where sessions are stored. If they live only in Infinispan’s memory, that cache starts empty at the recovery site: users whose access tokens are still valid keep passing token validation, but refresh requests fail because the server-side session record no longer exists, so users get prompted to re-authenticate. Keycloak 26 persists user sessions to the database by default, so sessions that were replicated to the standby database should survive. If session continuity across a DR event is a real requirement and your sessions are memory-only, you must persist them or externalize Infinispan to a replicated remote cache.

How do I minimize RTO for Keycloak?

The biggest wins come from automating the failover steps. Automate database promotion (or use a managed service with auto-failover), keep Keycloak pre-deployed at the recovery site in a standby state, and automate DNS or load-balancer switching via your provider’s API. Each manual step adds operator latency, often several minutes of cognitive overhead under pressure. Aim to reduce the manual decisions to exactly one: declaring the incident.

Closing Thoughts

A Keycloak backup strategy has two parts: the backups themselves, and the confidence that a restore works. The first part is straightforward. Automate pg_dump or enable managed snapshots, encrypt the output, ship it offsite, and set a retention policy. The second part requires discipline: schedule restore drills, time them against your RTO targets, and fix anything that slows the procedure down.

Realm exports are a valuable complement for managing configuration as code, seeding environments, and auditing drift. They are not a substitute for database backups.

If managing PostgreSQL backups, WAL archiving, and restore drills is not something you want to own operationally, Skycloak’s managed Keycloak hosting handles automated backups, PITR, and infrastructure-level disaster recovery as part of the managed service, so you can focus on building your application rather than maintaining your identity infrastructure.