Keycloak on Kubernetes: Production Deployment Guide
Last updated: March 2026
Running Keycloak in production on Kubernetes requires more than just deploying a container. You need to handle database connectivity, TLS termination, clustering, session replication, autoscaling, health checks, and graceful shutdowns. Getting any of these wrong leads to authentication outages, which affect every application that depends on Keycloak.
This guide provides a production-ready Kubernetes deployment for Keycloak, covering every component with complete YAML manifests. If you have already deployed Keycloak with ArgoCD, this guide builds on those concepts with production-hardening steps. See our ArgoCD deployment guide for GitOps-based delivery.
Architecture Overview
A production Keycloak deployment on Kubernetes consists of:
- Keycloak pods: Multiple replicas running the Keycloak server
- PostgreSQL: The backing database (managed or self-hosted)
- Ingress controller: TLS termination and routing
- Infinispan: The distributed cache embedded in each Keycloak pod, which shares session and cache data between pods
- Monitoring: Health checks, readiness probes, and metrics
The goal is a deployment that handles failover gracefully, scales with demand, and survives node failures without authentication downtime.
Namespace and Prerequisites
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: keycloak
  labels:
    app.kubernetes.io/part-of: identity
Create the namespace and required secrets:
kubectl apply -f namespace.yaml
# Create database credentials (CloudNativePG expects a basic-auth Secret with username and password keys)
kubectl create secret generic keycloak-db-credentials \
  --namespace keycloak \
  --type=kubernetes.io/basic-auth \
  --from-literal=username=keycloak \
  --from-literal=password='your-strong-password'
# Create admin credentials
kubectl create secret generic keycloak-admin-credentials \
  --namespace keycloak \
  --from-literal=username=admin \
  --from-literal=password='your-admin-password'
# Create TLS certificate (or use cert-manager)
kubectl create secret tls keycloak-tls \
  --namespace keycloak \
  --cert=tls.crt \
  --key=tls.key
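The CloudNativePG backup configuration in the next section reads S3 credentials from a Secret named aws-credentials, and the commands above do not create it. Below is a minimal sketch of that Secret. The key names match what the backup configuration references, and the values are placeholders to replace with your own. If your platform supports IAM-based credentials for pods, you may prefer those to static keys.

```yaml
# aws-credentials.yaml (placeholder values; replace before applying)
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: keycloak
type: Opaque
stringData:
  # Key names must match those referenced under s3Credentials in postgres-cluster.yaml
  ACCESS_KEY_ID: "<your-access-key-id>"
  SECRET_ACCESS_KEY: "<your-secret-access-key>"
```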
PostgreSQL with CloudNativePG
For production, use a PostgreSQL operator to manage your database. CloudNativePG is a widely adopted option:
# postgres-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-db
  namespace: keycloak
spec:
  instances: 3
  primaryUpdateStrategy: unsupervised
  storage:
    size: 20Gi
    storageClass: gp3
  postgresql:
    parameters:
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      work_mem: "16MB"
      maintenance_work_mem: "256MB"
      max_connections: "200"
      checkpoint_completion_target: "0.9"
      wal_buffers: "32MB"
      max_wal_size: "2GB"
  bootstrap:
    initdb:
      database: keycloak
      owner: keycloak
      secret:
        name: keycloak-db-credentials
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
  backup:
    barmanObjectStore:
      destinationPath: "s3://keycloak-backups/postgres/"
      s3Credentials:
        accessKeyId:
          name: aws-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: aws-credentials
          key: SECRET_ACCESS_KEY
    retentionPolicy: "30d"
  monitoring:
    enablePodMonitor: true
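This creates a three-instance PostgreSQL cluster (one primary, two replicas) with automated failover, continuous WAL archiving to S3, and a PodMonitor for Prometheus. Size max_connections against Keycloak's connection pool. With KC_DB_POOL_MAX_SIZE set to 50 and the HPA allowing up to 10 replicas (both configured later in this guide), Keycloak alone could open 500 connections to the primary, well above max_connections: 200. To fix that, raise max_connections, lower the per-pod pool size, or put a connection pooler in front of the database. For detailed PostgreSQL tuning recommendations, see our Keycloak database tuning guide.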
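The barmanObjectStore block sets up continuous WAL archiving. CloudNativePG only takes base backups, though, when a Backup or ScheduledBackup resource requests one. Below is a sketch of a daily ScheduledBackup for the cluster above. The resource name and the 02:00 schedule are illustrative choices, not values from this guide. CloudNativePG schedules use a six-field cron expression with seconds first.

```yaml
# scheduled-backup.yaml (name and schedule are illustrative)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: keycloak-db-daily
  namespace: keycloak
spec:
  # Six fields (seconds first): every day at 02:00:00
  schedule: "0 0 2 * * *"
  # Make the ScheduledBackup the owner of the Backup objects it creates
  backupOwnerReference: self
  cluster:
    name: keycloak-db
```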
Keycloak Deployment
Deployment vs. StatefulSet
Use a Deployment for Keycloak, not a StatefulSet. Keycloak pods need neither stable network identities nor per-pod storage. Cache and session data are shared between pods through Infinispan, and persistent data lives in PostgreSQL. A Deployment gives you rolling updates and simpler scaling.
# keycloak-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  namespace: keycloak
  labels:
    app.kubernetes.io/name: keycloak
    app.kubernetes.io/component: server
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: keycloak
  template:
    metadata:
      labels:
        app.kubernetes.io/name: keycloak
        app.kubernetes.io/component: server
    spec:
      serviceAccountName: keycloak
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app.kubernetes.io/name
                      operator: In
                      values:
                        - keycloak
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: keycloak
      terminationGracePeriodSeconds: 60
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:26.0
          # The stock image is built with default build options, so "start --optimized"
          # would not pick up KC_DB and the other build-time settings below. Run plain
          # "start" here, or use --optimized with a custom image built via "kc.sh build".
          args:
            - start
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            - name: management
              containerPort: 9000
              protocol: TCP
          env:
            # Database
            - name: KC_DB
              value: postgres
            - name: KC_DB_URL
              value: jdbc:postgresql://keycloak-db-rw.keycloak.svc:5432/keycloak
            - name: KC_DB_USERNAME
              valueFrom:
                secretKeyRef:
                  name: keycloak-db-credentials
                  key: username
            - name: KC_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-db-credentials
                  key: password
            - name: KC_DB_POOL_MIN_SIZE
              value: "10"
            - name: KC_DB_POOL_MAX_SIZE
              value: "50"
            # HTTP / Proxy
            - name: KC_HOSTNAME
              value: auth.example.com
            - name: KC_PROXY_HEADERS
              value: xforwarded
            - name: KC_HTTP_ENABLED
              value: "true"
            # Clustering
            - name: KC_CACHE
              value: ispn
            - name: KC_CACHE_STACK
              value: kubernetes
            - name: JAVA_OPTS_KC_HEAP
              value: "-XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=70.0"
            # Health and metrics
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: KC_METRICS_ENABLED
              value: "true"
            # Admin credentials (initial setup only)
            - name: KC_BOOTSTRAP_ADMIN_USERNAME
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin-credentials
                  key: username
            - name: KC_BOOTSTRAP_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin-credentials
                  key: password
            # JGroups DNS discovery for Infinispan clustering. jgroups.dns.query is a
            # JVM system property, so it is passed through JAVA_OPTS_APPEND.
            - name: JAVA_OPTS_APPEND
              value: "-Djgroups.dns.query=keycloak-headless.keycloak.svc.cluster.local"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: management
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: management
            initialDelaySeconds: 60
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 5
          startupProbe:
            httpGet:
              path: /health/started
              port: management
            initialDelaySeconds: 15
            periodSeconds: 5
            failureThreshold: 30
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2"
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]
Key configuration decisions explained:
- maxUnavailable: 0: Kubernetes does not take an existing pod down until a replacement is Ready, so capacity never drops below the desired replica count during a rolling update. Combined with maxSurge: 1, this means Kubernetes creates one new pod before terminating an old one.
- Pod anti-affinity: Spreads Keycloak pods across different nodes to survive node failures.
- Topology spread constraints: Distributes pods across availability zones for zone-level resilience.
- preStop lifecycle hook: The 10-second sleep gives Kubernetes and the ingress controller time to remove the pod from the Service endpoints before Keycloak starts shutting down, preventing dropped connections.
- Separate management port (9000): Health and metrics endpoints run on a different port from application traffic, following Keycloak's best practice.
Service and Headless Service
# keycloak-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  namespace: keycloak
  labels:
    app.kubernetes.io/name: keycloak
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 8080
      targetPort: http
      protocol: TCP
    # Management port (health and metrics), scraped by the ServiceMonitor
    - name: management
      port: 9000
      targetPort: management
      protocol: TCP
  selector:
    app.kubernetes.io/name: keycloak
---
# Headless service for Infinispan DNS discovery
apiVersion: v1
kind: Service
metadata:
  name: keycloak-headless
  namespace: keycloak
  labels:
    app.kubernetes.io/name: keycloak
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: jgroups
      port: 7800
      targetPort: 7800
      protocol: TCP
  selector:
    app.kubernetes.io/name: keycloak
  publishNotReadyAddresses: true
The headless service is essential for Infinispan clustering. JGroups uses DNS discovery to find other Keycloak pods, and the headless service provides DNS records for each pod. Setting publishNotReadyAddresses: true ensures that pods can discover each other during startup.
Ingress with TLS
NGINX Ingress
# keycloak-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  namespace: keycloak
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "KC_INGRESS"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Frame-Options: SAMEORIGIN";
      more_set_headers "X-Content-Type-Options: nosniff";
      more_set_headers "X-XSS-Protection: 1; mode=block";
      more_set_headers "Strict-Transport-Security: max-age=31536000; includeSubDomains";
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - auth.example.com
      secretName: keycloak-tls
  rules:
    - host: auth.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 8080
Key annotations:
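- proxy-buffer-size: 128k: Keycloak responses can include large headers, especially when tokens carry many roles. Increasing the buffer prevents the 502 errors NGINX returns when upstream headers exceed its default buffer.
- Session affinity: Infinispan already handles session replication, so affinity is not strictly required. It does reduce cross-pod lookups and improves latency.
- configuration-snippet: Recent ingress-nginx releases disable snippet annotations by default (allow-snippet-annotations: false). Unless a cluster administrator enables them, this annotation is rejected or ignored. In that case, set the security headers another way, for example through the controller's global configuration.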
Automated TLS with cert-manager
Instead of managing certificates manually, use cert-manager with Let’s Encrypt:
# certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: keycloak-tls
  namespace: keycloak
spec:
  secretName: keycloak-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - auth.example.com
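The Certificate above refers to a ClusterIssuer named letsencrypt-prod, and cert-manager cannot issue anything until that issuer exists. Below is a minimal sketch that uses the ACME HTTP-01 challenge through the NGINX ingress class. The email address and account-key Secret name are placeholders. The ingressClassName solver field assumes a recent cert-manager release; older releases use the class field instead.

```yaml
# cluster-issuer.yaml (email and key Secret name are placeholders)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Let's Encrypt production ACME endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    # Secret where cert-manager stores the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      # Answer HTTP-01 challenges through the NGINX ingress controller
      - http01:
          ingress:
            ingressClassName: nginx
```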
Horizontal Pod Autoscaler
Scale Keycloak pods based on CPU utilization:
# keycloak-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak
  namespace: keycloak
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: keycloak
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 120
  metrics:
    # CPU only. Memory utilization is measured against the 1Gi request, and a JVM
    # allowed up to 70% of the 2Gi limit rarely hands heap back, so a memory target
    # would keep the HPA at or near maxReplicas.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
The behavior settings prevent thrashing:
- Scale up: The HPA acts on the lowest recommendation from the last 60 seconds, so a momentary spike does not add pods. It then adds at most 2 pods per minute.
- Scale down: The HPA acts on the highest recommendation from the last 5 minutes. It then removes at most 1 pod every 2 minutes.
This keeps the replica count from oscillating during variable traffic patterns.
Pod Disruption Budget
Protect against voluntary disruptions (node drains, cluster upgrades):
# keycloak-pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: keycloak
  namespace: keycloak
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: keycloak
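This keeps at least 2 Keycloak pods available during voluntary disruptions such as node drains and cluster upgrades. It does not protect against involuntary failures like a node crash; the replica count and pod spreading cover those.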
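The HPA can grow the Deployment to 10 replicas. At that size, minAvailable: 2 would allow up to 8 pods to be evicted at once during a drain. If you prefer disruptions to proceed one pod at a time regardless of replica count, a maxUnavailable variant is sketched below. It keeps the same name so that applying it replaces the original. Do not add it alongside the original: when two PodDisruptionBudgets match the same pods, evictions fail.

```yaml
# keycloak-pdb.yaml (variant): same name, so applying it replaces the minAvailable version
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: keycloak
  namespace: keycloak
spec:
  # Allow voluntary evictions only one pod at a time, whatever the replica count
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: keycloak
```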
Infinispan Clustering
Keycloak uses Infinispan for distributed caching and session replication. In Kubernetes, DNS-based discovery is the simplest approach.
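The configuration is handled through environment variables on the Keycloak container. jgroups.dns.query is a JGroups system property rather than a Keycloak option, so it is passed to the JVM through JAVA_OPTS_APPEND:
- name: KC_CACHE
  value: ispn
- name: KC_CACHE_STACK
  value: kubernetes
- name: JAVA_OPTS_APPEND
  value: "-Djgroups.dns.query=keycloak-headless.keycloak.svc.cluster.local"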
Verifying Cluster Formation
After deploying, verify that all pods have joined the Infinispan cluster:
# Check Keycloak logs for cluster join events
kubectl logs -n keycloak deployment/keycloak | grep -i "cluster"
# You should see messages like:
# Received new cluster view: [keycloak-xxx|2] (3) [keycloak-xxx, keycloak-yyy, keycloak-zzz]
If pods are not clustering, check:
- The headless service has publishNotReadyAddresses: true
- The jgroups.dns.query value matches the headless service name
- Port 7800 (JGroups) is not blocked by network policies, in either the ingress or the egress direction
Custom Infinispan Cache Configuration
For high-traffic deployments, you can tune Infinispan’s cache settings by mounting a custom cache-ispn.xml:
# configmap with custom cache config
apiVersion: v1
kind: ConfigMap
metadata:
  name: keycloak-cache-config
  namespace: keycloak
data:
  cache-ispn.xml: |
    <infinispan xmlns="urn:infinispan:config:15.0">
      <cache-container name="keycloak">
        <transport lock-timeout="60000"/>
        <distributed-cache name="sessions" owners="2">
          <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="authenticationSessions" owners="2">
          <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="offlineSessions" owners="1">
          <expiration lifespan="-1"/>
        </distributed-cache>
      </cache-container>
    </infinispan>
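The owners parameter controls how many copies of each cache entry exist across the cluster. Setting owners="2" stores each session on 2 pods, so the cluster can lose a single pod without losing those sessions. This excerpt shows only the session caches. A working file must also define every other cache in Keycloak's default conf/cache-ispn.xml, so start from a copy of that file rather than from this fragment. For more on session management strategies, see our session management feature.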
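The ConfigMap has no effect until Keycloak loads it. One way to wire it in is to mount the file into /opt/keycloak/conf and point the cache-config-file option (KC_CACHE_CONFIG_FILE) at it, because Keycloak resolves that path relative to its conf directory. The sketch below is a fragment to merge into the Deployment's pod template. The volume name and the cache-ispn-custom.xml file name are illustrative. Depending on your Keycloak version, cache options may be build-time options, so an image started with --optimized may need this setting during its build as well.

```yaml
# Fragment: merge into spec.template.spec of keycloak-deployment.yaml
containers:
  - name: keycloak
    env:
      # Resolved relative to /opt/keycloak/conf
      - name: KC_CACHE_CONFIG_FILE
        value: cache-ispn-custom.xml
    volumeMounts:
      # subPath mounts a single file, so the rest of the conf directory stays intact
      - name: cache-config
        mountPath: /opt/keycloak/conf/cache-ispn-custom.xml
        subPath: cache-ispn.xml
        readOnly: true
volumes:
  - name: cache-config
    configMap:
      name: keycloak-cache-config
```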
RBAC and ServiceAccount
# keycloak-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: keycloak
  namespace: keycloak
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keycloak
  namespace: keycloak
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keycloak
  namespace: keycloak
subjects:
  - kind: ServiceAccount
    name: keycloak
    namespace: keycloak
roleRef:
  kind: Role
  name: keycloak
  apiGroup: rbac.authorization.k8s.io
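The kubernetes cache stack uses DNS_PING, which discovers peers through DNS queries against the headless service and does not call the Kubernetes API. The Role is therefore not required for the setup in this guide. It only becomes necessary if you switch to an API-based discovery protocol such as KUBE_PING, which needs get and list permissions on pods.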
Network Policies
Restrict network access to only what is needed:
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: keycloak
  namespace: keycloak
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: keycloak
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow traffic from ingress controller
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 8080
          protocol: TCP
    # Allow JGroups traffic between Keycloak pods (7800: transport, 57800: failure detection)
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: keycloak
      ports:
        - port: 7800
          protocol: TCP
        - port: 57800
          protocol: TCP
  egress:
    # Allow DNS
    - to:
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    # Allow outbound JGroups traffic to other Keycloak pods; because Egress is
    # restricted, cluster formation fails without this rule
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: keycloak
      ports:
        - port: 7800
          protocol: TCP
        - port: 57800
          protocol: TCP
    # Allow PostgreSQL
    - to:
        - podSelector:
            matchLabels:
              cnpg.io/cluster: keycloak-db
      ports:
        - port: 5432
          protocol: TCP
    # Allow HTTPS for identity provider communication
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - port: 443
          protocol: TCP
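The policy above restricts ingress to the Keycloak pods, so Prometheus cannot scrape the management port unless that traffic is also allowed. NetworkPolicies are additive, so one way to open the port is a second, narrow policy. The sketch below assumes Prometheus runs in a namespace named monitoring; adjust the namespace selector to match your installation.

```yaml
# network-policy-metrics.yaml (assumes Prometheus runs in the "monitoring" namespace)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: keycloak-allow-metrics
  namespace: keycloak
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: keycloak
  policyTypes:
    - Ingress
  ingress:
    # Let Prometheus reach /metrics on the management port only
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - port: 9000
          protocol: TCP
```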
Monitoring with Prometheus
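Keycloak exposes Prometheus metrics at /metrics on the management port (9000). The ServiceMonitor selects Services labeled app.kubernetes.io/name: keycloak and scrapes the port named management, so the keycloak Service must expose port 9000 under that name: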
# service-monitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keycloak
  namespace: keycloak
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: keycloak
  endpoints:
    - port: management
      path: /metrics
      interval: 30s
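Key metrics to alert on:
- keycloak_login_attempts_total: authentication volume
- keycloak_failed_login_attempts_total: failed logins (brute force detection)
- keycloak_request_duration_seconds: API latency
- JVM metrics (jvm_memory_used_bytes, jvm_gc_pause_seconds): resource pressure
Metric names differ between Keycloak versions and metrics extensions, and login-related counters may only appear when event metrics or an extension are enabled. Confirm each name against the /metrics output on port 9000 before building alerts on it.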
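With the Prometheus Operator, a PrometheusRule along the lines of the sketch below can turn these signals into alerts. It makes three assumptions. First, the operator picks up rules carrying the release: prometheus label, as with the ServiceMonitor. Second, it relies on the namespace and service labels the operator attaches to scraped targets. Third, the thresholds are illustrative. The rules use up and the JVM GC pause timer because those names are generic; check any Keycloak-specific names against your /metrics output before adding rules for them.

```yaml
# keycloak-alerts.yaml (thresholds are illustrative)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keycloak
  namespace: keycloak
  labels:
    release: prometheus
spec:
  groups:
    - name: keycloak
      rules:
        # Fewer than 2 scrapeable Keycloak pods; "or vector(0)" also fires when no target exists
        - alert: KeycloakInstancesDown
          expr: (sum(up{namespace="keycloak", service="keycloak"}) or vector(0)) < 2
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Fewer than 2 Keycloak instances are reachable by Prometheus"
        # A pod spends more than 10% of wall-clock time paused for garbage collection
        - alert: KeycloakHighGcPauseTime
          expr: sum by (pod) (rate(jvm_gc_pause_seconds_sum{namespace="keycloak"}[5m])) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Keycloak pod {{ $labels.pod }} spends over 10% of its time in GC pauses"
```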
For built-in monitoring and alerting, Skycloak provides insights dashboards that track these metrics without additional setup.
Deploying Everything
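Apply all manifests in order. This assumes the CloudNativePG operator, cert-manager, and the Prometheus Operator CRDs are already installed, and that the aws-credentials Secret used for backups exists:
kubectl apply -f namespace.yaml
kubectl apply -f keycloak-rbac.yaml
kubectl apply -f postgres-cluster.yaml
# Wait for PostgreSQL to be ready
kubectl wait --for=condition=Ready cluster/keycloak-db -n keycloak --timeout=300s
kubectl apply -f keycloak-deployment.yaml
kubectl apply -f keycloak-service.yaml
# Only if you use cert-manager instead of a manually created TLS secret
kubectl apply -f certificate.yaml
kubectl apply -f keycloak-ingress.yaml
kubectl apply -f keycloak-hpa.yaml
kubectl apply -f keycloak-pdb.yaml
kubectl apply -f network-policy.yaml
kubectl apply -f service-monitor.yaml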
Verifying the Deployment
# Check pod status
kubectl get pods -n keycloak
# Verify cluster formation
kubectl logs -n keycloak -l app.kubernetes.io/name=keycloak --tail=50 | grep "cluster view"
# Test the health endpoint (served on the management port 9000, not on 8080)
kubectl port-forward -n keycloak deployment/keycloak 9000:9000 &
sleep 2
curl http://localhost:9000/health/ready
Operational Considerations
Rolling Updates
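When you change configuration, the rolling update strategy with maxUnavailable: 0 keeps Keycloak serving traffic throughout. Version upgrades need more care. Minor and major upgrades may include database migrations, which run automatically on the first pod that starts with the new version. Pods still running the old version are not designed to share a cluster or a migrated schema with the new one. For those upgrades, back up the database and plan a short maintenance window in which all old pods stop before the new version starts. Check the Keycloak upgrade notes for each release to see whether a rolling update is supported.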
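If you want every old pod stopped before the new version starts, switching the Deployment to the Recreate strategy for that one rollout does it. The fragment below sketches the change. Kubernetes rejects a Recreate strategy that still carries a rollingUpdate block, so remove that block. Expect a short outage while the new pods start and run migrations, and restore the RollingUpdate strategy from the main manifest afterwards.

```yaml
# Fragment of keycloak-deployment.yaml, used only for a version-upgrade rollout
spec:
  strategy:
    # Recreate terminates every old pod before any new pod is created.
    # The rollingUpdate block must be removed; it is not allowed with Recreate.
    type: Recreate
```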
Backup and Restore
With CloudNativePG, WAL archiving is continuous and base backups can be scheduled. A backup you have never restored is unproven, though, so document and test your restore procedure:
# Restore from backup into a new cluster (point KC_DB_URL at keycloak-db-restored-rw afterwards)
kubectl apply -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: keycloak-db-restored
  namespace: keycloak
spec:
  instances: 3
  # Storage must be specified for the new cluster as well
  storage:
    size: 20Gi
    storageClass: gp3
  bootstrap:
    recovery:
      source: keycloak-db
  externalClusters:
    - name: keycloak-db
      barmanObjectStore:
        destinationPath: "s3://keycloak-backups/postgres/"
        s3Credentials:
          accessKeyId:
            name: aws-credentials
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-credentials
            key: SECRET_ACCESS_KEY
EOF
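To recover to a moment before an incident (for example, an accidental realm deletion) instead of replaying all archived WAL, CloudNativePG accepts a recovery target in the bootstrap section. The fragment below replaces the bootstrap block of the restore manifest. The timestamp is a placeholder.

```yaml
# Fragment of the restore manifest: point-in-time recovery instead of full WAL replay
spec:
  bootstrap:
    recovery:
      source: keycloak-db
      recoveryTarget:
        # Placeholder: a time just before the data loss, with an explicit UTC offset
        targetTime: "2026-03-01 10:00:00+00"
```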
Managing Keycloak Configuration as Code
Once your cluster is running, manage realm configuration with Terraform or Pulumi for version-controlled, repeatable configuration.
Conclusion
A production Keycloak deployment on Kubernetes requires careful attention to database management, clustering, TLS, scaling, and resilience. The manifests in this guide provide a foundation that handles these concerns, but you should adapt them to your specific requirements, traffic patterns, and compliance needs. The Keycloak guides on running in containers cover additional configuration options for containerized deployments.
For organizations that want production-grade Keycloak without managing Kubernetes infrastructure, Skycloak’s managed hosting handles all of this automatically, including database tuning, high availability, TLS, and 24/7 monitoring. Check our pricing to see what fits your scale.