Scaling Keycloak: Clustering, Caching, and Load Testing for Production

Guilliano Molaire · Updated March 16, 2026 · 12 min read



Introduction

Authentication infrastructure is the critical path for every user interaction. Every login, every token refresh, every API call that validates a JWT — all of it flows through your identity provider. When Keycloak goes down or slows to a crawl, your entire application ecosystem grinds to a halt. Users can’t sign in, services can’t authorize requests, and your support team starts fielding a flood of tickets.

Yet many teams treat Keycloak as a “set it and forget it” service. They spin up a single instance during development, push it to production with minimal changes, and hope for the best. This works until it doesn’t — and it usually stops working at the worst possible time: during a product launch, a marketing campaign spike, or Black Friday traffic.

This guide covers everything you need to scale Keycloak for production: clustering architecture, Infinispan cache tuning, database optimization, load testing with real tools, and monitoring strategies that give you confidence your auth infrastructure can handle whatever comes next.

Keycloak Architecture for Scale

Before you can scale Keycloak effectively, you need to understand how its components interact under load.

Stateless Nodes with Shared State

Modern Keycloak (built on Quarkus) is designed to run as a cluster of stateless nodes. Each node runs the same application code and connects to two shared backends:

  • Relational database (PostgreSQL, MySQL, or MariaDB) for persistent data: users, realms, clients, roles, and configuration
  • Infinispan distributed cache for transient state: active sessions, authentication flows in progress, and login failures

This separation is what makes horizontal scaling possible. Any Keycloak node can handle any request because the shared state lives outside the application process.

Session Management via Infinispan

When a user logs in, Keycloak creates a session object that tracks their authentication state. These sessions live in Infinispan’s distributed cache, not in the database. This is a deliberate design choice — session lookups happen on every authenticated request, and cache reads are orders of magnitude faster than database queries.

Infinispan distributes session data across the cluster using consistent hashing. Each session has a primary owner and (optionally) backup owners on different nodes. If a node goes down, the backup owner promotes the session data, and users don’t notice a thing.

The Database Bottleneck

While sessions live in cache, Keycloak still hits the database frequently: user lookups during login, client validation, role resolution, and realm configuration loading. Under heavy load, the database becomes the first bottleneck most teams encounter.

The good news is that Keycloak caches most configuration data aggressively. Realm and client metadata is loaded once and cached. User lookups are the primary source of database pressure during authentication flows.

Horizontal vs Vertical Scaling

You have two levers for increasing Keycloak capacity: bigger nodes or more nodes.

Vertical Scaling: Quick Wins with Limits

Adding more CPU and RAM to your Keycloak instances is the fastest path to better performance. Keycloak’s token signing operations are CPU-bound, and giving the JVM more heap reduces garbage collection pressure.

Practical limits for vertical scaling:

  • CPU: Beyond 8 cores, Keycloak’s internal locking and Infinispan coordination start to limit single-node throughput
  • Memory: The JVM heap should typically be 512MB to 2GB. Going higher rarely helps and can cause longer GC pauses
  • Cost: Doubling instance size often more than doubles cost on cloud providers

Vertical scaling is a reasonable first step, but it hits a ceiling quickly. A single Keycloak node, well-tuned, can handle roughly 500-1,000 logins per second depending on the authentication flow complexity.

Horizontal Scaling: The Path to High Availability

Adding more Keycloak nodes is the preferred approach for production. It provides both higher throughput and fault tolerance — if one node fails, the others continue serving requests.

Key considerations for horizontal scaling:

  • Load balancer: Required in front of Keycloak nodes. Must support sticky sessions (recommended) or have properly configured distributed caching
  • Session affinity: While Keycloak can work without sticky sessions, they reduce cross-node cache lookups and improve performance
  • Cluster discovery: Nodes need to find each other. In Kubernetes, use DNS_PING or KUBE_PING. On bare metal or VMs, use JDBC_PING or TCPPING
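As a sketch, on Kubernetes the discovery stack is selected with a single option; the headless-service name below is a placeholder for your own. (Newer Keycloak releases also ship a jdbc-ping stack; check the docs for your version.)

```shell
# Select the Kubernetes JGroups stack (DNS_PING under the hood)
KC_CACHE=ispn
KC_CACHE_STACK=kubernetes
# DNS_PING discovers peer pods by querying a headless service
JAVA_OPTS_APPEND="-Djgroups.dns.query=keycloak-headless.keycloak.svc.cluster.local"
```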

Kubernetes HPA for Auto-Scaling

If you’re running Keycloak on Kubernetes, Horizontal Pod Autoscaler (HPA) can automatically adjust the number of replicas based on load:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: keycloak
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        # Custom metric: the HPA can only see this if a metrics adapter
        # (e.g. prometheus-adapter) exposes Keycloak's request counter
        metric:
          name: http_server_requests_seconds_count
        target:
          type: AverageValue
          averageValue: "100"

Set minReplicas to at least 3 for production — this ensures quorum for Infinispan’s distributed cache and tolerates a single node failure without service degradation.

Infinispan Cache Configuration

Infinispan is the backbone of Keycloak’s performance at scale. Tuning it correctly is the difference between a system that handles spikes gracefully and one that buckles under pressure.

Key Caches

Keycloak uses several Infinispan caches, each with different access patterns:

| Cache | Purpose | Access Pattern |
|---|---|---|
| sessions | Active user SSO sessions | Very frequent reads, moderate writes |
| authenticationSessions | In-progress login flows | Write-heavy, short-lived entries |
| offlineSessions | Offline tokens / remember me | Infrequent reads, rare writes |
| clientSessions | Per-client session data | Frequent reads alongside sessions |
| actionTokens | Password reset, email verification | Write once, read once, expire |
| loginFailures | Brute force detection counters | Write-heavy during attacks |

Cache Owners and Replication

The owners setting controls how many copies of each cache entry exist across the cluster. The default of 1 means each entry lives on exactly one node — if that node dies, the session is lost and the user must re-authenticate.

For production, set owners to 2:

<!-- cache-ispn.xml tuning -->
<distributed-cache name="sessions" owners="2">
  <memory max-count="10000" when-full="REMOVE"/>
</distributed-cache>

This ensures every session has a backup copy on a different node. Setting owners to 3 or higher is rarely necessary and increases write latency since each session update must replicate to more nodes.

For authenticationSessions, keep owners at 1 or 2. These entries are short-lived (they exist only during the login flow, typically a few seconds to minutes), so losing them during a node failure simply means the user retries the login.

Remote Infinispan for Large Deployments

For deployments with hundreds of thousands of concurrent sessions, you can externalize Infinispan into its own dedicated cluster. This separates cache scaling from application scaling and allows you to:

  • Scale cache memory independently of Keycloak JVM heap
  • Use Infinispan’s persistence to survive full cluster restarts
  • Share session state across multiple Keycloak clusters in different regions

Keycloak 24+ supports remote Infinispan stores natively. Configure it via:

KC_CACHE=ispn
KC_CACHE_REMOTE_HOST=infinispan.example.com
KC_CACHE_REMOTE_PORT=11222
KC_CACHE_REMOTE_USERNAME=keycloak
KC_CACHE_REMOTE_PASSWORD=changeme

Database Optimization

The database is Keycloak’s persistent brain. Optimizing it directly impacts login latency and overall system throughput.

Connection Pooling

Keycloak uses Agroal (via Quarkus) for database connection pooling. The default pool size is often too small for high-traffic deployments. Tune these settings based on your expected concurrency:

KC_DB_POOL_INITIAL_SIZE=10
KC_DB_POOL_MIN_SIZE=10
KC_DB_POOL_MAX_SIZE=100

A good rule of thumb: set KC_DB_POOL_MAX_SIZE to (expected peak logins per second / number of Keycloak nodes) * 2. If you expect 1,000 logins/second across 5 nodes, each node needs roughly 400 connections at peak — but that’s often more than your database can handle. Balance pool size against your database’s max_connections setting.
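That rule of thumb is easy to sanity-check in code. A sketch (the function name and the connection margin are illustrative):

```javascript
// Rule of thumb from the text: per-node pool size =
// (peak logins per second / number of nodes) * 2,
// capped by what the database can actually serve.
function poolSizePerNode(peakLoginsPerSec, numNodes, dbMaxConnections) {
  const ideal = Math.ceil(peakLoginsPerSec / numNodes) * 2;
  // Never let the combined pools exceed the database's max_connections,
  // minus a small margin for admin and replication sessions.
  const perNodeCap = Math.floor((dbMaxConnections - 10) / numNodes);
  return Math.min(ideal, perNodeCap);
}

// 1,000 logins/s across 5 nodes wants 400 connections per node, but a
// database limited to 500 total connections caps each node at 98.
poolSizePerNode(1000, 5, 500);
```

The cap is the important part: an oversized pool just moves the queue from Keycloak into the database.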

Read Replicas for User Lookups

If your deployment has a large user base (100,000+ users), consider routing read-heavy queries to database replicas. User lookups during authentication are the most common read operation, and they can be safely served from a replica with minimal lag tolerance.

PostgreSQL and MySQL both support read replicas. Configure Keycloak’s datasource to use the primary for writes and a replica for reads at the infrastructure level (using PgBouncer, ProxySQL, or your cloud provider’s proxy).

Index Optimization

Keycloak creates database indexes during initial setup, but large user bases benefit from additional indexes. Key tables to monitor:

  • USER_ENTITY — add composite indexes on frequently queried attributes
  • CREDENTIAL — index on USER_ID for faster credential lookups
  • USER_ATTRIBUTE — index on NAME and VALUE columns if you query custom attributes

Run EXPLAIN ANALYZE on slow queries captured in your database’s slow query log to identify missing indexes.

PostgreSQL-Specific Tuning

If you’re using PostgreSQL (the recommended database for Keycloak), these settings have the biggest impact:

shared_buffers = 25% of available RAM
effective_cache_size = 75% of available RAM
work_mem = 64MB
maintenance_work_mem = 512MB
random_page_cost = 1.1  # for SSD storage
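As a concrete (hypothetical) example, on a dedicated 16 GB database server those guidelines work out to:

```ini
shared_buffers = 4GB          # 25% of 16GB
effective_cache_size = 12GB   # 75% of 16GB
work_mem = 64MB
maintenance_work_mem = 512MB
random_page_cost = 1.1        # SSD storage
```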

Load Testing with JMeter

You can’t scale what you haven’t measured. Load testing your Keycloak deployment reveals bottlenecks before your users find them.

Building a JMeter Test Plan

Apache JMeter is the most established tool for load testing Keycloak. Create a test plan that exercises the flows your application actually uses:

Key flows to test:

  1. Authorization Code flow — the most common flow for web applications. Simulates the full redirect-based login including form submission
  2. Token refresh — exercises the token endpoint with refresh tokens. Important because refresh operations happen frequently in long-lived sessions
  3. Client Credentials — service-to-service authentication. Typically the highest throughput flow since it’s a single HTTP request
  4. User registration — write-heavy operation that stresses both the database and email delivery
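Once the plan is saved, run it headless from the CLI; the `.jmx` and output paths below are placeholders:

```shell
# -n: non-GUI mode, -t: test plan, -l: raw results log,
# -e -o: generate an HTML report into report/
jmeter -n -t keycloak-load-test.jmx -l results.jtl -e -o report/
```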

JMeter Thread Group Configuration

For a realistic load test, ramp up gradually and sustain peak load for at least 10 minutes:

| Parameter | Value | Rationale |
|---|---|---|
| Ramp-up Period | 120 seconds | Gradual increase prevents connection storms |
| Peak Threads | 500 | Simulates 500 concurrent users |
| Hold Duration | 600 seconds | 10 minutes at peak reveals memory leaks and GC issues |
| Loop Count | Forever | Run until duration expires |

Add these listeners to your test plan:

  • Summary Report for aggregate metrics
  • Response Time Over Time for latency trends
  • Transactions Per Second for throughput visualization

Metrics to Capture

Focus on these metrics during load tests:

  • p50 response time: Typical user experience. Should be under 200ms for token endpoints
  • p99 response time: Worst-case user experience. Should stay under 2 seconds
  • Throughput: Tokens issued per second at sustained peak
  • Error rate: Should be 0% under normal load. Any errors indicate a bottleneck
  • Database connection wait time: If this grows, your pool is too small
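If you post-process raw samples yourself, p50 and p99 are just order statistics. A minimal sketch (the sample values are made up):

```javascript
// Nearest-rank percentile over a list of response times in ms,
// the way you'd post-process raw JMeter/k6 samples.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}

const latencies = [120, 95, 180, 2100, 140, 160, 110, 130, 150, 100];
percentile(latencies, 50); // typical experience
percentile(latencies, 99); // worst case: dominated by the 2100ms outlier
```

Note how a single slow request barely moves p50 but defines p99, which is why both matter.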

Load Testing with k6

For teams that prefer code-over-GUI, k6 by Grafana Labs offers a developer-friendly alternative with excellent scripting support and built-in metrics.

k6 Test Script for Keycloak

Here’s a production-ready k6 script that tests the Client Credentials flow with realistic load patterns:

import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 500 },
    { duration: '2m', target: 0 },
  ],
};

export default function () {
  const res = http.post(`${__ENV.KC_URL}/realms/test/protocol/openid-connect/token`, {
    grant_type: 'client_credentials',
    client_id: 'load-test',
    client_secret: __ENV.CLIENT_SECRET,
  });
  check(res, { 'token issued': (r) => r.status === 200 });
}

Run it with:

k6 run --env KC_URL=https://keycloak.example.com --env CLIENT_SECRET=your-secret load-test.js

Testing the Authorization Code Flow

The Authorization Code flow is more complex to test because it involves browser redirects and form submissions. Here’s a k6 approach:

import http from 'k6/http';
import { check, group } from 'k6';

export default function () {
  group('Authorization Code Flow', () => {
    // Step 1: Load the login page. On a first visit Keycloak returns the
    // HTML form (not a redirect). k6 keeps cookies per VU, so the auth
    // session cookie set here is sent on the next request automatically
    const authRes = http.get(
      `${__ENV.KC_URL}/realms/test/protocol/openid-connect/auth` +
      `?client_id=test-app&response_type=code&scope=openid&redirect_uri=http://localhost/callback`
    );

    // Step 2: Submit the login form. The form's action URL carries
    // per-session parameters, so submit the parsed form rather than
    // posting to a fixed URL. #kc-form-login is Keycloak's default
    // login form id
    const loginRes = authRes.submitForm({
      formSelector: '#kc-form-login',
      fields: {
        username: `testuser_${__VU}`,
        password: 'password',
      },
      params: { redirects: 0 },
    });

    // Step 3: Extract the code from the redirect and exchange it for tokens
    const code = /[?&]code=([^&]+)/.exec(loginRes.headers['Location'])[1];
    const tokenRes = http.post(
      `${__ENV.KC_URL}/realms/test/protocol/openid-connect/token`,
      {
        grant_type: 'authorization_code',
        code: code,
        client_id: 'test-app',
        redirect_uri: 'http://localhost/callback',
      }
    );

    check(tokenRes, { 'login successful': (r) => r.status === 200 });
  });
}

Interpreting k6 Results

k6 outputs a clean summary after each run. Key metrics to watch:

  • http_req_duration — overall request latency (look at p95 and p99)
  • http_req_failed — percentage of failed requests
  • iterations — total completed test iterations
  • vus_max — peak concurrent virtual users

If the p(95) of http_req_duration exceeds 1 second during peak load, you have a scaling problem to address before production.
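Rather than eyeballing the summary, you can encode these limits in the script with k6's thresholds option, which marks the run as failed when a limit is breached (useful in CI). A sketch, with limits matching the guidance above:

```javascript
// Add to (or merge into) the `options` object of any script above.
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<1000', 'p(99)<2000'], // milliseconds
    http_req_failed: ['rate<0.01'],                  // under 1% errors
  },
};
```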

Monitoring and Dashboards

Scaling without monitoring is flying blind. You need real-time visibility into Keycloak’s behavior to catch problems before they become outages.

Enabling the Metrics Endpoint

Keycloak exposes Prometheus-compatible metrics when enabled:

KC_METRICS_ENABLED=true
KC_HEALTH_ENABLED=true

This exposes /metrics (Prometheus format) and /health (readiness/liveness probes) on the management port (default 9000).

Key Prometheus Metrics

Configure your Prometheus instance to scrape these essential metrics:

| Metric | What It Tells You |
|---|---|
| keycloak_request_duration_seconds | Latency per endpoint (login, token, userinfo) |
| vendor_cache_manager_default_cache_sessions_statistics_stores | Session creation rate |
| vendor_cache_manager_default_cache_sessions_statistics_evictions | Cache pressure indicator |
| jvm_memory_used_bytes | JVM heap consumption |
| agroal_active_count | Active database connections |
| agroal_awaiting_count | Threads waiting for a DB connection |

Grafana Dashboard Essentials

Build a Grafana dashboard with these panels for production monitoring and insights:

  1. Login Success Rate — rate(keycloak_request_duration_seconds_count{method="POST",uri="/realms/{realm}/protocol/openid-connect/token"}[5m])
  2. p99 Login Latency — histogram quantile of keycloak_request_duration_seconds
  3. Active Sessions — current session count from cache statistics
  4. Database Connection Pool — active vs. awaiting connections
  5. JVM Memory — heap used vs. committed with GC event overlay
  6. Cache Evictions — rate of evictions per cache (sessions, authenticationSessions)

Alerting Thresholds

Set up alerts before you need them:

| Alert | Threshold | Severity |
|---|---|---|
| p99 login latency | > 2 seconds for 5 minutes | Warning |
| p99 login latency | > 5 seconds for 2 minutes | Critical |
| Error rate | > 1% for 5 minutes | Warning |
| Error rate | > 5% for 2 minutes | Critical |
| Cache eviction rate | > 100/minute sustained | Warning |
| DB connection wait | > 0 for 5 minutes | Warning |
| JVM heap usage | > 85% for 10 minutes | Warning |

How Skycloak Handles Scaling

Running Keycloak at scale requires deep operational expertise — and that’s exactly what Skycloak provides as a managed hosting platform.

Managed clustering across regions: Skycloak deploys Keycloak in multi-node clusters with automatic failover. Your auth infrastructure stays available even if an entire availability zone goes down.

Auto-scaling based on load: Instead of manually tuning HPA thresholds, Skycloak monitors your traffic patterns and scales nodes up and down automatically. You pay for what you use, not for peak capacity sitting idle.

Optimized Infinispan and database configuration: Every Skycloak deployment ships with cache and database settings tuned for your workload. As your user base grows, we adjust pool sizes, cache owners, and memory allocation without downtime.

Monitoring and alerting built in: Full observability comes standard — no need to set up Prometheus, Grafana, or alerting rules. View real-time metrics in your Skycloak dashboard and get notified before problems impact users.

One-click infrastructure: Need to test your Keycloak setup locally first? Use our Docker Compose Generator to spin up a matching environment in seconds. When you’re ready for production, check out our documentation for deployment guides.

Capacity Planning Checklist

Before going live, work through this checklist to ensure your Keycloak deployment can handle production traffic:

Estimate peak concurrent sessions
Calculate your maximum simultaneous active users. A session typically lasts 30 minutes (SSO session idle timeout), so if you have 10,000 logins per hour with a 30-minute session duration, expect roughly 5,000 concurrent sessions at peak.

Calculate tokens-per-second requirement
Each login issues at least two tokens (an access token and an ID token). Add token refreshes (typically every 5 minutes per active session). For 5,000 concurrent sessions refreshing every 5 minutes, that’s roughly 17 token operations per second at steady state, plus burst capacity for login spikes.

Size database connections accordingly
Each Keycloak node needs enough connections to handle its share of the load without queuing. Start with a per-node KC_DB_POOL_MAX_SIZE of (peak_logins_per_second / num_nodes) * 2, keep the sum across all nodes below the database’s max_connections, and adjust based on load test results.

Plan cache memory allocation
Each session consumes roughly 2-5KB in Infinispan. For 100,000 concurrent sessions at 5KB each, you need approximately 500MB of cache memory distributed across the cluster. Add headroom for authentication sessions and action tokens.
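The arithmetic in the items above can be sketched in a few lines; all inputs are the example figures from the text, not measurements:

```javascript
// Example inputs from the checklist above
const loginsPerHour = 10_000;
const sessionMinutes = 30;

// Peak concurrent sessions: logins/hour scaled by session duration
const concurrentSessions = loginsPerHour * (sessionMinutes / 60); // 5,000

// Steady-state refresh rate: one refresh per session every 5 minutes
const refreshOpsPerSec = concurrentSessions / (5 * 60); // just under 17/s

// Cluster-wide cache memory: ~5KB per session
const sessions = 100_000;
const cacheMB = (sessions * 5) / 1024; // ~488MB; budget ~500MB plus headroom
```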

Set up monitoring before going live
Don’t wait for an incident to set up dashboards. Deploy monitoring alongside your initial Keycloak setup so you have baseline metrics to compare against.

Load test at 2x expected peak
If you expect 1,000 concurrent users at peak, load test at 2,000. Real-world traffic patterns are spikier than projections, and you want headroom. Run sustained load tests for at least 30 minutes to surface memory leaks and connection pool exhaustion.

Document your scaling runbook
Write down the exact steps to add capacity: how to add nodes, how to increase database pool size, how to expand cache memory. When you’re paged at 2 AM, you don’t want to be figuring this out from scratch.

Conclusion

Scaling Keycloak for production isn’t a single configuration change — it’s a combination of architecture decisions, cache tuning, database optimization, and continuous testing. The good news is that Keycloak’s architecture is designed for horizontal scaling, and with the right configuration, it can handle millions of authentications per day.

Start with the fundamentals: run at least 3 nodes, set cache owners to 2, tune your database connection pool, and load test before launch. Then build out monitoring so you can see problems forming before they impact users.

If managing Keycloak infrastructure isn’t where you want to spend your engineering time, Skycloak handles all of this for you — clustering, caching, scaling, monitoring, and updates — so you can focus on building your application.

Get started with Skycloak and let us handle the infrastructure.

Written by Guilliano Molaire, Founder

Guilliano is the founder of Skycloak and a cloud infrastructure specialist with deep expertise in product development and scaling SaaS products. He discovered Keycloak while consulting on enterprise IAM and built Skycloak to make managed Keycloak accessible to teams of every size.

Ready to simplify your authentication?

Deploy production-ready Keycloak in minutes. Unlimited users, flat pricing, no SSO tax.

© 2026 Skycloak. All Rights Reserved. Design by Yasser Soliman