Want faster Keycloak authentication? Here’s how caching makes it happen.
Keycloak uses distributed caching, powered by Infinispan, to store user sessions and authentication data in memory. This reduces database load, speeds up logins, and ensures session data is available across all server nodes.
Key Highlights:
- Why it matters: Caching minimizes database queries, cutting response times to milliseconds.
- How it works: Stores sessions in memory and synchronizes them across nodes in a cluster.
- Setup essentials: Requires adequate memory, CPU, and network configuration.
- Key steps:
  - Enable caching with `--cache=ispn`.
  - Modify `cache-ispn.xml` for custom settings.
  - Use JGroups for cluster node discovery.
- Optimization tips: Adjust memory limits, set cache owners to at least 2, and secure data with TLS.
By following these steps, you’ll enhance Keycloak’s performance, especially in high-traffic environments. Read on for a detailed guide.
How Keycloak Caching Works and Why It Matters
Keycloak caching plays a crucial role in delivering fast authentication. Its underlying technology directly influences how quickly users can log in and access applications.
Keycloak’s Default Caching System
Keycloak’s caching system is powered by Infinispan, which stores frequently accessed data in memory instead of repeatedly fetching it from the database. When running in production mode, caching is enabled by default with practical configurations. Each node uses local caches to store data, reducing unnecessary database queries. For example, caches like Realms, Users, and Authorization can hold up to 10,000 entries, while the Keys cache is limited to 1,000 entries with a one-hour expiration.
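For reference, those defaults correspond to local-cache definitions in conf/cache-ispn.xml roughly like the sketch below (trimmed to the size and expiration settings; treat it as illustrative rather than a verbatim copy of the shipped file):

```xml
<!-- Illustrative excerpt: default-style local caches with the limits described above -->
<local-cache name="realms">
    <memory max-count="10000"/>
</local-cache>
<local-cache name="users">
    <memory max-count="10000"/>
</local-cache>
<local-cache name="keys">
    <expiration max-idle="3600000"/> <!-- one hour, in milliseconds -->
    <memory max-count="1000"/>
</local-cache>
```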
This setup is effective for single-node deployments but requires adjustments to handle multi-node environments efficiently.
How Distributed Caching Works in Multi-Node Deployments
In multi-node setups, keeping all nodes synchronized with the latest authentication data is essential. Distributed caching solves this by replicating cache entries across nodes, ensuring each entry is stored on two nodes by default. If a node needs data it doesn’t own, it fetches it from one of the owner nodes. Additionally, a “work” cache is employed to broadcast invalidation messages. When one node updates shared database data, it notifies other nodes to invalidate their outdated local caches.
However, if both owner nodes for a specific cache entry go offline, the cached data for that entry is lost. For development environments, Keycloak simplifies things by disabling distributed caching when using the `start-dev` command, relying solely on local caches.
These strategies not only ensure data consistency but also significantly enhance performance.
Benefits of Optimized Caching
Optimizing Keycloak’s caching system ensures fast authentication, even in clustered environments. Distributed caching helps reduce latency and database contention by serving frequently requested data directly from memory. This is especially beneficial during high-traffic periods. It also ensures that session data is accessible across all nodes in the cluster, preventing users from having to re-authenticate when their requests are routed to different servers.
Additionally, optimized caching supports horizontal scaling. When new nodes are added to the cluster, they can immediately access cached data, improving availability without compromising performance. These advantages make Keycloak’s caching system a key component of its scalability and reliability.
Requirements Before Setting Up Distributed Caching
Before diving into distributed caching, it’s crucial to ensure your infrastructure, configuration files, and cluster discovery mechanisms are set up correctly. These steps lay the groundwork for a caching system that delivers the speed and reliability needed for high-demand authentication scenarios.
Hardware and Network Requirements
To run distributed caching smoothly, allocate adequate resources for each Keycloak pod. Caching operations depend heavily on RAM. For instance, each Keycloak pod requires 1,250 MB of memory for realm data and 10,000 cached sessions as a baseline. In containerized environments, Keycloak uses 70% of the memory limit for heap-based memory and approximately 300 MB for non-heap memory.
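As a rough example of how that split plays out: with a 2 GB container memory limit, roughly 1.4 GB (70%) would go to heap, about 300 MB to non-heap memory, and the remainder stays as buffer.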
When it comes to CPU allocation, here’s a general guideline:
- 1 vCPU for every 15 logins per second
- 1 vCPU per 120 client credential grants per second
- 1 vCPU per 120 refresh token requests per second
Additionally, factor in 150% extra headroom to handle traffic spikes.
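As a rough worked example based on these guidelines: a deployment handling 45 logins, 240 client credential grants, and 360 refresh token requests per second would need about 3 + 2 + 3 = 8 vCPUs, or roughly 20 vCPUs once the 150% headroom for spikes is included.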
On the network side, ensure smooth inter-node communication by configuring Keycloak to open port 7800 for unicast transmission and port 57800 for failure detection. The network port must bind to an interface accessible by all cluster nodes. For instances running across different networks, set up port forwarding and configure the `jgroups.external_port` and `jgroups.external_addr` properties as needed.
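As an illustration, these properties can be passed as JVM options at startup; the address and port below are placeholders for your environment, not defaults:

```bash
# Hypothetical example: expose JGroups through NAT/port forwarding
export JAVA_OPTS_APPEND="-Djgroups.external_addr=203.0.113.10 -Djgroups.external_port=7800"
bin/kc.sh start --cache=ispn
```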
Lastly, use an external shared database to handle cache invalidation and maintain data consistency across nodes. Once hardware and network parameters are defined, update your configuration files to reflect these settings.
Important Configuration Files and Directories
Distributed caching in Keycloak is primarily managed through the `conf/cache-ispn.xml` file. This file contains the default cache settings for realms, users, and authorization data. Before making changes, review the default configurations carefully. For custom setups, create a modified version of `cache-ispn.xml` and reference it using the `--cache-config-file` option.
If your deployment requires custom transport stacks, update the same file to ensure compatibility with your environment. Afterward, configure your cluster discovery settings to finalize the setup.
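A minimal sketch of that workflow might look like this (the copy name my-cache-ispn.xml is just an example, and the path passed to the option is resolved relative to the conf/ directory):

```bash
# Start from the shipped defaults, then point Keycloak at the customized copy
cp conf/cache-ispn.xml conf/my-cache-ispn.xml
# ... edit conf/my-cache-ispn.xml (owners, memory limits, transport stack) ...
bin/kc.sh start --cache=ispn --cache-config-file=my-cache-ispn.xml
```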
Cluster Node Discovery Setup
Keycloak relies on JGroups for node discovery and cluster communication. The discovery mechanism you choose depends on your deployment environment. By default, Keycloak uses the JDBC_PING transport stack, which works with your shared database to coordinate node discovery. JDBC_PING simplifies the setup since it only requires configuration for the current instance, unlike TCPPING, which demands the IP and port details of all instances.
For Kubernetes deployments, KUBE_PING is a popular choice. It integrates with Kubernetes’ native service discovery. Starting with Keycloak 17, DNS_PING can also be used with a headless service by setting `KC_CACHE_STACK=kubernetes` and adding `-Djgroups.dns.query=<name-of-headless-service>` to `JAVA_OPTS`. In cloud environments, DNS_PING uses DNS queries to discover cluster members, while traditional bare-metal or VM setups can opt for TCPPING or JDBC_PING.
Make sure to configure the JGroups bind address to an interface that all cluster nodes can access. If needed, override the default address using the `jgroups.bind.address` property. Proper node discovery not only reduces lookup times but also improves overall cluster performance.
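For example, the override can be supplied as a JVM option at startup; the address below is a placeholder for your cluster-facing interface:

```bash
# Hypothetical: force JGroups to bind to a specific cluster-facing interface
export JAVA_OPTS_APPEND="-Djgroups.bind.address=192.168.1.10"
bin/kc.sh start --cache=ispn
```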
For clusters handling more than 2,500 concurrent clients, adjust the cache sizes accordingly. Increase the users cache size to twice the number of concurrent clients and the realms cache size to four times that number. Regular load testing and monitoring will help fine-tune these parameters for optimal performance.
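For example, a cluster sized for roughly 5,000 concurrent clients might raise the limits as sketched below; the numbers simply follow the 2x/4x rule above and are illustrative:

```xml
<!-- Illustrative sizing for ~5,000 concurrent clients -->
<local-cache name="users">
    <memory max-count="10000"/> <!-- 2x concurrent clients -->
</local-cache>
<local-cache name="realms">
    <memory max-count="20000"/> <!-- 4x concurrent clients -->
</local-cache>
```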
How to Configure Distributed Caching Step by Step
This guide walks you through enabling caching, updating the configuration file, and verifying the cluster’s status.
Turn On Distributed Caching in Keycloak
To activate distributed caching, use the appropriate startup command. For production environments, run:
bin/kc.sh start --cache=ispn
This command starts Keycloak in production mode with distributed Infinispan caching enabled, allowing all Keycloak nodes in your network to recognize each other.
Important: The `start-dev` command disables distributed caching.
For Kubernetes deployments, you also need to specify the relevant cache stack. For example:
bin/kc.sh start --cache=ispn --cache-stack=kubernetes
Additionally, set the `JAVA_OPTS_APPEND` environment variable to include the JGroups DNS query parameter, which points to your headless service:
JAVA_OPTS_APPEND="-Djgroups.dns.query=keycloak-discovery.keycloak.svc.cluster.local"
Once caching is enabled, proceed to customize the cache settings in the configuration file.
Edit the Cache Configuration File
To fine-tune distributed caching, modify the `conf/cache-ispn.xml` file.
One essential adjustment is the number of owners for distributed caches. By default, cache entries are replicated to a subset of nodes. To change this, update the `owners` attribute. For instance, to configure the session cache:
<distributed-cache name="sessions" owners="2"> <expiration lifespan="-1"/> </distributed-cache>
For high-availability setups, set the `owners` attribute to 2 or more for both the `sessions` and `clientSessions` caches. This ensures that session data remains accessible even if one node fails.
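The clientSessions cache takes the same shape; a minimal sketch:

```xml
<!-- Illustrative: keep client-session entries on two nodes as well -->
<distributed-cache name="clientSessions" owners="2">
    <expiration lifespan="-1"/>
</distributed-cache>
```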
In environments with a high volume of user sessions, consider removing memory limits by omitting the `<memory max-count="..."/>` element. Adjust the ownership settings to match your needs.
If your deployment requires custom transport configurations, define them in the cache configuration file. Here’s an example of a custom encrypted transport stack:
<jgroups>
    <stack name="my-encrypt-udp" extends="udp">
        <SSL_KEY_EXCHANGE keystore_name="server.jks" keystore_password="password"
            stack.combine="INSERT_AFTER" stack.position="VERIFY_SUSPECT2"/>
        <ASYM_ENCRYPT asym_keylength="2048" asym_algorithm="RSA"
            change_key_on_coord_leave="false" change_key_on_leave="false"
            use_external_key_exchange="true"
            stack.combine="INSERT_BEFORE" stack.position="pbcast.NAKACK2"/>
    </stack>
</jgroups>
<cache-container name="keycloak">
    <transport lock-timeout="60000" stack="my-encrypt-udp"/>
    ...
</cache-container>
If you define a custom transport stack, avoid using the `--cache-stack` option in the startup command, as it overrides the settings in the configuration file.
To use a completely custom cache configuration file, specify it during startup like this:
bin/kc.sh start --cache-config-file=my-cache-file.xml
Check Cache Status and Fix Common Issues
Once caching is configured, confirm its functionality. Look for log messages indicating successful cluster discovery and communication, such as:
ISPN000094: Received new cluster view for channel ISPN:
ISPN100002: Starting rebalance with members
These entries should list the active cluster members, confirming proper communication between nodes.
For a more detailed status check, use the Infinispan management CLI. SSH into one of your Keycloak servers and run:
echo describe | bin/cli.sh -c localhost:11222 -f -
This assumes the default management port is `11222`. In the output, verify the `cluster_members_physical_addresses` field to ensure all cluster members are active.
If configuration changes don’t show up in metrics, rebuild the Keycloak configuration (for example, with `bin/kc.sh build`) instead of simply redeploying.
To monitor cache performance over time, enable Keycloak’s metrics feature. This provides insights into cache hit rates, cluster membership, and replication status, helping you address potential issues before they affect users.
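Enabling metrics is a startup option; a minimal sketch, combined with the caching flag used earlier, looks like this:

```bash
# Expose the built-in metrics endpoint alongside distributed caching
bin/kc.sh start --cache=ispn --metrics-enabled=true
```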
Best Practices for Advanced Cache Optimization
Once you’ve set up distributed caching, there are additional steps you can take to fine-tune performance and bolster security. Let’s look at how you can optimize your configuration to get the most out of your caching system.
Optimize Memory and Ownership Settings
Striking the right balance between database load and memory usage is key to improving efficiency.
Memory Limits for Local Caches
Adjust the maximum number of cache entries to align with your database size. For instance, if your database holds 50,000 users, configure your user cache to accommodate at least 50,000 entries. This reduces the need for repeated database queries, saving time and resources.
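A hedged sketch of that adjustment in conf/cache-ispn.xml, using the 50,000-user example above:

```xml
<!-- Illustrative: size the users cache to cover a database of ~50,000 users -->
<local-cache name="users">
    <memory max-count="50000"/>
</local-cache>
```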
Session Cache Memory Management
By default, user and client session caches are capped at 10,000 entries per node. In high-traffic environments, you can remove this limit by omitting the `<memory max-count="..."/>` element from your `sessions` and `clientSessions` cache configurations. This allows unlimited session storage in memory.
If you need to limit offline sessions, you can set a cap of 1,000 entries using the following startup parameter:
--cache-embedded-offline-sessions-max-count=1000
Ownership Configuration
The default owner settings ensure fault tolerance, but you can adjust them for better availability. For critical systems, set the `owners` parameter to three or more in your `conf/cache-ispn.xml` file:
<distributed-cache name="sessions" owners="3"> <expiration lifespan="-1"/> </distributed-cache>
Volatile User Sessions Strategy
To reduce database dependency, you can store user sessions entirely in the cache. This involves removing the `<memory max-count="..."/>` element from the session caches, setting the number of owners to at least two, and disabling persistent user sessions with this command:
bin/kc.sh start --features-disabled=persistent-user-sessions
After optimizing memory and ownership settings, focus on securing your cached data with strong protocols.
Security Best Practices for Caching
Securing cached data is critical, especially when dealing with sensitive authentication details. Here’s how you can protect your data both in transit and at rest.
Transport Layer Security
Keycloak uses TLS encryption with a self-signed RSA 2048-bit certificate and supports TLS 1.3. For production environments, replace the default certificate with a trusted one by manually configuring the keystore. You can also manage certificate rotation using the `cache-embedded-mtls-rotation-interval-days` option to control renewal frequency.
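As a sketch, assuming the cache-embedded mTLS options available in recent Keycloak releases (the 30-day interval is a placeholder), encryption and rotation can be combined at startup:

```bash
# Illustrative: enable mTLS between cache nodes and rotate certificates every 30 days
bin/kc.sh start --cache=ispn \
  --cache-embedded-mtls-enabled=true \
  --cache-embedded-mtls-rotation-interval-days=30
```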
End-to-End Security
Strengthen security further by enforcing SSL/HTTPS for all client communications to prevent man-in-the-middle attacks. Shorten token lifespans through the timeout settings to minimize exposure if a token is compromised. If a breach occurs, use the not-before revocation policy to immediately invalidate affected tokens.
Network-Level Protection
To enhance security at the network level, define specific redirect URIs to reduce token redirection risks. Additionally, apply Proof Key for Code Exchange (PKCE) to all clients – especially public ones – for an added layer of protection during the authorization code flow.
For those looking to simplify management, managed platforms like Skycloak can handle these complexities.
Using Managed Solutions Like Skycloak
Managing distributed caching across multiple Keycloak nodes requires a mix of technical knowledge and infrastructure management. Skycloak offers a managed platform that simplifies this process while ensuring optimal performance.
Cluster Configuration
Skycloak handles the technical aspects of distributed caching, including node discovery, transport optimization, and memory allocation. It automatically adjusts ownership settings and memory limits based on your cluster size and expected workload.
Enterprise-Grade Monitoring
The platform includes built-in monitoring tools to track cache performance, such as hit rates, replication status, and overall cluster health. These insights allow you to spot and resolve caching issues before they impact user authentication. Advanced monitoring features, including detailed analytics, are available in the Growth or enterprise plan.
Expert Support and Optimization
Skycloak also provides consulting services to help you fine-tune your cache configurations. With the Startup plan, you get one hour of expert support each month, while the Growth plan offers two hours. Their team reviews your setup and suggests adjustments to optimize performance while managing resources effectively.
Scalable Infrastructure
As your authentication needs grow, Skycloak adapts your cache configurations to maintain peak performance. From small clusters on the Dev plan ($25/month) to large, high-availability setups on the Growth plan, Skycloak eliminates the need for deep Infinispan expertise while keeping your caching system secure and efficient.
Conclusion
Distributed caching plays a key role in reducing database strain and speeding up Keycloak authentication processes. By setting up caching to scale horizontally, you can ensure both high availability and better performance for your authentication system.
Replication across multiple nodes adds an extra layer of reliability, allowing services to continue uninterrupted even when individual nodes fail. Fine-tuning memory settings also helps cut down on database queries, which significantly improves overall efficiency. As AWS puts it, “Caching helps applications perform dramatically faster and cost significantly less at scale”.
Scaling is straightforward – simply add server instances to increase throughput until you hit database or caching limits. This scalability allows your authentication infrastructure to grow alongside your business without the need for a complete redesign.
Security is equally important. Implementing TLS encryption ensures cached data remains protected, both during transit and while stored, providing a solid security framework for enterprise authentication.
For those looking to simplify the process, managed services like Skycloak take care of the technical details, offering robust monitoring and expert support. Whether you choose a self-managed setup or a hosted solution, distributed caching is a cornerstone for building authentication systems that are both efficient and dependable for enterprise-level demands.
FAQs
What are the benefits of using distributed caching in Keycloak, and how does it improve authentication speed?
Distributed caching in Keycloak is a game-changer for boosting authentication performance, especially when dealing with heavy traffic. By keeping frequently accessed data closer to users, it cuts down on repeated database queries. The result? Faster response times and a more reliable authentication process.
Tools like Skycloak make setting up distributed caching in Keycloak easier by providing automated configurations and integration tools. This not only speeds things up but also enhances scalability, making it a solid choice for enterprise-level identity and access management systems.
How do I securely set up distributed caching in Keycloak for better performance?
To set up distributed caching securely in Keycloak, the first step is to encrypt cache communication to safeguard sensitive data as it moves between systems. This can be achieved by using transport layer encryption, such as TLS, and ensuring the caching environment is isolated from public networks through proper network segmentation.
Access controls play a crucial role in securing your caching setup. Implement strong authentication and authorization measures like Role-Based Access Control (RBAC) or Access Control Lists (ACLs) to tightly control who can interact with the cache. For production environments, it’s essential to focus on both encryption and network isolation to reduce security vulnerabilities and block unauthorized access to cached data.
What hardware and network setup is needed for efficient distributed caching in Keycloak for high-traffic environments?
To set up distributed caching effectively in a high-traffic Keycloak environment, start by ensuring your hardware is up to the task. Each pod should have at least 1,250 MB of RAM to handle caching for realm data and active sessions. This ensures smooth performance and keeps the system responsive under heavy loads.
From a networking perspective, reliable communication between nodes is essential. Use high-bandwidth, low-latency connections to keep cache synchronization running smoothly and avoid unnecessary delays. Make sure the required communication ports are open for node discovery and cache updates. Keycloak has historically defaulted to UDP transport for these operations (recent releases default to TCP with JDBC_PING, as noted earlier), but you can adjust the transport stack to meet your specific performance requirements.
With the right hardware and network setup, you can maintain cache consistency and ensure faster authentication even in demanding scenarios.