Want a secure and high-performing Keycloak cluster? Here’s how to do it right. Misconfigured clusters can lead to security risks, poor performance, and scalability issues. Follow these 7 best practices to optimize your Keycloak cluster for production:
- Database Setup and Replication: Use a production-ready database like PostgreSQL or MySQL. Configure replication for high availability and enable SSL/TLS for secure connections.
- Memory Cache Setup: Optimize Keycloak’s Infinispan cache for better performance. Adjust cache sizes, enable session affinity, and monitor metrics.
- Load Balancer Configuration: Ensure HTTPS communication, use sticky sessions, and restrict access to sensitive paths like /admin/.
- Node Discovery Setup: Configure protocols like DNS_PING for Kubernetes or TCPPING for cross-DC setups. Secure communication with encryption and RBAC.
- Cluster Scaling Methods: Use Horizontal Pod Autoscaling (HPA) and Vertical Pod Autoscaler (VPA) to handle traffic spikes. Enable multi-site failover for reliability.
- Security Configuration: Protect admin endpoints with IP restrictions and MFA. Enable audit logging and secure cluster communication with mutual TLS.
- Performance Monitoring: Use tools like Prometheus and Grafana to track metrics like CPU, memory, and authentication events. Set alerts for quick issue detection.
Quick Comparison:
Area | Key Practice | Example Configuration/Tip |
---|---|---|
Database | Use production-ready DB, enable SSL/TLS | KC_DB=postgres , KC_DB_URL=jdbc:postgresql:// |
Cache | Optimize cache sizes, enable monitoring | cache.default.local.max-entries=20000 |
Load Balancer | Enable HTTPS, sticky sessions | proxy-headers=xforwarded , proxy-trusted-addresses=10.0.0.0/24 |
Node Discovery | Use DNS_PING for Kubernetes | jgroups.dns.query=keycloak-headless.default.svc.cluster.local |
Scaling | HPA and VPA for dynamic scaling | Min replicas: 3, Max replicas: 10 |
Security | MFA, IP whitelisting, mutual TLS | events.admin.enabled=true , https-certificate-file=/path/to/cert.pem |
Monitoring | Prometheus and Grafana for insights | Track CPU, memory, and response times |
Takeaway: Proper cluster configuration ensures security, scalability, and smooth performance. Start by securing your database, optimizing caching, and setting up monitoring tools for real-time insights.
1. Database Setup and Replication
Setting up your database is a critical step when building a Keycloak cluster. The default dev-file database is only meant for development and testing; it won’t hold up in a production environment. You’ll need a production-ready database to ensure consistency and reliability.
Database Selection and Initial Setup
Here’s a quick overview of databases compatible with Keycloak:
Database Type | Version | Key Features |
---|---|---|
PostgreSQL | 17.0+ | UTF8 support, great scalability |
MariaDB | 11.4+ | High performance, community-supported |
MySQL | 8.4+ | Popular choice, strong replication tools |
Oracle | 23.5+ | Enterprise-grade, requires manual setup |
MS SQL Server | 2022+ | Seamless with Windows environments |
Amazon Aurora PostgreSQL | 16.1+ | AWS-native scaling and performance |
Once you’ve chosen your database, it’s time to configure it for Keycloak.
Essential Configuration Steps
Here are the steps to get your database ready:
- Base Configuration
Set up database credentials using environment variables:
KC_DB=postgres
KC_DB_URL=jdbc:postgresql://primary-db:5432/keycloak
KC_DB_USERNAME=${DB_USER}
KC_DB_PASSWORD=${DB_PASSWORD}
- Character Encoding
For PostgreSQL, make sure Unicode is properly configured:
CREATE DATABASE keycloak WITH ENCODING 'UTF8' LC_COLLATE='en_US.UTF-8' LC_CTYPE='en_US.UTF-8';
- Connection Pool Settings
Adjust connection pool parameters to handle traffic efficiently. For a cluster with 3–5 nodes, use:
db-pool-initial-size=20
db-pool-min-size=20
db-pool-max-size=100
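Pool sizes interact with the database's own connection limit, because every node can open connections up to its pool maximum. Here is a minimal sketch of that arithmetic. The function name, the `reserved` headroom, and the limit of 300 connections are illustrative assumptions, not values from Keycloak's documentation:

```python
def connection_budget(nodes: int, pool_max: int, db_max_connections: int,
                      reserved: int = 10) -> dict:
    """Compare the cluster's worst-case connection demand with the database limit."""
    demand = nodes * pool_max                   # every node fills its pool
    available = db_max_connections - reserved   # keep slots for admin and backup tools
    return {
        "demand": demand,
        "available": available,
        "fits": demand <= available,
        # Largest per-node pool maximum that still fits under the limit
        "safe_pool_max": available // nodes,
    }


if __name__ == "__main__":
    # 5 nodes with a pool maximum of 100 against an assumed limit of 300 connections
    print(connection_budget(nodes=5, pool_max=100, db_max_connections=300))
```

If `fits` comes back false, either raise the database limit or lower the per-node pool maximum before adding nodes.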
After setting up these basics, focus on replication for better fault tolerance.
High Availability Configuration
To ensure your database can handle failures and high traffic, set up replication. Some recommended practices include:
- Primary-secondary replication to distribute read and write loads.
- Automatic failover to minimize downtime during outages.
- Regular backups to safeguard your data.
“The database used by Keycloak is crucial for the overall performance, availability, reliability and integrity of Keycloak.” – Keycloak Documentation
Performance Optimization
Fine-tune your database performance by:
- Adjusting the database locking timeout (default max: 900 seconds).
- Enabling XA transactions if your database supports them.
For Amazon Aurora PostgreSQL, you’ll need the AWS JDBC driver. Configure Keycloak with the following parameters:
db-url=jdbc:postgresql://your-aurora-cluster-endpoint:5432/keycloak
db-driver=software.aws.rds.jdbc.postgresql.Driver
Security Considerations
To keep your database secure, follow these best practices:
- Use SSL/TLS encryption for all database connections.
- Store credentials in a secure keystore.
- Apply database-level access controls to limit permissions.
- Conduct regular security audits and keep your database updated.
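For the first point, one way to enforce TLS on a PostgreSQL connection is to put the driver's `sslmode` parameter in the JDBC URL. The sketch below only builds such a URL. The helper name and the certificate path are assumptions for illustration; `sslmode` and `sslrootcert` are parameters of the PostgreSQL JDBC driver:

```python
from typing import Optional
from urllib.parse import urlencode


def postgres_jdbc_url(host: str, database: str, port: int = 5432,
                      sslmode: str = "verify-full",
                      sslrootcert: Optional[str] = None) -> str:
    """Build a JDBC URL that asks the PostgreSQL driver to verify the server certificate."""
    params = {"sslmode": sslmode}
    if sslrootcert:
        params["sslrootcert"] = sslrootcert
    # safe="/" keeps file paths readable instead of percent-encoding the slashes
    return f"jdbc:postgresql://{host}:{port}/{database}?{urlencode(params, safe='/')}"


if __name__ == "__main__":
    # Hypothetical CA bundle path; the value would go into KC_DB_URL
    print(postgres_jdbc_url("primary-db", "keycloak", sslrootcert="/etc/ssl/db-ca.pem"))
```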
These steps will help create a reliable, high-performing setup for your Keycloak cluster.
2. Memory Cache Setup
Set up your cache to boost cluster performance. Keycloak uses Infinispan for distributed caching, which helps lower database load and speeds up response times.
Cache Types and Their Purpose
Keycloak relies on three main cache types:
Cache Type | Purpose | Default Settings |
---|---|---|
Local | Stores persistent data locally to reduce database queries | 10,000 entries for realms/users |
Distributed | Shares entries across cluster nodes for redundancy and scalability | 2 owner nodes per entry |
Key | Holds frequently accessed keys with a set expiration time | 1,000 entries, 1-hour expiration |
Basic Cache Configuration
You can configure cache settings in the conf/cache-ispn.xml file. For example:
<distributed-cache name="sessions" owners="2" statistics="true">
    <memory max-count="100000"/>
    <expiration lifespan="3600000"/>
</distributed-cache>
Performance Optimization
To get the most out of your cache, follow these tips:
- Adjust Cache Sizes
Scale your local cache size to match your database’s needs:
cache.default.local.max-entries=20000
cache.realm.max-entries=2000
cache.user.max-entries=10000
- Enable Session Affinity
Configure your load balancer to enforce session affinity. This reduces unnecessary state transfers between nodes and optimizes resource use.
- Modify Owner Count
For high-availability clusters, increase the number of owners in your distributed cache setup:
<distributed-cache owners="3" statistics="true">
Monitoring and Metrics
Enable statistics and metrics to monitor cache performance:
cache.default.statistics=true
metrics-enabled=true
Transport Stack Configuration
If you’re running Keycloak on Kubernetes, use the following setup:
bin/kc.sh build --cache-stack=kubernetes
export JGROUPS_DNS_QUERY=keycloak-headless.default.svc.cluster.local
Cache Tuning Guidelines
Here are some recommended settings to fine-tune your cache:
Parameter | Recommended Setting | Purpose |
---|---|---|
Max Entries | 10,000-50,000 | Prevents memory overflow |
Expiration | 3,600 seconds | Balances data freshness with performance |
Statistics | Enabled | Helps with performance monitoring |
Owner Count | 2-3 nodes | Ensures reliability and availability |
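The owner count trades memory for resilience. As a rough sketch of that trade-off (the function names are made up for illustration, and it assumes entries are spread evenly across nodes):

```python
def tolerated_failures(owners: int) -> int:
    """With N copies of each entry, N-1 simultaneous node losses still leave one copy."""
    return max(owners - 1, 0)


def entries_per_node(total_entries: int, owners: int, nodes: int) -> float:
    """Average number of entries each node holds in a distributed cache."""
    effective_owners = min(owners, nodes)  # there cannot be more copies than nodes
    return total_entries * effective_owners / nodes


if __name__ == "__main__":
    # 100,000 sessions, 2 owners, 4 nodes
    print(tolerated_failures(2), entries_per_node(100_000, 2, 4))
```

Raising owners from 2 to 3 on a 4-node cluster lets you lose two nodes at once, but each node then stores about 75% of all entries instead of 50%.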
Up next: configure the load balancer that sits in front of your cluster.
3. Load Balancer Configuration
Properly setting up a load balancer is key to ensuring your Keycloak cluster runs smoothly and reliably.
Security and Protocol Settings
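Make sure all communication between the load balancer and the Keycloak nodes uses HTTPS, then tell Keycloak which forwarded headers your proxy sets. Pick the one line below that matches your proxy: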
# For RFC7239 standard
proxy-headers=forwarded
# For X-Forwarded-* headers
proxy-headers=xforwarded
Trusted Proxy Configuration
Define trusted proxy addresses to prevent unauthorized access:
proxy-trusted-addresses=10.0.0.0/24,192.168.1.100
proxy-protocol-enabled=true
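To sanity-check a trusted-address list before deploying it, you can test sample addresses against it. This is a minimal sketch using Python's standard `ipaddress` module. The function name is an assumption, and it mirrors the comma-separated format shown above rather than Keycloak's internal matching logic:

```python
import ipaddress


def is_trusted_proxy(remote_addr: str, trusted: str) -> bool:
    """Return True if remote_addr matches any CIDR range or single IP in the list."""
    addr = ipaddress.ip_address(remote_addr)
    for entry in trusted.split(","):
        # A bare IP such as 192.168.1.100 becomes a single-address network
        network = ipaddress.ip_network(entry.strip(), strict=False)
        if addr in network:
            return True
    return False


if __name__ == "__main__":
    trusted = "10.0.0.0/24,192.168.1.100"
    for candidate in ("10.0.0.17", "192.168.1.100", "192.168.1.101"):
        print(candidate, is_trusted_proxy(candidate, trusted))
```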
Path Exposure Guidelines
Control which paths are accessible to maintain security:
Path | Exposure Status | Purpose |
---|---|---|
/realms/ | Expose | OIDC endpoints |
/resources/ | Expose | Asset serving |
/admin/ | Restrict | Security risk |
/metrics | Restrict | Internal monitoring |
/health | Restrict | Internal checks |
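The table translates into a simple allow-list rule: expose only known public prefixes and deny everything else. A sketch of that rule follows. The function name and the default-deny choice for unlisted paths are assumptions; your proxy's own matching rules are what actually enforce this:

```python
EXPOSED_PREFIXES = ("/realms/", "/resources/")
RESTRICTED_PREFIXES = ("/admin/", "/metrics", "/health")


def is_exposed(path: str) -> bool:
    """Decide whether the public load balancer should forward this path."""
    if path.startswith(RESTRICTED_PREFIXES):
        return False
    # Anything not explicitly listed as public is denied by default
    return path.startswith(EXPOSED_PREFIXES)


if __name__ == "__main__":
    for p in ("/realms/demo/protocol/openid-connect/token", "/admin/master/console", "/health"):
        print(p, is_exposed(p))
```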
Session Management
Enable sticky sessions to ensure requests tied to a session consistently reach the same node:
upstream keycloak_backend {
hash $cookie_AUTH_SESSION_ID consistent;
server node1.keycloak:8443;
server node2.keycloak:8443;
}
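The `hash ... consistent` directive maps each cookie value to a backend through a hash ring, so adding or removing a node moves only a fraction of the sessions. The sketch below illustrates the idea. It is not NGINX's actual algorithm, and the class name, the SHA-256 hash, and the virtual-node count are assumptions:

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes: int = 100):
        # Place each node on the ring many times to even out the distribution
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, session_id: str) -> str:
        # The first ring point clockwise from the key's hash owns the key
        idx = bisect.bisect(self._keys, self._hash(session_id)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["node1.keycloak:8443", "node2.keycloak:8443"])
    print(ring.node_for("example-session-id"))
```

The same session ID always lands on the same node, and when a third node joins, the only sessions that move are the ones the new node takes over.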
TLS Termination
Set up client certificate handling with the following configuration:
location /auth {
proxy_set_header X-SSL-Client-Cert $ssl_client_cert;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_pass https://keycloak_backend;
}
Performance Optimization
“To prevent several attack vectors, you enable HTTP over TLS, or HTTPS, for that channel.” – Keycloak Documentation
Using HTTPS minimizes vulnerabilities and strengthens security.
Real-World Implementation
In March 2023, Example Corp implemented a Keycloak setup with a load balancer for their customer portal. By forwarding HTTPS traffic and enabling sticky sessions tied to AUTH_SESSION_ID, they saw a 20% drop in authentication latency.
Next, focus on configuring node discovery to maintain smooth cluster communication.
4. Node Discovery Setup
After setting up your database, cache, and load balancer, the next step is configuring node discovery so that cluster nodes can find and talk to each other reliably. The right protocol depends on your environment.
Protocol Selection Guidelines
Here’s a breakdown of which protocols work best for different environments:
Environment Type | Recommended Protocol | Requirements |
---|---|---|
On-premises (Multicast) | PING (UDP) | Network must support multicast |
Cross-DC Deployment | TCPPING | Requires static IP configuration |
Container Environment | JDBC_PING | Needs database access |
Kubernetes | DNS_PING or KUBE_PING | Relies on Kubernetes DNS service |
Transport Stack Configuration
Set up your transport stack with the following configuration:
# Default UDP stack (multicast discovery)
cache-stack=udp
# DNS name queried by DNS_PING; only relevant with the kubernetes stack
jgroups.dns.query=service-name.namespace.svc.cluster.local
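Conceptually, DNS-based discovery resolves the headless service name and treats every returned address as a potential cluster member. The sketch below shows that lookup with Python's standard `socket` module. It is an illustration, not JGroups code; the function name, the injectable `resolver` argument, and the port 7800 default are assumptions:

```python
import socket
from typing import Callable, List


def discover_peers(dns_query: str, port: int = 7800,
                   resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted, de-duplicated IPs that the DNS name resolves to."""
    records = resolver(dns_query, port, proto=socket.IPPROTO_TCP)
    # Each record is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({record[4][0] for record in records})


if __name__ == "__main__":
    # Resolves a real name only where the DNS entry exists, e.g. inside the cluster
    try:
        print(discover_peers("service-name.namespace.svc.cluster.local"))
    except socket.gaierror as exc:
        print("lookup failed:", exc)
```

Passing the resolver as an argument makes it easy to test the logic without a live DNS service.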
Security Implementation
To secure your cluster, enable transport encryption and configure role-based access control (RBAC). Here’s an example setup:
# Enable transport encryption
encrypt-protocol=true
auth-timeout=3000
# RBAC configuration
security-authorization=enabled
security-roles=cluster-admin
Cloud-Specific Considerations
In cloud-based setups, ensure you manage dependencies and fine-tune discovery timing for optimal performance:
- Place vendor-specific dependencies in the providers directory.
- Use cloud-native discovery protocols tailored to your provider.
- Configure network policies and security groups to allow proper communication.
Performance Optimization
Fine-tune discovery timing to reduce delays and improve efficiency:
# Optimize discovery timing
initial_hosts=node1[7600],node2[7600]
discovery_timeout=3000
num_initial_members=2
Monitoring Setup
Track these metrics to ensure your node discovery process is running smoothly:
- Discovery Time: Measures how quickly new nodes are detected.
- Failed Attempts: Logs unsuccessful discovery attempts.
- Node Count: Monitors the number of active nodes in the cluster.
5. Cluster Scaling Methods
Scaling your Keycloak cluster effectively is crucial for maintaining performance and ensuring availability as demand increases.
Horizontal Pod Autoscaling
Use Horizontal Pod Autoscaling (HPA) to automatically adjust the number of pods based on CPU usage. Here’s an example configuration:
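apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak-hpa
spec:
  scaleTargetRef:            # assumed workload; point this at your own Keycloak resource
    apiVersion: apps/v1
    kind: StatefulSet
    name: keycloak
  minReplicas: 3
  maxReplicas: 10
  metrics:                   # autoscaling/v2 uses a metrics list for the CPU target
    - type: Resource
      resource:
        name: cpu
        target: {type: Utilization, averageUtilization: 75}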
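To see how the autoscaler arrives at a replica count, it helps to run the core formula by hand: desired = ceil(current replicas × current utilization / target utilization), clamped to the min and max. Here is a simplified sketch of that calculation. It leaves out the tolerance band and stabilization window that the real controller applies:

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float = 75,
                     min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Core HPA scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 3 pods averaging 150% of their CPU request against a 75% target
    print(desired_replicas(3, 150))
```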
Resource-Based Scaling
Vertical Pod Autoscaler (VPA) adjusts CPU and memory resources dynamically, helping your cluster handle varying workloads efficiently. Suggested resource configurations:
Resource Type | Minimum | Recommended | Maximum |
---|---|---|---|
CPU | 500m | 2000m | 4000m |
Memory | 512Mi | 2048Mi | 4096Mi |
Multi-Site Failover Scaling
Enable failover scaling to maintain service availability during site failures. Example configuration:
# Multi-site scaling configuration
scale.failover.enabled=true
scale.failover.min-pods=3
scale.failover.max-pods=6
Combine failover scaling with other techniques to ensure seamless operation across multiple sites.
Cluster Autoscaler Integration
The Cluster Autoscaler handles node-level scaling by monitoring unschedulable pods and resource usage. Define node group sizes to match your resource needs:
nodeGroups:
  - name: keycloak-worker
    minSize: 2
    maxSize: 5
    machineType: n2-standard-4
Performance Metrics Monitoring
Tracking performance metrics is key to validating scaling strategies. Focus on:
- User Activity: Analyze HTTP traffic and event metrics to understand usage patterns.
- Resource Utilization: Monitor CPU and memory to confirm scaling adjustments align with demand.
Regularly reviewing these metrics ensures your scaling methods are responsive and effective.
6. Security Configuration
Setting up your infrastructure is just the start – securing your cluster is essential to protect sensitive data and ensure system reliability. Follow these steps to implement effective security measures.
SSL/TLS Implementation
Use SSL/TLS certificates to enable HTTPS and secure communication. Here’s an example of a production-ready configuration:
--https-certificate-file=/path/to/certfile.pem
--https-certificate-key-file=/path/to/keyfile.pem
--https-protocols=TLSv1.2,TLSv1.3
--https-port=8443
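A client can enforce the same protocol floor on its side, which is handy for smoke-testing the endpoint. This sketch builds a client context with Python's standard `ssl` module that refuses anything older than TLS 1.2. The function name is an assumption, and the context is not tied to any particular Keycloak host:

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and rejects TLS < 1.2."""
    ctx = ssl.create_default_context()          # enables cert and hostname checks
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = strict_client_context()
    print(ctx.minimum_version, ctx.verify_mode)
```

You could pass this context to `urllib.request.urlopen(url, context=ctx)` to confirm that the server completes a handshake under these constraints.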
Admin Console Protection
Strengthen the security of your admin console by adding multiple layers of protection. Here’s how:
Security Measure | Configuration Example | Purpose |
---|---|---|
IP Whitelisting | Restrict access to trusted IP ranges | Limit admin access to specific networks |
Multi-Factor Authentication | Enable TOTP or SMS verification | Add an extra layer of authentication |
Brute Force Detection | Limit failed login attempts | Block automated attack attempts |
Session Timeout | Set a session timeout policy | Minimize risk from unauthorized access |
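To make the brute force row concrete, here is a minimal sketch of failure counting with a temporary lockout. It illustrates the general technique only. It is not Keycloak's implementation, and the class name, thresholds, and time window are assumptions:

```python
from collections import defaultdict


class FailedLoginTracker:
    """Lock an account after too many failed logins inside a time window."""

    def __init__(self, max_failures: int = 5, lockout_seconds: float = 300):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self._failures = defaultdict(list)   # user -> timestamps of recent failures
        self._locked_until = {}              # user -> time the lock expires

    def record_failure(self, user: str, now: float) -> None:
        window_start = now - self.lockout_seconds
        recent = [t for t in self._failures[user] if t > window_start]
        recent.append(now)
        self._failures[user] = recent
        if len(recent) >= self.max_failures:
            self._locked_until[user] = now + self.lockout_seconds

    def record_success(self, user: str) -> None:
        self._failures.pop(user, None)       # a good login resets the counter

    def is_locked(self, user: str, now: float) -> bool:
        return now < self._locked_until.get(user, 0.0)
```

Timestamps are passed in explicitly so the behavior is easy to test; in a live service you would pass `time.time()`.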
Once configured, enable audit logging to monitor and respond to security-related activities.
Audit Logging
Track security events and maintain compliance by enabling audit logging. In the Keycloak Admin Console, navigate to “Realm Settings > Events” to configure event logging:
events.admin.enabled=true
events.user.enabled=true
events.expire.days=90
events.include=LOGIN,LOGOUT,LOGIN_ERROR,ADMIN_LOGIN
Cluster Communication Security
Secure communication between cluster nodes by focusing on these key areas:
1. Authentication Configuration
Use mutual TLS authentication to ensure only authorized nodes can join the cluster and communicate securely.
2. Network Isolation
Separate cluster communication traffic from regular application traffic. This reduces the risk of unauthorized access.
3. Cache Security
Encrypt cached data to protect sensitive information. Example configuration:
cache.encryption.enabled=true
cache.encryption.algorithm=AES/GCM/NoPadding
cache.encryption.key-size=256
Certificate Management
Automate certificate rotation and closely monitor expiration dates to maintain secure communication. Example configuration:
https-certificates-reload-period=3600
https-key-store-password=${KEYSTORE_PASSWORD}
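Monitoring expiration dates can be as simple as comparing a certificate's `notAfter` value with the current time. The sketch below uses `ssl.cert_time_to_seconds` from Python's standard library. The function names and the 30-day threshold are assumptions, and the date string is only sample data:

```python
import ssl


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter timestamp."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def needs_rotation(not_after: str, now: float, threshold_days: float = 30) -> bool:
    """Flag certificates that expire within the threshold."""
    return days_until_expiry(not_after, now) < threshold_days


if __name__ == "__main__":
    import time
    # notAfter uses the format returned by SSLSocket.getpeercert()
    print(needs_rotation("Jan  5 09:34:43 2018 GMT", time.time()))
```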
Regularly audit your security setup and update configurations to address new vulnerabilities and threats.
Real-World Example
In July 2023, SecureCloud Solutions enhanced their Keycloak cluster security by implementing IP whitelisting and enabling MFA for admin accounts. These changes led to a 95% drop in suspicious login attempts within the first month.
7. Performance Monitoring
Keeping your Keycloak cluster running smoothly requires proactive monitoring to spot and address bottlenecks before they affect users.
Key Monitoring Tools
Set up Prometheus for collecting metrics and Grafana for visualizing them. Together, they provide real-time insights into your cluster’s health.
Setting Up Prometheus
Configure Prometheus to track key metrics like:
- CPU and memory usage
- Disk space
- Authentication events
- API response times
- Cache performance
These metrics help you monitor resource usage and fine-tune your system’s performance.
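Prometheus scrapes these values in a plain-text format, one sample per line. As a rough sketch of what that data looks like and how you might derive a cache hit ratio from it, consider the parser below. It assumes samples carry no trailing timestamp, and the `example_cache_*` metric names are hypothetical placeholders, not names Keycloak is known to export:

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text-format samples into {name_with_labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP and TYPE lines
            continue
        name, _, value = line.rpartition(" ")  # the value is the last field
        samples[name] = float(value)
    return samples


def hit_ratio(hits: float, misses: float) -> float:
    """Fraction of lookups served from cache; 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


if __name__ == "__main__":
    sample = """# TYPE example_cache_hits_total counter
example_cache_hits_total 900
example_cache_misses_total 100
"""
    m = parse_metrics(sample)
    print(hit_ratio(m["example_cache_hits_total"], m["example_cache_misses_total"]))
```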
Configuring Grafana Dashboards
Create dashboards in Grafana to focus on critical areas:
- System Performance: Track resource usage across cluster nodes and set alerts for unusual activity.
- User Authentication Metrics: Monitor login patterns and response times to identify security or performance concerns.
- Database Performance: Keep an eye on connection pools, query speeds, and transaction rates to maintain database efficiency.
These dashboards provide the foundation for actionable insights.
Practical Example
In 2024, SpotOn revamped its monitoring approach by adopting standardized tagging and integrating with Grafana Cloud. This gave them a clearer view of their clusters and made troubleshooting faster and easier.
“Getting on-call notifications and paging integrated closer to the dashboards and data that help developers diagnose and resolve issues will greatly improve on-call workflows” – Nathan Bellowe, Staff Software Engineer, The Trade Desk
Setting Up Alerts
Once your dashboards are ready, configure alerts for real-time issue detection. Adjust thresholds for metrics like CPU usage, memory, response times, and failed logins to match your environment’s needs. This lets you catch and resolve problems quickly.
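Threshold alerts boil down to comparing current values against limits you choose. The sketch below shows that comparison. The metric keys and threshold numbers are placeholders to adapt to your environment, not recommendations from Keycloak or Grafana:

```python
# Illustrative limits; tune them against your own baseline
THRESHOLDS = {
    "cpu_percent": 80.0,
    "memory_percent": 85.0,
    "p95_response_ms": 500.0,
    "failed_logins_per_min": 20.0,
}


def breached(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the sorted names of metrics whose current value exceeds its limit."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit      # missing metrics never alert
    )


if __name__ == "__main__":
    current = {"cpu_percent": 91.2, "memory_percent": 60.0, "failed_logins_per_min": 45}
    print(breached(current))
```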
Integrating with External Logging
Export your monitoring data to external logging tools for deeper analysis and long-term storage. This ensures you have a detailed record for troubleshooting and performance reviews.
Conclusion
The steps outlined above create a clear path for deploying a Keycloak cluster that balances security, performance, and reliability. By following these recommendations, you can establish a strong foundation for a resilient authentication setup.
When moving from the default embedded H2 database to a production-ready database, focus on these critical steps:
- Database Transition: Schedule database migrations during low-traffic times to minimize disruptions.
- Strengthen Security:
  - Use HTTPS for secure communication.
  - Enforce strict session management policies.
  - Enable audit logging to track activity.
  - Secure admin endpoints with IP restrictions and multi-factor authentication.
- Set Up Monitoring: Use monitoring tools to track performance metrics and identify potential security threats.
- Secure Deployment: Ensure all security measures are in place before making the service publicly accessible.