In today’s digital landscape, secure identity management is more critical than ever. Keycloak, an open-source identity and access management solution, offers a robust platform for authentication and authorization. Deploying Keycloak in a highly available manner ensures that your applications remain secure and accessible, even during peak loads or unexpected failures.
This comprehensive guide explores best practices for deploying a highly available Keycloak cluster on Kubernetes. We’ll delve into the importance of inter-node latency, the role of Infinispan, and why StatefulSets are preferable to Deployments. Plus, we’ll share insights from our experience at Skycloak, where we ensure our clusters are always spread across different zones for maximum availability.
So grab your favorite beverage (coffee, tea, or perhaps some Kubernetes juice), and let’s get started!
Understanding High Availability in Keycloak
What Does High Availability Mean?
High availability (HA) refers to a system’s ability to operate continuously without failure for a long period. In the context of Keycloak, HA ensures that authentication and authorization services are always up and running, providing a seamless experience for users.
Why Is HA Important for Keycloak?
- User Experience: Downtime can frustrate users and erode trust.
- Business Continuity: Essential services rely on continuous authentication.
- Security: Consistent access control prevents unauthorized access during failovers.
Best Practices for Kubernetes Cluster Deployment
Deploying a Kubernetes cluster for high availability involves careful planning and configuration. Here are some best practices:
1. Spread Pods Across Different Zones
To enhance availability, spread your Keycloak pods across different availability zones. This way, if one zone goes down, your service remains operational.
- Use Zone-Aware Scheduling: Configure pod anti-affinity or topology spread constraints so the scheduler prefers placing Keycloak pods in different zones.
- At Skycloak: We always ensure our clusters span multiple zones to mitigate the risk of zone-specific failures.
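As a rough sketch of zone-aware scheduling, the pod template fragment below uses the standard Kubernetes `topologySpreadConstraints` field. It assumes the Keycloak pods carry the label `id: "example"` (the label used by the StatefulSet example later in this guide); adjust the selector to whatever labels your pods actually have.

```yaml
# Pod template fragment: goes under spec.template.spec of the workload.
topologySpreadConstraints:
  - maxSkew: 1                                # pod counts per zone may differ by at most one
    topologyKey: topology.kubernetes.io/zone  # well-known zone label set on nodes
    whenUnsatisfiable: ScheduleAnyway         # soft rule; DoNotSchedule would make it hard
    labelSelector:
      matchLabels:
        id: "example"                         # assumed pod label, adjust to yours
```

With `ScheduleAnyway`, pods still start when a zone is unavailable, which trades strict spreading for the ability to recover capacity during a zone outage.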
2. Implement Pod Anti-Affinity Rules
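Use pod anti-affinity so that Keycloak pods avoid sharing a node or, better still, a zone. The rule below is a soft preference keyed on the zone label: the scheduler spreads pods across zones when it can, but still schedules them when it cannot.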
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  # Must match a label that is actually set on the Keycloak pods
                  id: "example"
              topologyKey: topology.kubernetes.io/zone
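The zone rule above does not by itself keep two pods off the same node. One possible way to cover both cases, sketched here under the assumption that the pods are labelled `id: "example"` as in the StatefulSet example later in this guide, is to combine a hard per-node rule with a soft per-zone rule:

```yaml
# Pod template fragment: goes under spec.template.spec of the workload.
affinity:
  podAntiAffinity:
    # Hard rule: never place two matching pods on the same node.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            id: "example"                      # assumed pod label
        topologyKey: kubernetes.io/hostname
    # Soft rule: prefer placing matching pods in different zones.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              id: "example"                    # assumed pod label
          topologyKey: topology.kubernetes.io/zone
```

Note that the hard rule needs at least as many schedulable nodes as replicas; otherwise the extra pods stay Pending.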
3. Use Multiple Control Plane Nodes
Prevent a single point of failure in the Kubernetes control plane by running multiple control plane nodes (formerly called master nodes), ideally spread across zones as well.
4. Employ Horizontal Pod Autoscalers (HPA)
Automatically scale your Keycloak pods based on metrics like CPU utilization to handle varying loads.
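A minimal HorizontalPodAutoscaler for this could look like the sketch below. It targets the `keycloak-example` StatefulSet from the example later in this guide; the replica bounds and the 70% CPU target are illustrative assumptions, not tuned values.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak-example
  namespace: keycloak
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: keycloak-example
  minReplicas: 3               # assumed floor: one pod per zone
  maxReplicas: 6               # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the container's CPU request
```

CPU-based scaling relies on a metrics source such as metrics-server and on CPU requests being set on the Keycloak container.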
5. Monitor and Log Extensively
Use tools like Prometheus and Grafana to monitor cluster health and performance.
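If the Prometheus Operator is installed, scraping Keycloak might be sketched with a ServiceMonitor like the one below. Everything here is an assumption to verify in your environment: that Keycloak metrics are enabled (the `metrics-enabled` option), that they are served at `/metrics` on the port named `http` for the Keycloak version you run, and that the Service carries the label `id: "example"` as in the headless Service later in this guide.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keycloak-example
  namespace: keycloak
spec:
  selector:
    matchLabels:
      id: "example"        # assumed label on the Keycloak Service
  endpoints:
    - port: http           # name of the Service port to scrape
      path: /metrics       # assumed metrics path, check your Keycloak version
      interval: 30s
```

Your Prometheus instance also has to be configured to select ServiceMonitors in this namespace before any targets appear.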
Inter-Node Latency and Its Impact on Keycloak Performance
Latency between nodes can significantly affect Keycloak’s performance, especially in a distributed setup.
Understanding Inter-Node Latency
Inter-node latency is the time it takes for data to travel between nodes in your cluster. High latency can lead to:
- Slow Authentication: Increased time for login requests.
- Session Inconsistency: Delays in session replication across nodes.
- Higher Error Rates: Timeouts and failed requests.
For example, CloudPing data puts latency within the us-east-2 region at approximately 5.17 ms.
Impact on Keycloak Performance
Even small increases in latency can affect Keycloak’s responsiveness.
Example Performance Impact
| Latency (ms) | Authentication Time (ms) | Error Rate (%) |
|---|---|---|
| 5 | 50 | 0.1 |
| 50 | 150 | 0.5 |
| 150 | 500 | 2.0 |
Explanation:
- Latency (ms): Represents the network latency between nodes.
- Authentication Time (ms): Illustrative average time it might take to authenticate a user under the given latency.
- Error Rate (%): An estimated error rate due to timeouts or failures caused by increased latency.
These figures are illustrative rather than measured; they only demonstrate how latency can affect Keycloak performance. Actual performance metrics can vary based on numerous factors, including:
- Network infrastructure
- Hardware specifications
- Cluster configuration
- Workload characteristics
For Accurate Data:
- Conduct Performance Tests: Perform benchmarking and load testing in your specific environment to gather real data.
- Refer to Official Documentation: Check Keycloak’s official documentation for any performance-related insights.
- Academic and Industry Studies: Look for research papers or case studies that analyze the impact of network latency on distributed authentication systems like Keycloak.
Best Practices
- Zone Distribution: Spread pods across zones that have low latency between them, which typically means zones within the same region.
- Monitor Latency: Regularly check inter-node latency and adjust configurations as needed.
- Optimize Network: Use high-speed network interfaces and efficient routing.
Leveraging Infinispan for High Availability
Infinispan is an in-memory data grid used by Keycloak for caching and clustering.
Embedded vs. Dedicated Infinispan
Embedded Infinispan
- Pros: Simpler setup, fewer components.
- Cons: Less scalable, shares resources with Keycloak instances.
Dedicated Infinispan Cluster
- Pros: Better performance, independent scaling.
- Cons: More complex setup.
When to Use Dedicated Infinispan
- High Traffic: Large numbers of authentication requests.
- Distributed Clusters: Nodes spread across multiple zones or regions.
- Resource Optimization: Offload caching to dedicated resources.
Configuring Infinispan with Kubernetes
- Deploy Infinispan Operator: Manage the lifecycle of your Infinispan cluster.
- Configure Keycloak: Point Keycloak to use the external Infinispan cluster.
- Service Discovery: Ensure Keycloak pods can communicate with Infinispan pods.
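Once the Infinispan Operator is installed, a dedicated cluster is usually requested through an `Infinispan` custom resource. The sketch below is a minimal, hedged example: the resource name `keycloak-infinispan` is hypothetical, and the fields should be checked against the operator version you install.

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: keycloak-infinispan   # hypothetical name
  namespace: keycloak
spec:
  replicas: 3                 # assumed size: one Infinispan pod per zone
  service:
    type: DataGrid            # full data grid service rather than the simple cache service
```

How Keycloak is pointed at this cluster depends on the Keycloak version, so that side of the configuration is deliberately not shown here; consult the Keycloak documentation for your release.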
StatefulSets vs. Deployments: Choosing the Right Controller
Why Use StatefulSets for Keycloak?
StatefulSets provide:
- Stable Network Identities: Essential for cluster communication.
- Ordered Deployment: Pods are started and updated in order.
- Persistent Storage: Per-pod volumes (via volumeClaimTemplates) keep state across restarts when you need them. The example below does not use any.
Limitations of Deployments
- Ephemeral Pods: Pods can be replaced without preserving identity.
- Stateless Design: Not ideal for applications requiring stable identities.
Example StatefulSet Configuration with Headless Service
Below is an example of a StatefulSet configuration for Keycloak with a headless service. This setup asks the scheduler to prefer spreading pods across different zones and lets the pods discover each other.
StatefulSet Configuration
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: keycloak-example
      namespace: keycloak
      labels:
        id: "example"
    spec:
      replicas: 3
      selector:
        matchLabels:
          id: "example"
      serviceName: keycloak-example-headless
      template:
        metadata:
          labels:
            id: "example"
        spec:
          containers:
            - name: keycloak
              image: quay.io/keycloak/keycloak:21.1.2
              args:
                # DNS name that JGroups queries to find the other cluster members
                - '-Djgroups.dns.query=keycloak-example-headless.keycloak.svc.cluster.local'
                - '--verbose'
                - start
                # --optimized skips the build step, so build-time options are
                # assumed to be already baked into the image
                - '--optimized'
              ports:
                - name: http
                  containerPort: 8080
                  protocol: TCP
                # JGroups transport port used by the embedded Infinispan cluster
                - name: infinispan
                  containerPort: 7800
                  protocol: TCP
              env:
                - name: KC_HOSTNAME
                  value: example.keycloak.skycloak.io
                - name: KC_CACHE
                  value: ispn
                - name: KC_CACHE_STACK
                  value: kubernetes
                # Additional environment variables...
              resources:
                limits:
                  cpu: 500m
                  memory: 1Gi
                requests:
                  cpu: 500m
                  memory: 1Gi
              livenessProbe:
                httpGet:
                  path: /health/live
                  port: 8080
              readinessProbe:
                httpGet:
                  path: /health/ready
                  port: 8080
              startupProbe:
                httpGet:
                  path: /health/started
                  port: 8080
              imagePullPolicy: Always
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 1
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        # Matches the pod label set in template.metadata.labels;
                        # a selector that matches no pods makes the rule a no-op
                        id: "example"
                    topologyKey: topology.kubernetes.io/zone
      updateStrategy:
        type: RollingUpdate
Headless Service Configuration
    apiVersion: v1
    kind: Service
    metadata:
      name: keycloak-example-headless
      namespace: keycloak
      labels:
        id: "example"
    spec:
      ports:
        - name: http
          protocol: TCP
          port: 8080
          targetPort: 8080
      selector:
        id: "example"
      clusterIP: None
Explanation
- Headless Service: Allows Keycloak pods to discover each other via DNS.
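- Pod Anti-Affinity: Tells the scheduler to prefer placing pods in different zones. It is a soft rule, so pods can still share a zone when no other placement is possible.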
- Environment Variables: Configures Keycloak settings, including cache and hostname.
- Probes: Liveness, readiness, and startup probes ensure that the pod is running correctly.
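The probes only succeed if Keycloak's health endpoints are turned on, which is done with the `health-enabled` option (environment variable `KC_HEALTH_ENABLED`). The container fragment below is a sketch of that, together with explicit startup probe timings; the timing values are illustrative assumptions. Depending on the Keycloak version, `health-enabled` may be a build-time option, in which case an image started with `--optimized` needs it set when the image is built.

```yaml
# Container fragment for the keycloak container in the StatefulSet above.
containers:
  - name: keycloak
    env:
      - name: KC_HEALTH_ENABLED
        value: "true"          # exposes the /health endpoints used by the probes
    startupProbe:
      httpGet:
        path: /health/started
        port: 8080
      periodSeconds: 5         # assumed: check every 5 seconds
      failureThreshold: 60     # assumed: allow up to 5 s x 60 = 300 s to start
```

Giving the startup probe a generous budget keeps the liveness probe from restarting a pod that is still joining the cluster.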
Why This Configuration Works
- Zone Distribution: The scheduler prefers placing pods in different zones, enhancing availability.
- Stable Network Identities: StatefulSet provides consistent identities.
- Service Discovery: Headless service enables pods to find each other for clustering.
Conclusion
Deploying a highly available Keycloak cluster on Kubernetes requires careful consideration of factors like inter-node latency, pod distribution, and stateful configurations. By spreading pods across different zones, you minimize the risk of downtime due to zone failures.
At Skycloak, we’ve implemented these best practices to ensure our clusters are resilient and performant. By using StatefulSets with pod anti-affinity rules and leveraging Infinispan effectively, you can achieve a robust Keycloak deployment that scales with your needs.
Key Takeaways:
- Spread Pods Across Zones: Enhance availability by mitigating zone-specific failures.
- Monitor Latency: Keep an eye on inter-node latency to ensure optimal performance.
- Use StatefulSets: Benefit from stable identities and ordered deployments.
- Leverage Infinispan: Choose between embedded and dedicated setups based on your requirements.