How to Achieve High Availability for Keycloak on Kubernetes

In today’s digital landscape, secure identity management is more critical than ever. Keycloak, an open-source identity and access management solution, offers a robust platform for authentication and authorization. Deploying Keycloak in a highly available manner ensures that your applications remain secure and accessible, even during peak loads or unexpected failures.

This guide explores best practices for deploying a highly available Keycloak cluster on Kubernetes. We’ll delve into the importance of inter-node latency, the role of Infinispan, and why StatefulSets are preferable to Deployments. Plus, we’ll share insights from our experience at Skycloak, where we ensure our clusters are always spread across different zones for maximum availability.

So grab your favorite beverage (coffee, tea, or perhaps some Kubernetes juice), and let’s get started!

Understanding High Availability in Keycloak

What Does High Availability Mean?

High availability (HA) refers to a system’s ability to operate continuously without failure for a long period. In the context of Keycloak, HA ensures that authentication and authorization services are always up and running, providing a seamless experience for users.

Why Is HA Important for Keycloak?

  • User Experience: Downtime can frustrate users and erode trust.
  • Business Continuity: Essential services rely on continuous authentication.
  • Security: Consistent access control prevents unauthorized access during failovers.

Best Practices for Kubernetes Cluster Deployment

Deploying a Kubernetes cluster for high availability involves careful planning and configuration. Here are some best practices:

1. Spread Pods Across Different Zones

To enhance availability, spread your Keycloak pods across different availability zones. This way, if one zone goes down, your service remains operational.

  • Use Zone-Aware Scheduling: Configure pod anti-affinity (or topology spread constraints) so that pods prefer different zones.
  • At Skycloak: We always ensure our clusters span multiple zones to mitigate the risk of zone-specific failures.
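
As a complement to anti-affinity, zone spreading can also be expressed with topology spread constraints. The fragment below is a minimal sketch that belongs under the pod template's spec. It assumes the Keycloak pods carry the label app.kubernetes.io/instance: keycloak-example and that your nodes are labeled with the standard topology.kubernetes.io/zone key. The maxSkew value is illustrative.

```yaml
# Fragment of a pod template spec (spec.template.spec), not a complete manifest.
topologySpreadConstraints:
  - maxSkew: 1                                  # allow at most one pod of difference between zones
    topologyKey: topology.kubernetes.io/zone    # spread by availability zone
    whenUnsatisfiable: ScheduleAnyway           # soft rule; DoNotSchedule would make it strict
    labelSelector:
      matchLabels:
        # Assumed label; it must match the labels on your Keycloak pods
        app.kubernetes.io/instance: keycloak-example
```

ScheduleAnyway keeps pods schedulable when a zone is unavailable, at the cost of a less even spread. DoNotSchedule trades that flexibility for a strict guarantee.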

2. Implement Pod Anti-Affinity Rules

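Ask the scheduler to avoid placing Keycloak pods on the same node or in the same zone. The rule below is a soft preference keyed on the zone, so it does not block scheduling when no other zone is available. Its labelSelector must match the labels your Keycloak pods actually carry.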

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: keycloak-example
          topologyKey: topology.kubernetes.io/zone
Pod anti-affinity rule that prefers scheduling pods in different zones (data centers)

3. Use Multiple Control Plane Nodes

Prevent a single point of failure in the Kubernetes control plane by running multiple control plane (master) nodes, ideally in different zones.

4. Employ Horizontal Pod Autoscalers (HPA)

Automatically scale your Keycloak pods based on metrics like CPU utilization to handle varying loads.
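
A minimal sketch of such an autoscaler follows. It assumes a StatefulSet named keycloak-example in the keycloak namespace (like the one shown later in this guide), a metrics source such as metrics-server in the cluster, and CPU requests set on the Keycloak container. The replica bounds and the 70% target are illustrative values, not recommendations.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak-example
  namespace: keycloak
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet            # HPA can scale StatefulSets as well as Deployments
    name: keycloak-example
  minReplicas: 3                 # illustrative: keep enough replicas to cover several zones
  maxReplicas: 6                 # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # percentage of the container's CPU request
```

Because utilization is measured against the CPU request, the autoscaler has nothing to compare against if the container defines no request.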

5. Monitor and Log Extensively

Use tools like Prometheus and Grafana to monitor cluster health and performance.
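
If Prometheus is managed by the Prometheus Operator, scraping can be declared with a ServiceMonitor. The sketch below rests on several assumptions: the Operator's CRDs are installed, Keycloak metrics are enabled (for example through KC_METRICS_ENABLED=true), and a Service labeled id: "example" exposes a port named http, as the headless service in this guide does. The scrape interval is illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: keycloak-example
  namespace: keycloak
spec:
  selector:
    matchLabels:
      id: "example"        # assumed label on the Keycloak Service
  endpoints:
    - port: http           # name of the Service port to scrape
      path: /metrics       # Keycloak's metrics endpoint when metrics are enabled
      interval: 30s        # illustrative scrape interval
```

Depending on how your Prometheus instance selects ServiceMonitors, additional labels on this object may be required.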

Inter-Node Latency and Its Impact on Keycloak Performance

Latency between nodes can significantly affect Keycloak’s performance, especially in a distributed setup.

Understanding Inter-Node Latency

Inter-node latency is the time it takes for data to travel between nodes in your cluster. High latency can lead to:

  • Slow Authentication: Increased time for login requests.
  • Session Inconsistency: Delays in session replication across nodes.
  • Higher Error Rates: Timeouts and failed requests.

For example, CloudPing data puts latency within the us-east-2 region at approximately 5.17 ms.

Impact on Keycloak Performance

Even small increases in latency can affect Keycloak’s responsiveness.

Example Performance Impact

Latency (ms)   Authentication Time (ms)   Error Rate (%)
5              50                         0.1
50             150                        0.5
150            500                        2.0
Hypothetical impact of latency on authentication time and error rate

Explanation:

Latency (ms): Represents the network latency between nodes.

Authentication Time (ms): Illustrative average time it might take to authenticate a user under the given latency.

Error Rate (%): An estimated error rate due to timeouts or failures caused by increased latency.

These figures are illustrative only; they show how latency could affect Keycloak performance rather than measured results. Actual performance metrics vary with numerous factors, including:

  • Network infrastructure
  • Hardware specifications
  • Cluster configuration
  • Workload characteristics

For Accurate Data:

  • Conduct Performance Tests: Perform benchmarking and load testing in your specific environment to gather real data.
  • Refer to Official Documentation: Check Keycloak’s official documentation for any performance-related insights.
  • Academic and Industry Studies: Look for research papers or case studies that analyze the impact of network latency on distributed authentication systems like Keycloak.

Best Practices

  • Zone Distribution: Spread pods across zones with low latency.
  • Monitor Latency: Regularly check inter-node latency and adjust configurations as needed.
  • Optimize Network: Use high-speed network interfaces and efficient routing.

Leveraging Infinispan for High Availability

Infinispan is an in-memory data grid used by Keycloak for caching and clustering.

Embedded vs. Dedicated Infinispan

Embedded Infinispan

  • Pros: Simpler setup, fewer components.
  • Cons: Less scalable, shares resources with Keycloak instances.

Dedicated Infinispan Cluster

  • Pros: Better performance, independent scaling.
  • Cons: More complex setup.

When to Use Dedicated Infinispan

  • High Traffic: Large numbers of authentication requests.
  • Distributed Clusters: Nodes spread across multiple zones or regions.
  • Resource Optimization: Offload caching to dedicated resources.

Configuring Infinispan with Kubernetes

  1. Deploy Infinispan Operator: Manage the lifecycle of your Infinispan cluster.
  2. Configure Keycloak: Point Keycloak to use the external Infinispan cluster.
  3. Service Discovery: Ensure Keycloak pods can communicate with Infinispan pods.
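
Once the Infinispan Operator is installed, the cluster itself is declared as a custom resource. The manifest below is a minimal sketch; the resource name, namespace, and replica count are assumptions chosen for illustration. How Keycloak is then pointed at this cluster depends on the Keycloak version, so it is not shown here.

```yaml
apiVersion: infinispan.org/v1
kind: Infinispan
metadata:
  name: infinispan         # hypothetical name
  namespace: keycloak      # assumed to share the Keycloak namespace
spec:
  replicas: 3              # illustrative: one pod per zone
  service:
    type: DataGrid         # full data grid service rather than the basic cache service
```

The Operator creates the pods and a Service for the cluster, which gives Keycloak pods a stable DNS name to connect to.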

StatefulSets vs. Deployments: Choosing the Right Controller

Why Use StatefulSets for Keycloak?

StatefulSets provide:

  • Stable Network Identities: Essential for cluster communication.
  • Ordered Deployment: Pods are started and updated in order.
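  • Persistent Storage: Supports per-pod volumes that survive restarts when you need them. The example below defines none, because Keycloak keeps its durable data in an external database.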

Limitations of Deployments

  • Ephemeral Pods: Pods can be replaced without preserving identity.
  • Stateless Design: Not ideal for applications requiring stable identities.

Example StatefulSet Configuration with Headless Service

Below is an example of a StatefulSet configuration for Keycloak with a headless service. This setup asks the scheduler to spread pods across different zones and lets the pods discover each other through DNS.

StatefulSet Configuration

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak-example
  namespace: keycloak
  labels:
    id: "example"
spec:
  replicas: 3
  selector:
    matchLabels:
      id: "example"
  serviceName: keycloak-example-headless
  template:
    metadata:
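      labels:
        id: "example"
        # Must match the labelSelector used by podAntiAffinity below
        app.kubernetes.io/instance: keycloak-example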
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:21.1.2
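          args:
            # DNS name of the headless service that JGroups queries to discover cluster members
            - >-
              -Djgroups.dns.query=keycloak-example-headless.keycloak.svc.cluster.local
            - '--verbose'
            - start
            # --optimized skips the build step, so the image must already be built with the desired options
            - '--optimized'
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
            # JGroups port used by the embedded Infinispan caches for cluster traffic
            - name: infinispan
              containerPort: 7800
              protocol: TCP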
          env:
            - name: KC_HOSTNAME
              value: example.keycloak.skycloak.io
            - name: KC_CACHE
              value: ispn
            - name: KC_CACHE_STACK
              value: kubernetes
            # Additional environment variables...
          resources:
            limits:
              cpu: 500m
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
          # The health endpoints respond only when health checks are enabled (KC_HEALTH_ENABLED=true)
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
          startupProbe:
            httpGet:
              path: /health/started
              port: 8080
          imagePullPolicy: Always
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/instance: keycloak-example
                topologyKey: topology.kubernetes.io/zone
  updateStrategy:
    type: RollingUpdate

Headless Service Configuration

apiVersion: v1
kind: Service
metadata:
  name: keycloak-example-headless
  namespace: keycloak
  labels:
    id: "example"
spec:
  ports:
    - name: http
      protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    id: "example"
  clusterIP: None

Explanation

  • Headless Service: Returns the individual pod IPs through DNS, which lets Keycloak pods discover each other.
  • Pod Anti-Affinity: Prefers placing pods in different zones.
  • Environment Variables: Configure Keycloak settings, including cache and hostname.
  • Probes: Liveness, readiness, and startup probes let Kubernetes restart unhealthy pods and send traffic only to pods that are ready.

Why This Configuration Works

  • Zone Distribution: The scheduler prefers placing pods in different zones, enhancing availability.
  • Stable Network Identities: StatefulSet provides consistent identities.
  • Service Discovery: Headless service enables pods to find each other for clustering.

Conclusion

Deploying a highly available Keycloak cluster on Kubernetes requires careful consideration of factors like inter-node latency, pod distribution, and stateful configurations. By spreading pods across different zones, you minimize the risk of downtime due to zone failures.

At Skycloak, we’ve implemented these best practices to ensure our clusters are resilient and performant. By using StatefulSets with pod anti-affinity rules and leveraging Infinispan effectively, you can achieve a robust Keycloak deployment that scales with your needs.

Key Takeaways:

  • Spread Pods Across Zones: Enhance availability by mitigating zone-specific failures.
  • Monitor Latency: Keep an eye on inter-node latency to ensure optimal performance.
  • Use StatefulSets: Benefit from stable identities and ordered deployments.
  • Leverage Infinispan: Choose between embedded and dedicated setups based on your requirements.