HA setup in Kubernetes

I’m trying to get Keycloak to work in HA mode with multiple replicas, but I’m struggling to do so. I would like to use jgroups.dns.query for discovery. Mind you, I’m using the Docker image published by Bitnami, with Istio as my gateway.

Here is my configuration (I omitted env vars that are not related to clustering; Keycloak works fine with a single replica, but problems arise as soon as I use more than one).

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak
  labels:
    app: keycloak
spec:
  serviceName: "headless-service"
  replicas: 3
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: keycloak
        image: bitnami/keycloak:17
        env:
        - name: KEYCLOAK_CACHE_TYPE
          value: ispn
        - name: KEYCLOAK_CACHE_STACK
          value: kubernetes
        - name: KEYCLOAK_EXTRA_ARGS
          value: "-Djgroups.dns.query=headless-service"
        ports:
        - name: http
          containerPort: 8080
        - name: https
          containerPort: 8443
        readinessProbe:
          httpGet:
            path: /
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: headless-service
  labels:
    app: keycloak
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - port: 8080
    name: http
    targetPort: 8080
  selector:
    app: keycloak

So I’ve tried recreating various examples that I found online, and I also looked at Bitnami’s Keycloak Helm chart, but it seems the pods are not discovering each other: every pod creates its own cluster. Here is a snippet from the logs:

2022-06-07 12:37:10,263 INFO  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) keycloak-1-367: no members discovered after 2003 ms: creating cluster as coordinator

And this repeats for keycloak-0 and keycloak-2. I can’t access the Keycloak web UI, and I’m getting the error “Your login attempt timed out. Login will start from the beginning.”

What am I doing incorrectly here?

I have it working on microk8s and OpenShift 4, but I did it without running Keycloak as a StatefulSet, i.e. as a Deployment with a headless service. I am also not using Bitnami’s image; I have built one of my own. I think KC_CACHE and KC_CACHE_STACK are needed at build time, but Bitnami’s image is probably handling just that. But now to my suggestion: I had to set publishNotReadyAddresses in the headless service to get it working.

apiVersion: v1
kind: Service
metadata:
  name: keycloak-jgroups-ping
  namespace: keycloak
  labels:
    app: keycloak
spec:
  selector:
    app: keycloak
  clusterIP: None
  publishNotReadyAddresses: true

You need another service for port 4444 to get jgroups to work; your service is the one for the web/API frontend.

It’s not needed; I haven’t seen anything about port 4444 in the docs. Plus, the Helm chart published by codecentric works with multiple replicas, and it doesn’t use port 4444 anywhere, just a headless service.


Well, the codecentric chart uses KUBE_PING, not DNS_PING, and from the readme it looks like it still relies on the WildFly-based version. KC 17+ is Quarkus-based by default, and there KUBE_PING needs more config; see “Does KUBE_PING work with the Keycloak 17 (Quarkus distro)?” for that.

You’re looking at the wrong chart. If you go to the “keycloakx” directory, you can see that it uses the new Quarkus-based Keycloak distribution. Plus, no, it is not using KUBE_PING by default; it is using JGroups with DNS_PING.

Hi, well, I am using both KC 17 and 18 in the same way as you do, with the difference that I use a Deployment instead of a StatefulSet, plus publishNotReadyAddresses. It works perfectly in both microk8s and OpenShift. If I remove publishNotReadyAddresses from the service, it behaves exactly as you have described: it ends up as two different clusters. My KC image looks like this:

FROM quay.io/keycloak/keycloak:18.0.0 as builder
ENV KC_DB=postgres
RUN /opt/keycloak/bin/kc.sh build
FROM quay.io/keycloak/keycloak:18.0.0
COPY --from=builder /opt/keycloak/lib/quarkus/ /opt/keycloak/lib/quarkus/
WORKDIR /opt/keycloak
ENTRYPOINT ["/opt/keycloak/bin/kc.sh", "start"]

I don’t see why it should not work for you as well, using a StatefulSet and setting the flag in the service.
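For reference, the Deployment-plus-headless-service variant described above might look roughly like the following. This is a sketch based on this thread, not a tested manifest: the image name is a placeholder for the custom image built above, and the jgroups.dns.query target assumes the keycloak-jgroups-ping service shown earlier.

```yaml
# Sketch: Deployment variant with DNS_PING pointed at the headless service
# that has publishNotReadyAddresses: true. Names here are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  namespace: keycloak
spec:
  replicas: 2
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      containers:
      - name: keycloak
        image: my-registry/keycloak:18.0.0   # placeholder for the custom-built image
        env:
        - name: JAVA_OPTS_APPEND
          # DNS_PING resolves this name to the (possibly not-yet-ready) pod IPs
          value: "-Djgroups.dns.query=keycloak-jgroups-ping.keycloak.svc.cluster.local"
```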

I believe it’s not necessary to have a service on port 4444 for jgroups to work.

The headless service will provide the pod IPs, and then the pods will connect directly to one another via IP.

If you have something like Istio filtering the traffic between pods, you’ll need a rule for the JGroups port.
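One way to do that (an illustration, not something from this thread) is to exclude the JGroups port from the Istio sidecar entirely, so cluster traffic bypasses the proxy. The annotations below exist in Istio; the port number 7800 assumes Keycloak’s default kubernetes TCP stack:

```yaml
# Pod template annotations: keep JGroups traffic (TCP 7800 for the default
# "kubernetes" stack) out of the Istio sidecar in both directions.
metadata:
  annotations:
    sidecar.istio.io/inject: "true"
    traffic.sidecar.istio.io/excludeInboundPorts: "7800"
    traffic.sidecar.istio.io/excludeOutboundPorts: "7800"
```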

I’m not very familiar with the Bitnami chart, but I think you should check whether Istio is filtering traffic between the pods here.

If not, I suggest you start Keycloak at debug log level with KEYCLOAK_LOGLEVEL=debug and filter for jgroups messages.
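On the Quarkus distribution you can also raise the log level for specific categories via the --log-level option; how to pass it through depends on the image (the env var below is the one from the Bitnami config shown earlier), so treat this as a sketch:

```yaml
# Sketch: turn up JGroups/Infinispan logging so discovery attempts are visible.
# KEYCLOAK_EXTRA_ARGS is the Bitnami pass-through used earlier in this thread.
env:
- name: KEYCLOAK_EXTRA_ARGS
  value: "--log-level=INFO,org.jgroups:DEBUG,org.infinispan:DEBUG"
```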

The settings you have seem to be the necessary ones, looking at the chart template here: charts/configmap-env-vars.yaml at master · bitnami/charts · GitHub

Agree, it is not needed; it is just for finding the IPs :slight_smile: But isn’t jgroups using UDP for Keycloak? I might be wrong, but that is not proxied by Istio. I do not have any specific settings for my Istio sidecar, other than “holdApplicationUntilProxyStarts”, because Keycloak is so quick to start :slight_smile:
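For what it’s worth, the kubernetes JGroups stack that ships with Keycloak is TCP-based (port 7800), so the UDP concern may not apply here. The holdApplicationUntilProxyStarts setting mentioned above can be applied per pod via an Istio annotation; a minimal sketch:

```yaml
# Per-pod Istio proxy config: wait for the sidecar before starting Keycloak,
# so early JGroups discovery traffic is not dropped by a half-started proxy.
metadata:
  annotations:
    proxy.istio.io/config: |
      holdApplicationUntilProxyStarts: true
```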

Same for me, I do not know the Bitnami chart. But if you look at the headless service in that chart, they have also set the publishNotReadyAddresses flag, as I mentioned before.

Hey, I’ve tried using the publishNotReadyAddresses flag as well, but I didn’t get it to work. In fact, I’ve tried deploying this exact chart without any modifications, and the cluster was not forming properly at all: each pod forms its own Infinispan cluster. In the end I decided to go with the oldie but goodie JDBC_PING.

Sorry to hear; then that was not the issue. The only difference then is that you are using Bitnami, while I am using the Keycloak distribution and building my own image. It is probably something “trivial” that eludes us :slight_smile: