HA setup in kubernetes

I’m trying to get keycloak to work in a HA mode with multiple replicas however I’m struggling to do so. I would like to use jgroups.dns.query for this. Mind you, I’m using a docker image published by bitnami and istio as my gateway.

Here is my configuration (I omitted env vars that are not related to clustering, as keycloak works with a single replica, but problems arise when I use more than one replica).

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak
  labels:
    app: keycloak
spec:
  serviceName: "headless-service"
  replicas: 3
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: keycloak
        image: bitnami/keycloak:17
        env:
        - name: KEYCLOAK_CACHE_TYPE
          value: ispn
        - name: KEYCLOAK_CACHE_STACK
          value: kubernetes
        - name: KEYCLOAK_EXTRA_ARGS
          value: "-Djgroups.dns.query=headless-service"
        ports:
          - name: http
            containerPort: 8080
          - name: https
            containerPort: 8443
        readinessProbe:
          httpGet:
            path: /
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: headless-service
  labels:
    app: keycloak
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - port: 8080
    name: http
    targetPort: 8080
  selector:
    app: keycloak

So I’ve tried recreating various examples that I found online and I also looked at bitnami’s keycloak helm chart to get it to work however it seems that the pods are not discovering each other and every pod seems to create it’s own cluster. Here is a snippet from the logs:

2022-06-07 12:37:10,263 INFO  [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) keycloak-1-367: no members discovered after 2003 ms: creating cluster as coordinator

And this repeats for keycloak-0 and keycloak-2. I can’t access keycloak web UI and I’m getting the error Your login attempt timed out. Login will start from the beginning.

What am I doing incorrectly here?

I have it working for microk8s and openshift4, but have done it without keycloak as a statefulset, i.e. as a deployment and used a headless service. I am also not using bitnamis image so I have built one of my own. I think the KC_CACHE, KC_CACHE_STACK are needed during buildtime but bitnamis are probably doing just that. But now to my suggestion. I had to put “publishNotReadyAddresses” in the headless service to get it working.

apiVersion: v1
kind: Service
metadata:
  name: keycloak-jgroups-ping
  namespace: keycloak
  labels:
    app: keycloak
spec:
  selector:
    app: keycloak
  clusterIP: None
  publishNotReadyAddresses: true

You need another service for port 4444 to get jgroups to work, your service is the one for the web/api-frontend.

It’s not needed, I haven’t seen anything about port 4444 in the docs. Plus, helm chart published by codecentric works with multiple replicas and it doesn’t use port 4444 anywhere, just a headless service.

1 Like

Well, the codecentric chart uses KUBEPING, not JGROUPSPING and from the readme it looks like it still relies on the wildfly-based version. If using KC17+ which are Quarkus-based by default and there KUBE_PING needs more config, see Does KUBE_PING work with the Keycloak 17 (Quarkus distro)? for that.

You’re looking at the wrong chart. If you go to the “keycloakx” directory, you can see that it uses the new Quarkus-based keycloak distribution. Plus, no it is not using KUBE_PING: by default. It is using jgroups and DNS_PING.

Hi, well I am using both KC 17 and 18 in the same way as you do, with the difference that I use a Deployment instead of a Statefulset and using publishedNotReadyAddresses. It works perfectly in both microk8s and openshift. If I remove the publishNotReadyAddress from the service it behaves exactly as you have described. It ends up as two different clusters. My KC image looks like this:

FROM quay.io/keycloak/keycloak:18.0.0 as builder
ENV KC_HEALTH_ENABLED=true
ENV KC_METRICS_ENABLED=true
ENV KC_DB=postgres
ENV KC_CACHE=ispn
ENV KC_CACHE_STACK=kubernetes
ENV KC_HTTP_RELATIVE_PATH=/auth
RUN /opt/keycloak/bin/kc.sh build
FROM quay.io/keycloak/keycloak:18.0.0
COPY --from=builder /opt/keycloak/lib/quarkus/ /opt/keycloak/lib/quarkus/
WORKDIR /opt/keycloak
ENTRYPOINT ["/opt/keycloak/bin/kc.sh", "start"]

I don’t see why it should not work for you as well using a stetfulset and set the flag in the service as well.

I believe it’s not necessary to have a service on port 4444 for jgroups to work.

The headless service will provide the pod IPs and them the pods will connect directly to one another, via IP.

If you have something like Istio filtering the traffic between pods, you’ll need a rule for jgroups port.

I’m not very familiar with the Bitnami chart, but I think you should check if Istio is filtering traffic between pods here.

If not, I suggest you start keycloak in debug log level with KEYCLOAK_LOGLEVEL=debug and filter for jgroup messages.

The settings you have seems to be the ones necessary, looking at the chart template here charts/configmap-env-vars.yaml at master · bitnami/charts · GitHub

Agree, it is not needed, it is just for finding the IP:s :slight_smile: But, isn’t jgroups using UDP for keycloak? I might be wrong but that is not proxied by istio. I do not have any specific settings for my istio sidecar, more than “holdApplicationUntilProxyStarts” because keycloak is so quick in starting :slight_smile:

Same for me, I do not know the bitnami chart. But if you also look at the headless-service in that chart they have set the publishNotReadyAddress flag as I mentioned before.

Hey I’ve tried using the publishNotReadyAddress flag as well but I didn’t get it to work. In fact, I’ve tried deploying this exact chart without any modifications and the cluster was not forming properly at all. Each pod forms it’s own infinispan cluster. In the end I decided to go with the oldie but goodie jdbc ping.

Sorry to hear, then that was not the issue. The only difference then is you are using bitnami and I am using the keycloak distribution and building my own image. It is probably something “trivial” that eludes us :slight_smile:

Hello, there’s a problem with the NetworkPolicy created by the Bitnami chart which blocks the Infinispan cache traffic, thus not enabling the clustered cache.
Check [bitnami/keycloak] Fix NetworkPolicy to open the Infinispan port (7800) by agtogna · Pull Request #14576 · bitnami/charts · GitHub for a fix in the chart itself (if it gets merged).