Adventures with Docker Swarm and Keycloak Clustering

I have spent some considerable time attempting to run the Keycloak container clustered in swarm mode.

TL;DR I ended up adding dns.DNS_PING and changing docker-entrypoint.sh to set the bind address to match the dnsrr IP from the service; it's down at the bottom.

Adventures

First run was to try it in local Docker without swarm, scaled to 2 with docker-compose. This worked well once I adjusted/added the environment variables CACHE_OWNERS_COUNT and CACHE_OWNERS_AUTH_SESSIONS_COUNT.
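
For reference, a minimal sketch of that first non-swarm experiment (the compose file layout and the service name keycloak are assumptions):

# Plain docker-compose, no swarm: run two instances of the service
docker-compose up -d --scale keycloak=2
# Watch for a two-member view (ISPN000094 ... (2) [...]) in the logs
docker-compose logs -f keycloak | grep ISPN000094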

Having read some background, and having confirmed that mcast does not work in swarm mode/overlay networks, I settled on dns.DNS_PING for discovery and converted the endpoint_mode to dnsrr.

While I could see the DNS query via ROOT_LOGLEVEL=DEBUG or tcpdump, and see the correct responses, the result was two independent clusters for JGroups/Infinispan.
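
For anyone reproducing the capture: one way to watch the discovery lookups from inside the task's network namespace, using the same nsenter pattern as the ss check further down (tcpdump must be available on the host; the kc_keycloak.1 task name matches my stack):

sudo nsenter -t $(docker inspect $(docker ps --filter name=kc_keycloak.1 -q) | jq '.[].State.Pid') -n tcpdump -ni any port 53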

Resultant command from the entrypoint:

/bin/sh /opt/jboss/keycloak/bin/standalone.sh -Djboss.bind.address=10.0.1.41 -Djboss.bind.address.private=10.0.1.41 -Djboss.bind.address=172.19.0.4 -Djboss.bind.address.private=172.19.0.4 -c=standalone-ha.xml -b 0.0.0.0
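
Note the doubled -Djboss.bind.address flags. They come from the stock entrypoint: BIND defaults to hostname --all-ip-addresses, and the loop below (from the unmodified docker-entrypoint.sh; addresses illustrative) appends one pair of flags per address, so with two networks attached both addresses are passed and, per the log below, the gwbridge address 172.19.0.4 is the one that ends up bound.

BIND=$(hostname --all-ip-addresses)   # e.g. "10.0.1.41 172.19.0.4"
for BIND_IP in $BIND
do
    BIND_OPTS+=" -Djboss.bind.address=$BIND_IP -Djboss.bind.address.private=$BIND_IP "
done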

JGroups/Infinispan logs:

18:13:07,149 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 39) WFLYCLINF0001: Activating Infinispan subsystem.
18:13:07,191 INFO [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 43) WFLYCLJG0001: Activating JGroups subsystem. JGroups version 4.2.4
18:13:12,086 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) 90ac043bdf26: no members discovered after 3020 ms: creating cluster as coordinator
18:13:12,859 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-6) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,871 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-7) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,872 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-8) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,859 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-5) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,873 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-4) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,894 INFO [org.infinispan.CONTAINER] (MSC service thread 1-8) ISPN000128: Infinispan version: Infinispan 'Turia' 10.1.8.Final
18:13:13,115 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
18:13:13,115 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000078: Starting JGroups channel ejb
18:13:13,115 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
18:13:13,117 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000078: Starting JGroups channel ejb
18:13:13,117 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000078: Starting JGroups channel ejb
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,133 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,140 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,149 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,150 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,151 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,151 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]

I noticed in the logs that the 'wrong' IP was being bound: the IP belongs to the docker_gwbridge network (the default route), not the network defined in the compose file. So I decided to try using the BIND variable, setting it to 0.0.0.0.

Identical results, in that two independent clusters were created.

Resultant command from the entrypoint:

/bin/sh /opt/jboss/keycloak/bin/standalone.sh -Djboss.bind.address=0.0.0.0 -Djboss.bind.address.private=0.0.0.0 -c=standalone-ha.xml -b 0.0.0.0

sudo nsenter -t $(docker inspect $(docker ps --filter name=kc_keycloak.1 -q)  | jq '.[].State.Pid') -n ss -tnl 'sport = :7600' 
State              Recv-Q             Send-Q                           Local Address:Port                           Peer Address:Port             Process             
LISTEN             0                  50                                     0.0.0.0:7600                                0.0.0.0:*                                    

JGroups/Infinispan logs:

18:19:36,229 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) 9261f2605131: no members discovered after 3032 ms: creating cluster as coordinator
18:19:36,780 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-4) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,780 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-3) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,791 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-6) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,801 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-8) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,802 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-1) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,823 INFO [org.infinispan.CONTAINER] (MSC service thread 1-6) ISPN000128: Infinispan version: Infinispan 'Turia' 10.1.8.Final
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000078: Starting JGroups channel ejb
18:19:37,033 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,033 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,033 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,041 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,043 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,049 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,053 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,056 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]

I also tried JDBC_PING and TCPPING, with no success.


How I got this working

For some reason I suspected this might be related to the bind IP and/or routes interacting with JGroups.

First of all I updated the docker-entrypoint.sh:

docker-entrypoint.sh diff
--- docker-entrypoint.sh.orig	2020-09-15 05:01:53.000000000 -0400
+++ docker-entrypoint.sh.test	2021-01-07 13:41:01.645836780 -0500
@@ -77,7 +77,7 @@
 ########################
 
 if [[ -z ${BIND:-} ]]; then
-    BIND=$(hostname --all-ip-addresses)
+    BIND=$(hostname --ip-address)
 fi
 if [[ -z ${BIND_OPTS:-} ]]; then
     for BIND_IP in $BIND
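
For context, the difference between the two flags, with illustrative output using the addresses from the earlier run (as noted further down, which address --ip-address picks is not deterministic when several networks are attached):

$ hostname --all-ip-addresses    # every configured address, space separated
10.0.1.41 172.19.0.4
$ hostname --ip-address          # a single address
10.0.1.41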

This, combined with endpoint_mode: dnsrr, JGROUPS_DISCOVERY_PROTOCOL: dns.DNS_PING, and JGROUPS_DISCOVERY_PROPERTIES: dns_query=keycloak, got me a cluster.

I don’t know why this is and will happily receive some education.

JGroups/Infinispan logs:

Setting JGroups discovery to dns.DNS_PING with properties {dns_query=>keycloak}
19:14:15,505 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 39) WFLYCLINF0001: Activating Infinispan subsystem.
19:14:15,568 INFO [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 43) WFLYCLJG0001: Activating JGroups subsystem. JGroups version 4.2.4
19:14:25,017 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-3) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,027 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-4) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,022 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-6) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,030 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-2) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,041 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-5) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,070 INFO [org.infinispan.CONTAINER] (MSC service thread 1-2) ISPN000128: Infinispan version: Infinispan 'Turia' 10.1.8.Final
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000078: Starting JGroups channel ejb
19:14:25,372 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,372 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,373 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,374 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,381 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,395 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,401 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,423 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,428 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,436 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]

If there are multiple networks attached to a container, then this IP method is not deterministic, and my testing shows that a cluster is not successfully created/joined.

With an updated docker-entrypoint.sh it works again.

docker-entrypoint.sh
--- docker-entrypoint.sh.orig	2020-09-15 05:01:53.000000000 -0400
+++ docker-entrypoint.sh	2021-01-07 14:29:41.843423045 -0500
@@ -80,6 +80,12 @@
     BIND=$(hostname --all-ip-addresses)
 fi
 if [[ -z ${BIND_OPTS:-} ]]; then
+    if [[ -n ${DOCKER_SWARM:-} ]]; then
+      SVCIP=$(getent hosts ${SVC_NAME:-keycloak} | awk '{print $1}'| uniq)
+      THIS_IP=$(echo ${SVCIP} ${BIND} | sed 's/ /\n/g' | sort | uniq -d)
+      echo INFO: Using bindip ${THIS_IP} for jgroups
+      BIND=${THIS_IP}
+    fi
     for BIND_IP in $BIND
     do
         BIND_OPTS+=" -Djboss.bind.address=$BIND_IP -Djboss.bind.address.private=$BIND_IP "
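
To unpack the added lines: getent hosts resolves the dnsrr service name to the task IPs, and uniq -d keeps only the address present in both lists, i.e. this task's own IP on the service network. A sketch with illustrative values:

SVCIP="10.0.2.65 10.0.2.66"    # A records returned for the dnsrr service
BIND="10.0.2.66 172.19.0.4"    # all addresses on this container
echo ${SVCIP} ${BIND} | sed 's/ /\n/g' | sort | uniq -d
# -> 10.0.2.66, the only address that appears in both lists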

docker-compose.yaml
version: '3.8'
networks:
  keycloak:
  foo:
secrets:
  pg-password:
    file: ./pg-password
services:
  postgres:
    image: postgres:12
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/pg-password
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
    secrets:
     - pg-password
    networks:
      - keycloak

  keycloak:
    image: keycloak:test
    deploy:
      endpoint_mode: dnsrr
      replicas: 2
      placement:
        constraints:
          - "node.platform.os==linux"
#        max_replicas_per_node: 1
      labels:
        traefik.enable: "true"
        traefik.http.routers.keycloak.rule: Host(`keycloak`)
        traefik.http.routers.keycloak.entrypoints: http
        traefik.http.services.keycloak.loadbalancer.server.port: 8080
        traefik.http.services.keycloak.loadbalancer.healthcheck.path: /auth/
    secrets:
     - pg-password
    environment:
#      ROOT_LOGLEVEL: DEBUG
      CACHE_OWNERS_COUNT: 2
      CACHE_OWNERS_AUTH_SESSIONS_COUNT: 2
      DB_ADDR: postgres
      DB_DATABASE: keycloak
      DB_PASSWORD_FILE: /run/secrets/pg-password
      DB_SCHEMA: public
      DB_USER: "keycloak"
      DB_VENDOR: postgres
      DOCKER_SWARM: "true"
      JGROUPS_DISCOVERY_PROTOCOL: dns.DNS_PING
      JGROUPS_DISCOVERY_PROPERTIES: dns_query=keycloak
      KEYCLOAK_PASSWORD: password
      KEYCLOAK_USER: admin
      PROXY_ADDRESS_FORWARDING: "true"
    networks:
      - keycloak
      - foo
  traefik:
    image: traefik:v2.3
    ports:
     - '80:80'
    command:
    - --entrypoints.http.address=:80
    - --providers.docker=true
    - --providers.docker.swarmMode=true
    - --providers.docker.exposedbydefault=false
    - --accesslog
    networks:
    - keycloak
    volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
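
For completeness, a deployment along these lines (the stack name kc matches the kc_keycloak.1 task name in the commands above):

docker stack deploy -c docker-compose.yaml kc
docker service ls    # expect keycloak at 2/2 replicas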

Thank you very much!!
It works for me too!

Hi @kiwicloak congrats!

I’ve also played with Keycloak clustering deployed in Docker Swarm as you can see in this docker-swarm-environment project.

Btw, I’ve built the keycloak-clustered Docker image, which extends the official Keycloak Docker image and adds some scripts to make it easier to run a cluster of Keycloak instances.

Hey @kiwicloak @cinco !

Is this solution still working for you guys?

I’m experiencing similar issues with JDBC_PING and MSSQL. It works for a while, but it stops working under some traffic (~1k rpm).

Hi @brunocascio

The solution in my post (dns.DNS_PING) is still running fine. My traffic is very light.

Thanks, I’ll take a look soon.

I’m experiencing similar issues with JDBC_PING and MSSQL. It works for a while, but it stops working under some traffic (~1k rpm).

Just for future reference: it was related to a custom Keycloak extension we had developed, not to Keycloak or the discovery protocol itself.

Has anyone succeeded in clustering Keycloak version >=17 in Docker Swarm?
As per the documentation, dns.DNS_PING is enabled by --cache-stack=kubernetes and -Djgroups.dns.query, which I set to my swarm service name: JAVA_OPTS_APPEND=-Djgroups.dns.query=stage_keycloak_cluster.
This did not work; the result was two independent clusters.
In the debug logs I see:

DEBUG [org.jgroups.protocols.dns.DNS_PING] (jgroups-16,stage_keycloak_cluster-60990) stage_keycloak_cluster-60990: sending discovery requests to hosts [10.0.0.211:0] on ports [7800 .. 7800]

and

DEBUG [org.jgroups.protocols.dns.DNS_PING] (jgroups-20,stage_keycloak_cluster-57943) stage_keycloak_cluster-57943: sending discovery requests to hosts [10.0.0.209:0] on ports [7800 .. 7800]

I’ve also tried to tweak kc.sh the same way it was done here with docker-entrypoint.sh, by adding BIND_OPTS=" -Djboss.bind.address=${THIS_IP} -Djboss.bind.address.private=${THIS_IP} ", but also without much luck.

@dvt114 yes, I finally managed to get this working using a different set of properties:

BIND_OPTS=" -Djgroups.bind.address=${THIS_IP} "

On a different note, I also added a health check to the container, and that caused problems with the otherwise excellent startup script mentioned above, because the IP is not part of the IP list resolved from Docker Swarm until the container answers a health check correctly.

To remedy this I changed the startup script to simply be:

BIND=$(hostname --ip-address)
export JAVA_OPTS="${JAVA_OPTS} -Djgroups.bind.address=${BIND}"
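
Putting that together, a wrapper entrypoint for the Quarkus image might look like the sketch below (untested here; /opt/keycloak/bin/kc.sh is the launcher path in the official image, and I use JAVA_OPTS_APPEND rather than JAVA_OPTS so the image's default options are kept):

#!/bin/bash
# Sketch: pin the JGroups bind address to this task's address, then
# hand off to the stock Quarkus launcher (assumes the hostname
# utility is available in the image)
BIND=$(hostname --ip-address)
export JAVA_OPTS_APPEND="${JAVA_OPTS_APPEND:-} -Djgroups.bind.address=${BIND}"
exec /opt/keycloak/bin/kc.sh "$@"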

Hi, does it only need two files (docker-entrypoint.sh and docker-compose.yaml) for the cluster to work? I’m a bit confused; I’ve been studying how to start Keycloak in a cluster for a month.

Yes. Keep in mind this was Jan 2021, Keycloak 11 (I think).

Makes sense… I’ve been studying and trying to configure an HA Keycloak cluster on Docker, but with no success. I’ve tried many repos and even the official steps from Keycloak. I have just created a topic on Keycloak cluster HA setup with Docker and an external MariaDB server.

Hey there, I’m getting

hostname: command not found

I’m using the latest version of Keycloak.

This was over two years ago on v11. v21 is current.

With the changes between these versions I would not expect this workaround to be valid.

Hello,

I have to dig up this thread, as I am currently switching from Keycloak 17 to 23. In my understanding, our current load-balancing solution using mod_cluster will not work with the dockerized Quarkus version of Keycloak. Is this correct?
Is there a best practice for load-balancing the dockerized Quarkus Keycloak?

Thank you for any advice.
Thomas