Adventures with Docker Swarm and Keycloak Clustering

I have spent considerable time attempting to run the Keycloak container clustered in swarm mode.

TL;DR: I ended up adding dns.DNS_PING and changing docker-entrypoint.sh to set the bind address to match the dnsrr IP from the service; the details are at the bottom.

Adventures

The first run was to try it in local Docker, without swarm, scaled to 2 with docker-compose. This worked well once I adjusted/added the environment variables CACHE_OWNERS_COUNT and CACHE_OWNERS_AUTH_SESSIONS_COUNT.
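For reference, the local test was nothing more than a plain Compose scale-out; something along these lines, assuming the service is named keycloak:

# local, non-swarm sanity check: two replicas of the keycloak service
docker-compose up -d --scale keycloak=2
# watch for ISPN000094 reporting a two-member cluster view
docker-compose logs -f keycloak | grep ISPN000094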

Having read some background, and having confirmed that multicast does not work in swarm mode/overlay networks, I settled on dns.DNS_PING for discovery and switched the service's endpoint_mode to dnsrr.
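The point of dnsrr here is that resolving the service name from inside a task returns the individual task IPs instead of a single virtual IP, which is exactly what dns.DNS_PING needs. A quick way to check, run from inside one of the keycloak containers (service name matches the compose file further down):

# should print one address per replica, e.g. the 10.0.x.x overlay IPs
getent hosts keycloak
# tasks.<service> resolves to the task IPs as well, even when the service uses a VIP
getent hosts tasks.keycloak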

While I could see the DNS query (via ROOT_LOGLEVEL=DEBUG or tcpdump) and the correct responses coming back, the result was two independent JGroups/Infinispan clusters.
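For anyone reproducing the capture, the DNS traffic can be watched from the host with the same nsenter pattern used for the ss check further down (the container name filter is assumed to match your stack/service naming):

CID=$(docker ps --filter name=kc_keycloak.1 -q)
# join the task's network namespace and capture DNS queries/responses
sudo nsenter -t $(docker inspect $CID | jq '.[].State.Pid') -n tcpdump -ni any udp port 53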

Resulting command from the entrypoint:

/bin/sh /opt/jboss/keycloak/bin/standalone.sh -Djboss.bind.address=10.0.1.41 -Djboss.bind.address.private=10.0.1.41 -Djboss.bind.address=172.19.0.4 -Djboss.bind.address.private=172.19.0.4 -c=standalone-ha.xml -b 0.0.0.0
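The duplicated -Djboss.bind.address flags come from the stock entrypoint: hostname --all-ip-addresses returns every address the container holds, and the script appends a pair of flags per address, so the later address effectively wins (the logs below show 172.19.0.4 being bound). A minimal sketch of that loop, using the two addresses from my run:

# sketch of the stock docker-entrypoint.sh behaviour with two attached networks
BIND=$(hostname --all-ip-addresses)   # here: "10.0.1.41 172.19.0.4"
BIND_OPTS=""
for BIND_IP in $BIND
do
    BIND_OPTS+=" -Djboss.bind.address=$BIND_IP -Djboss.bind.address.private=$BIND_IP "
done
echo $BIND_OPTS   # both addresses are passed; the last one is what ends up bound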

JGroups/Infinispan logs:

18:13:07,149 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 39) WFLYCLINF0001: Activating Infinispan subsystem.
18:13:07,191 INFO [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 43) WFLYCLJG0001: Activating JGroups subsystem. JGroups version 4.2.4
18:13:12,086 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) 90ac043bdf26: no members discovered after 3020 ms: creating cluster as coordinator
18:13:12,859 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-6) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,871 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-7) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,872 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-8) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,859 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-5) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,873 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-4) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:13:12,894 INFO [org.infinispan.CONTAINER] (MSC service thread 1-8) ISPN000128: Infinispan version: Infinispan 'Turia' 10.1.8.Final
18:13:13,115 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
18:13:13,115 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000078: Starting JGroups channel ejb
18:13:13,115 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
18:13:13,117 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000078: Starting JGroups channel ejb
18:13:13,117 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000078: Starting JGroups channel ejb
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,124 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,133 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000094: Received new cluster view for channel ejb: [90ac043bdf26|0] (1) [90ac043bdf26]
18:13:13,140 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,149 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,150 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,151 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]
18:13:13,151 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000079: Channel ejb local address is 90ac043bdf26, physical addresses are [172.19.0.4:7600]

I noticed in the logs that the 'wrong' IP was being bound: the wrong IP here belongs to the docker_gwbridge network (the default route), not the network defined in the compose file. So I decided to try using the BIND variable and setting it to 0.0.0.0.
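To confirm which addresses and default route a task actually has, something like this works from the host (kc_keycloak.1 matches my stack name, adjust to yours):

CID=$(docker ps --filter name=kc_keycloak.1 -q)
PID=$(docker inspect $CID | jq '.[].State.Pid')
sudo nsenter -t $PID -n ip -4 addr show        # overlay IP plus the docker_gwbridge IP
sudo nsenter -t $PID -n ip route show default  # the default route points at docker_gwbridge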

The results were identical, in that two independent clusters were created.

Resulting command from the entrypoint:

/bin/sh /opt/jboss/keycloak/bin/standalone.sh -Djboss.bind.address=0.0.0.0 -Djboss.bind.address.private=0.0.0.0 -c=standalone-ha.xml -b 0.0.0.0

sudo nsenter -t $(docker inspect $(docker ps --filter name=kc_keycloak.1 -q)  | jq '.[].State.Pid') -n ss -tnl 'sport = :7600' 
State              Recv-Q             Send-Q                           Local Address:Port                           Peer Address:Port             Process             
LISTEN             0                  50                                     0.0.0.0:7600                                0.0.0.0:*                                    
JGroups/Infinispan logs:

18:19:36,229 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) 9261f2605131: no members discovered after 3032 ms: creating cluster as coordinator
18:19:36,780 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-4) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,780 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-3) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,791 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-6) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,801 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-8) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,802 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-1) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
18:19:36,823 INFO [org.infinispan.CONTAINER] (MSC service thread 1-6) ISPN000128: Infinispan version: Infinispan 'Turia' 10.1.8.Final
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000078: Starting JGroups channel ejb
18:19:37,026 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000078: Starting JGroups channel ejb
18:19:37,033 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,033 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,033 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,035 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [9261f2605131|0] (1) [9261f2605131]
18:19:37,041 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,043 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,049 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,053 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]
18:19:37,056 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000079: Channel ejb local address is 9261f2605131, physical addresses are [0.0.0.0:7600]

I also tried JDBC_PING and TCPPING, with no success.


How I got this working

For some reason I thought this might be related to the bind IP and/or routes interacting with JGroups.

First of all I updated the docker-entrypoint.sh:

docker-entrypoint.sh diff
--- docker-entrypoint.sh.orig	2020-09-15 05:01:53.000000000 -0400
+++ docker-entrypoint.sh.test	2021-01-07 13:41:01.645836780 -0500
@@ -77,7 +77,7 @@
 ########################
 
 if [[ -z ${BIND:-} ]]; then
-    BIND=$(hostname --all-ip-addresses)
+    BIND=$(hostname --ip-address)
 fi
 if [[ -z ${BIND_OPTS:-} ]]; then
     for BIND_IP in $BIND
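The difference between the two hostname calls is roughly this (the addresses are the ones from my first run; with several networks attached the single-address form is, as noted further down, not deterministic):

# inside a task attached to both the overlay network and docker_gwbridge
hostname --all-ip-addresses   # e.g. "10.0.1.41 172.19.0.4" (every attached network)
hostname --ip-address         # e.g. "10.0.1.41" (a single address resolved from the container hostname)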

This, combined with endpoint_mode: dnsrr, JGROUPS_DISCOVERY_PROTOCOL: dns.DNS_PING and JGROUPS_DISCOVERY_PROPERTIES: dns_query=keycloak, got me a cluster.

I don’t know why this is and will happily receive some education.

JGroups/Infinispan logs:

Setting JGroups discovery to dns.DNS_PING with properties {dns_query=>keycloak}
19:14:15,505 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 39) WFLYCLINF0001: Activating Infinispan subsystem.
19:14:15,568 INFO [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 43) WFLYCLJG0001: Activating JGroups subsystem. JGroups version 4.2.4
19:14:25,017 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-3) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,027 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-4) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,022 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-6) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,030 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-2) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,041 INFO [org.infinispan.PERSISTENCE] (MSC service thread 1-5) ISPN000556: Starting user marshaller 'org.wildfly.clustering.infinispan.marshalling.jboss.JBossMarshaller'
19:14:25,070 INFO [org.infinispan.CONTAINER] (MSC service thread 1-2) ISPN000128: Infinispan version: Infinispan 'Turia' 10.1.8.Final
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
19:14:25,360 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000078: Starting JGroups channel ejb
19:14:25,372 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,372 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,373 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,374 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,381 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [eae0f6d94438|1] (2) [eae0f6d94438, af39ff385235]
19:14:25,395 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,401 INFO [org.infinispan.CLUSTER] (MSC service thread 1-6) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,423 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,428 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]
19:14:25,436 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000079: Channel ejb local address is af39ff385235, physical addresses are [10.0.2.66:7600]

If a container is attached to multiple networks, this IP method is not deterministic, and my testing shows that in that case the cluster is not successfully created/joined.

With the following updated docker-entrypoint.sh it works again:

docker-entrypoint.sh
--- docker-entrypoint.sh.orig	2020-09-15 05:01:53.000000000 -0400
+++ docker-entrypoint.sh	2021-01-07 14:29:41.843423045 -0500
@@ -80,6 +80,12 @@
     BIND=$(hostname --all-ip-addresses)
 fi
 if [[ -z ${BIND_OPTS:-} ]]; then
+    if [[ -n ${DOCKER_SWARM:-} ]]; then
+      SVCIP=$(getent hosts ${SVC_NAME:-keycloak} | awk '{print $1}'| uniq)
+      THIS_IP=$(echo ${SVCIP} ${BIND} | sed 's/ /\n/g' | sort | uniq -d)
+      echo INFO: Using bindip ${THIS_IP} for jgroups
+      BIND=${THIS_IP}
+    fi
     for BIND_IP in $BIND
     do
         BIND_OPTS+=" -Djboss.bind.address=$BIND_IP -Djboss.bind.address.private=$BIND_IP "
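The idea is to intersect the addresses the service name resolves to (the task IPs on the overlay network, courtesy of dnsrr) with the addresses this container holds; the single address present in both lists is the overlay IP JGroups should bind to. Stand-alone, with made-up example values, the intersection looks like this:

SVCIP="10.0.2.66 10.0.2.67"     # getent hosts keycloak -> all task IPs (example values)
BIND="10.0.2.66 172.19.0.4"     # hostname --all-ip-addresses -> this container's IPs (example values)
# one address per line, sorted, keep only the duplicate: the common overlay IP
echo ${SVCIP} ${BIND} | sed 's/ /\n/g' | sort | uniq -d   # -> 10.0.2.66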

docker-compose.yaml
version: '3.8'
networks:
  keycloak:
  foo:
secrets:
  pg-password:
    file: ./pg-password
services:
  postgres:
    image: postgres:12
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/pg-password
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
    secrets:
     - pg-password
    networks:
      - keycloak

  keycloak:
    image: keycloak:test
    deploy:
      endpoint_mode: dnsrr
      replicas: 2
      placement:
        constraints:
          - "node.platform.os==linux"
#        max_replicas_per_node: 1
      labels:
        traefik.enable: "true"
        traefik.http.routers.keycloak.rule: Host(`keycloak`)
        traefik.http.routers.keycloak.entrypoints: http
        traefik.http.services.keycloak.loadbalancer.server.port: 8080
        traefik.http.services.keycloak.loadbalancer.healthcheck.path: /auth/
    secrets:
     - pg-password
    environment:
#      ROOT_LOGLEVEL: DEBUG
      CACHE_OWNERS_COUNT: 2
      CACHE_OWNERS_AUTH_SESSIONS_COUNT: 2
      DB_ADDR: postgres
      DB_DATABASE: keycloak
      DB_PASSWORD_FILE: /run/secrets/pg-password
      DB_SCHEMA: public
      DB_USER: "keycloak"
      DB_VENDOR: postgres
      DOCKER_SWARM: "true"
      JGROUPS_DISCOVERY_PROTOCOL: dns.DNS_PING
      JGROUPS_DISCOVERY_PROPERTIES: dns_query=keycloak
      KEYCLOAK_PASSWORD: password
      KEYCLOAK_USER: admin
      PROXY_ADDRESS_FORWARDING: "true"
    networks:
      - keycloak
      - foo
  traefik:
    image: traefik:v2.3
    ports:
     - '80:80'
    command:
    - --entrypoints.http.address=:80
    - --providers.docker=true
    - --providers.docker.swarmMode=true
    - --providers.docker.exposedbydefault=false
    - --accesslog
    networks:
    - keycloak
    volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
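
For completeness, this is deployed as a stack named kc (which is what the kc_keycloak.1 filter in the earlier nsenter/ss commands assumes):

docker stack deploy -c docker-compose.yaml kc
docker service ls                   # keycloak should reach 2/2 replicas
docker service logs -f kc_keycloak  # look for ISPN000094 with a two-member view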


Thank you very much!!
It works for me too!


Hi @kiwicloak, congrats!

I’ve also played with Keycloak clustering deployed in Docker Swarm as you can see in this docker-swarm-environment project.

Btw, I've built the keycloak-clustered Docker image, which extends the official Keycloak Docker image and adds some scripts to make it easier to run a cluster of Keycloak instances.

Hey @kiwicloak @cinco!

Is this solution still working for you guys?

I'm experiencing similar issues with JDBC_PING and MSSQL. It works for a while, but stops working under some traffic (~1k rpm).

Hi @brunocascio

The solution in my post (dns.DNS_PING) is still running fine. My traffic is very light.


Thanks, I’ll take a look soon.

I'm experiencing similar issues with JDBC_PING and MSSQL. It works for a while, but stops working under some traffic (~1k rpm).

Just for future reference, it turned out to be related to a custom Keycloak extension we developed, not to Keycloak or the discovery protocol itself.