How Keycloak works in a cluster with cache

Hello everyone! I hope you are all well!
I’m trying to use Keycloak version 26.1.0 in cluster mode.

My scenario:
I have an internal load balancer that points to an Azure Scale-Set. In this scale-set, I have two VMs. In each one, I have a docker-compose that configures Keycloak for me.

When I run my containers, Keycloak starts in cluster mode successfully, but it advertises the internal IP of the Docker network. This causes several timeouts when it tries to communicate with the other instance, because the instances run on different VMs with different internal Docker networks.

The VMs themselves can reach each other through their VM IPs on the required port (7800).

My Docker - VM01
keycloak01:
  image: keycloak:26.1.0
  container_name: keycloak01
  command:
    - start
    - --truststore-paths=/opt/keycloak/certs/DigiCertGlobalRootCA.crt,/opt/keycloak/certs/DigiCertGlobalRootG2.crt.pem,/opt/keycloak/certs/MicrosoftRsaRootCertificateAuthority2017.cert
  environment:
    KC_BOOTSTRAP_ADMIN_USERNAME: sreadmin
    KC_BOOTSTRAP_ADMIN_PASSWORD: "admin"
    KC_DB: postgres
    KC_DB_URL: "jdbc:postgresql://domain.postgres.database.azure.com:5432/keycloak?sslmode=verify-full&sslrootcert=/opt/keycloak/certs/DigiCertGlobalRootCA.crt"
    KC_DB_SCHEMA: public
    KC_LOG_LEVEL: INFO
    DB_DATABASE: keycloak
    KC_DB_USERNAME: manager
    KC_DB_PASSWORD: "pass"
    KC_PROXY: edge
    KC_HOSTNAME: keycloak01
    KC_HTTP_ENABLED: "true"
    KC_HOSTNAME_STRICT: true
    KC_CACHE: ispn
    KC_CACHE_STACK: jdbc-ping
    KC_JGROUPS_DISCOVERY_PROTOCOL: JDBC_PING
    JGROUPS_DISCOVERY_EXTERNAL_IP: 10.6.6.5
    JGROUPS_BIND_ADDR: 10.6.6.5
  ports:
    - "80:8080" # HTTP port
    - "7800:7800"
  volumes:
    - ./certs:/opt/keycloak/certs:ro
  restart: always
  networks:
    - app-network

My Docker - VM02
keycloak02:
  image: keycloak:26.1.0
  container_name: keycloak02
  command:
    - start
    - --truststore-paths=/opt/keycloak/certs/DigiCertGlobalRootCA.crt,/opt/keycloak/certs/DigiCertGlobalRootG2.crt.pem,/opt/keycloak/certs/MicrosoftRsaRootCertificateAuthority2017.cert
  environment:
    KC_BOOTSTRAP_ADMIN_USERNAME: sreadmin
    KC_BOOTSTRAP_ADMIN_PASSWORD: "admin"
    KC_DB: postgres
    KC_DB_URL: "jdbc:postgresql://domain.database.azure.com:5432/keycloak?sslmode=verify-full&sslrootcert=/opt/keycloak/certs/DigiCertGlobalRootCA.crt"
    KC_DB_SCHEMA: public
    KC_LOG_LEVEL: INFO
    DB_DATABASE: keycloak
    KC_DB_USERNAME: manager
    KC_DB_PASSWORD: "pass"
    KC_PROXY: edge
    KC_HOSTNAME: keycloak02
    KC_HTTP_ENABLED: "true"
    KC_HOSTNAME_STRICT: true
    KC_CACHE: ispn
    KC_CACHE_STACK: jdbc-ping
    KC_JGROUPS_DISCOVERY_PROTOCOL: JDBC_PING
    JGROUPS_DISCOVERY_EXTERNAL_IP: 10.6.6.6
    JGROUPS_BIND_ADDR: 10.6.6.6
  ports:
    - "80:8080" # HTTP port
    - "7800:7800"
  volumes:
    - ./certs:/opt/keycloak/certs:ro
  restart: always
  networks:
    - app-network
Logs:
keycloak01 | 2025-02-07 11:46:33,240 INFO [org.infinispan.CLUSTER] (main) ISPN000078: Starting JGroups channel ISPN with stack jdbc-ping
keycloak01 | 2025-02-07 11:46:33,242 INFO [org.jgroups.JChannel] (main) local_addr: a5d5d19a-1af4-46a3-8d4f-36f6f603b0c0, name: af13e6b70dd6-35000
keycloak01 | 2025-02-07 11:46:33,253 INFO [org.jgroups.protocols.FD_SOCK2] (main) server listening on *.57800
keycloak01 | 2025-02-07 11:46:33,269 INFO [org.jgroups.protocols.pbcast.GMS] (main) af13e6b70dd6-35000: no members discovered after 14 ms: creating cluster as coordinator
keycloak01 | 2025-02-07 11:46:33,293 INFO [org.infinispan.CLUSTER] (main) ISPN000094: Received new cluster view for channel ISPN: [af13e6b70dd6-35000|0] (1) [af13e6b70dd6-35000]
keycloak01 | 2025-02-07 11:46:33,393 INFO [org.infinispan.CLUSTER] (main) ISPN000079: Channel ISPN local address is af13e6b70dd6-35000, physical addresses are [172.28.0.2:7800]
keycloak01 | 2025-02-07 11:46:33,904 INFO [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (main) Node name: af13e6b70dd6-35000, Site name: null
keycloak01 | 2025-02-07 11:46:33,910 INFO [org.keycloak.broker.provider.AbstractIdentityProviderMapper] (main) Registering class org.keycloak.broker.provider.mappersync.ConfigSyncEventListener
keycloak01 | 2025-02-07 11:46:35,637 WARN [io.agroal.pool] (main) Datasource ‘’: JDBC resources leaked: 3 ResultSet(s) and 0 Statement(s)
keycloak01 | 2025-02-07 11:46:35,837 INFO [io.quarkus] (main) Keycloak 26.1.0 on JVM (powered by Quarkus 3.15.2) started in 10.822s. Listening on: http://0.0.0.0:8080
keycloak01 | 2025-02-07 11:46:35,838 INFO [io.quarkus] (main) Profile prod activated.
keycloak01 | 2025-02-07 11:46:35,838 INFO [io.quarkus] (main) Installed features: [agroal, cdi, hibernate-orm, jdbc-postgresql, keycloak, narayana-jta, opentelemetry, reactive-routes, rest, rest-jackson, smallrye-context-propagation, vertx]
keycloak01 | 2025-02-07 11:46:36,227 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,af13e6b70dd6-35000) JGRP000006: 172.28.0.2:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=58536,localport=7800]: java.net.SocketTimeoutException: Read timed out
keycloak01 | 2025-02-07 11:46:51,232 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,af13e6b70dd6-35000) JGRP000006: 172.28.0.2:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=58785,localport=7800]: java.net.SocketTimeoutException: Read timed out

With the variables:
JGROUPS_DISCOVERY_EXTERNAL_IP: 10.6.6.6
JGROUPS_BIND_ADDR: 10.6.6.6

Shouldn’t the jdbc-ping stack use these IPs instead of the Docker-internal IPs? Could someone help me? Thanks :smiley:

The JGROUPS_... variables are legacy and outdated, so they won’t be used!
Instead, add this env var:

JAVA_OPTS_APPEND: -Djgroups.external_addr=<your-ip>

IMHO the bind_addr isn’t needed (I haven’t had to set it explicitly so far). But if you do need it, simply add a -Djgroups.bind_addr=... to the value of JAVA_OPTS_APPEND.
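
Applied to the compose file from the question, this advice might look like the sketch below for VM01. The IP 10.6.6.5 is taken from the original post (VM02 would use 10.6.6.6). The top-level `services:` key is an assumption about the surrounding file, and all unrelated settings are omitted.

```yaml
services:
  keycloak01:
    image: keycloak:26.1.0
    environment:
      KC_CACHE: ispn
      KC_CACHE_STACK: jdbc-ping
      # Address advertised to the other node: the VM's IP, not the Docker bridge IP.
      # The legacy JGROUPS_DISCOVERY_EXTERNAL_IP and JGROUPS_BIND_ADDR entries are dropped.
      JAVA_OPTS_APPEND: "-Djgroups.external_addr=10.6.6.5"
```

If the setting takes effect, the ISPN000079 startup line should report the VM IP as the physical address instead of the 172.28.x.x Docker address.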

Additionally, for proper error detection/handling, you should also open port 57800 between the nodes.
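
A minimal sketch of publishing that port from the container on VM01, assuming the same host port is used and that the top-level `services:` key matches the surrounding compose file. Any firewall or network security rule between the VMs would need to allow it as well, which is outside what this fragment shows.

```yaml
services:
  keycloak01:
    ports:
      - "80:8080"      # HTTP
      - "7800:7800"    # JGroups TCP transport
      - "57800:57800"  # FD_SOCK2 failure detection ("server listening on *.57800" in the logs)
```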

Hi,
Thank you for your response.

After these changes, the IP was finally recognized correctly.
keycloak01 | 2025-02-07 12:43:59,446 INFO [org.infinispan.CLUSTER] (main) ISPN000078: Starting JGroups channel ISPN with stack jdbc-ping
keycloak01 | 2025-02-07 12:43:59,448 INFO [org.jgroups.JChannel] (main) local_addr: e7cadcf9-d33f-4315-bfe4-52d5c53debeb, name: cbcf7aab088f-20801
keycloak01 | 2025-02-07 12:43:59,461 INFO [org.jgroups.protocols.FD_SOCK2] (main) server listening on *.57800
keycloak01 | 2025-02-07 12:44:01,476 WARN [org.jgroups.protocols.pbcast.GMS] (main) cbcf7aab088f-20801: JOIN(cbcf7aab088f-20801) sent to 7b880fcfef00-7598 timed out (after 2000 ms), on try 0
keycloak01 | 2025-02-07 12:44:03,481 WARN [org.jgroups.protocols.pbcast.GMS] (main) cbcf7aab088f-20801: JOIN(cbcf7aab088f-20801) sent to 7b880fcfef00-7598 timed out (after 2000 ms), on try 1
keycloak01 | 2025-02-07 12:44:07,574 INFO [org.infinispan.CLUSTER] (main) ISPN000094: Received new cluster view for channel ISPN: [7b880fcfef00-7598|1] (2) [7b880fcfef00-7598, cbcf7aab088f-20801]
keycloak01 | 2025-02-07 12:44:07,663 INFO [org.infinispan.CLUSTER] (main) ISPN000079: Channel ISPN local address is cbcf7aab088f-20801, physical addresses are [10.6.6.5:7800]
keycloak01 | 2025-02-07 12:44:07,973 INFO [org.infinispan.LIFECYCLE] () [Context=org.infinispan.CONFIG] ISPN100002: Starting rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], phase READ_OLD_WRITE_ALL, topology id 2
keycloak01 | 2025-02-07 12:44:08,016 INFO [org.infinispan.LIFECYCLE] (non-blocking-thread--p2-t1) [Context=org.infinispan.CONFIG] ISPN100010: Finished rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], topology id 2
keycloak01 | 2025-02-07 12:44:08,082 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,cbcf7aab088f-20801) JGRP000006: 10.6.6.5:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=50894,localport=7800]: java.net.SocketTimeoutException: Read timed out
keycloak01 | 2025-02-07 12:44:08,142 INFO [org.infinispan.LIFECYCLE] () [Context=actionTokens] ISPN100002: Starting rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], phase READ_OLD_WRITE_ALL, topology id 2
keycloak01 | 2025-02-07 12:44:08,163 INFO [org.infinispan.LIFECYCLE] () [Context=actionTokens] ISPN100010: Finished rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], topology id 2
keycloak01 | 2025-02-07 12:44:08,297 INFO [org.infinispan.LIFECYCLE] () [Context=authenticationSessions] ISPN100002: Starting rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], phase READ_OLD_WRITE_ALL, topology id 2

VM-02:
keycloak02 | 2025-02-07 12:42:06,987 INFO [org.infinispan.CLUSTER] (main) ISPN000078: Starting JGroups channel ISPN with stack jdbc-ping
keycloak02 | 2025-02-07 12:42:06,990 INFO [org.jgroups.JChannel] (main) local_addr: d2f108b9-da74-4f13-adbe-1db792f7b5ea, name: 7b880fcfef00-7598
keycloak02 | 2025-02-07 12:42:07,002 INFO [org.jgroups.protocols.FD_SOCK2] (main) server listening on *.57800
keycloak02 | 2025-02-07 12:42:08,075 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,7b880fcfef00-7598) JGRP000006: 10.6.6.6:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=51360,localport=7800]: java.net.SocketTimeoutException: Read timed out
keycloak02 | 2025-02-07 12:42:09,020 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 0
keycloak02 | 2025-02-07 12:42:11,025 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 1
keycloak02 | 2025-02-07 12:42:13,029 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 2
keycloak02 | 2025-02-07 12:42:15,034 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 3
keycloak02 | 2025-02-07 12:42:17,038 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 4
keycloak02 | 2025-02-07 12:42:19,046 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 5
keycloak02 | 2025-02-07 12:42:21,050 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 6
keycloak02 | 2025-02-07 12:42:23,054 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 7
keycloak02 | 2025-02-07 12:42:23,085 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,7b880fcfef00-7598) JGRP000006: 10.6.6.6:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=51624,localport=7800]: java.net.SocketTimeoutException: Read timed out
keycloak02 | 2025-02-07 12:42:25,059 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 8
keycloak02 | 2025-02-07 12:42:27,066 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: JOIN(7b880fcfef00-7598) sent to af13e6b70dd6-35000 timed out (after 2000 ms), on try 9
keycloak02 | 2025-02-07 12:42:27,066 WARN [org.jgroups.protocols.pbcast.GMS] (main) 7b880fcfef00-7598: too many JOIN attempts (10): becoming singleton
keycloak02 | 2025-02-07 12:42:27,215 INFO [org.infinispan.CLUSTER] (main) ISPN000094: Received new cluster view for channel ISPN: [7b880fcfef00-7598|0] (1) [7b880fcfef00-7598]
keycloak02 | 2025-02-07 12:42:27,310 INFO [org.infinispan.CLUSTER] (main) ISPN000079: Channel ISPN local address is 7b880fcfef00-7598, physical addresses are [10.6.6.6:7800]
keycloak02 | 2025-02-07 12:42:27,813 INFO [org.keycloak.connections.infinispan.DefaultInfinispanConnectionProviderFactory] (main) Node name: 7b880fcfef00-7598, Site name: null
keycloak02 | 2025-02-07 12:42:27,818 INFO [org.keycloak.broker.provider.AbstractIdentityProviderMapper] (main) Registering class org.keycloak.broker.provider.mappersync.ConfigSyncEventListener
keycloak02 | 2025-02-07 12:42:29,446 WARN [io.agroal.pool] (main) Datasource ‘’: JDBC resources leaked: 3 ResultSet(s) and 0 Statement(s)
keycloak02 | 2025-02-07 12:42:29,649 INFO [io.quarkus] (main) Keycloak 26.1.0 on JVM (powered by Quarkus 3.15.2) started in 30.928s. Listening on: http://0.0.0.0:8080
keycloak02 | 2025-02-07 12:42:29,649 INFO [io.quarkus] (main) Profile prod activated.

And in the database I already have both records with their respective IPs.
keycloak=> SELECT * FROM public.JGROUPS_PING;
  address  |        name        | cluster_name |      ip       | coord
-----------+--------------------+--------------+---------------+-------
 uuid://ID | 7b880fcfef00-7598  | ISPN         | 10.6.6.6:7800 | t
 uuid://ID | cbcf7aab088f-20801 | ISPN         | 10.6.6.5:7800 | f

But why do I keep getting these timeouts?

keycloak01 | 2025-02-07 12:49:53,247 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,cbcf7aab088f-20801) JGRP000006: 10.6.6.5:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=56827,localport=7800]: java.net.SocketTimeoutException: Read timed out

keycloak02:
keycloak02 | 2025-02-07 12:44:08,633 INFO [org.infinispan.LIFECYCLE] (non-blocking-thread--p2-t1) [Context=work] ISPN100010: Finished rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], topology id 2
keycloak02 | 2025-02-07 12:44:08,643 INFO [org.infinispan.CLUSTER] () [Context=work] ISPN100009: Advancing to rebalance phase READ_ALL_WRITE_ALL, topology id 3
keycloak02 | 2025-02-07 12:44:08,647 INFO [org.infinispan.CLUSTER] () [Context=work] ISPN100009: Advancing to rebalance phase READ_NEW_WRITE_ALL, topology id 4
keycloak02 | 2025-02-07 12:44:08,650 INFO [org.infinispan.CLUSTER] () [Context=work] ISPN100010: Finished rebalance with members [7b880fcfef00-7598, cbcf7aab088f-20801], topology id 5
keycloak02 | 2025-02-07 12:44:23,186 WARN [org.jgroups.protocols.TCP] (TcpServer.Acceptor[7800]-1,7b880fcfef00-7598) JGRP000006: 10.6.6.6:7800: failed accepting connection from peer Socket[addr=/168.63.129.16,port=53554,localport=7800]: java.net.SocketTimeoutException: Read timed out

Thanks

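I don’t know. Most probably something on your network is interfering with communication between the nodes. Note that the peer in those warnings, 168.63.129.16, is neither of your two nodes (10.6.6.5 and 10.6.6.6), so it is worth checking what else connects to port 7800, for example a health probe from your load balancer.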

I’ve now tested it by creating realms, users, and so on in each instance separately, and everything seems to be synchronized. The IP is also correct.

Now I’m going to open port 57800 and look into the details of the timeouts. One last question: is there documentation that covers these variables and the changes that are coming?

Thanks for your help. :smiley: