We have two Keycloak 11.0.2 nodes running with the default standalone-ha.xml, both pointed at a shared MySQL data source, with a physical load balancer in front of the nodes distributing connections in round-robin fashion. These are RHEL 8 servers.
Services on both nodes start fine, but we have the following issues.
If a user is issued a token via Node1 and the load balancer then sends the same user to Node2, authentication fails. The user is shown an invalid login attempt message or something to that effect. Login works only if the user ends up on the same node that issued the token.
Data is not shared instantly between the two nodes; there is a delay of about a minute. We observed this with new user registrations and activations, and had to disable realmCache and userCache as a workaround.
I also see lots of WARN messages reporting that a MySQL datasource connection is destroyed because of a ping timeout:
2020-10-31 20:53:07,980 WARN [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] (Timer-2) IJ000621: Destroying connection that could not be validated: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@36cd5dff[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@3605f0ce connection handles=0 lastReturned=1604191089096 lastValidated=1604191088039 lastCheckedOut=1604191089088 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@523baa80 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@1c1a2ce9[pool=KeycloakDS] xaResource=LocalXAResourceImpl@5d0df201[connectionListener=36cd5dff connectionManager=2b63566b warned=false currentXid=null productName=MySQL productVersion=8.0.20 jndiName=java:/jboss/datasources/KeycloakDS] txSync=null]
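For context, the validation that emits IJ000621 is controlled by the `<validation>` element of the datasource. The fragment below is only a sketch of what such a block can look like, assuming the stock IronJacamar MySQL checker classes that ship with WildFly. The timing value is an illustrative assumption and is not copied from our configuration.

```xml
<!-- Sketch only: connection-url, driver, and security are omitted here. -->
<datasource jndi-name="java:/jboss/datasources/KeycloakDS" pool-name="KeycloakDS" enabled="true">
    <validation>
        <!-- Pings MySQL to decide whether a pooled connection is still usable -->
        <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
        <!-- Validate idle connections in the background instead of on every checkout -->
        <background-validation>true</background-validation>
        <!-- Assumed interval; would need to be shorter than any idle timeout on the MySQL or network side -->
        <background-validation-millis>60000</background-validation-millis>
        <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLExceptionSorter"/>
    </validation>
</datasource>
```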
To fix these, I tried adding or modifying the following configuration:
Changing/Assigning unique node identity
<core-environment node-identifier="keycloak-01"> on Node1
<core-environment node-identifier="keycloak-02"> on Node2
Assigning 2 owners to all distributed-cache elements
<distributed-cache name="authenticationSessions" owners="2"/>
<distributed-cache name="offlineSessions" owners="2"/>
<distributed-cache name="clientSessions" owners="2"/>
<distributed-cache name="offlineClientSessions" owners="2"/>
<distributed-cache name="loginFailures" owners="2"/>
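As a rough sketch of where these entries sit, assuming the stock standalone-ha.xml layout: they are children of the `keycloak` cache container in the Infinispan subsystem. The `sessions` entry below is an assumption taken from the default file and is not part of the list above. Other attributes and caches are omitted.

```xml
<!-- Sketch only: fragment of the Infinispan subsystem, not a complete configuration. -->
<cache-container name="keycloak">
    <!-- Without a transport element the container runs in local (non-clustered) mode -->
    <transport lock-timeout="60000"/>
    <!-- Assumed from the default file; holds user sessions -->
    <distributed-cache name="sessions" owners="2"/>
    <distributed-cache name="authenticationSessions" owners="2"/>
    <distributed-cache name="offlineSessions" owners="2"/>
    <distributed-cache name="clientSessions" owners="2"/>
    <distributed-cache name="offlineClientSessions" owners="2"/>
    <distributed-cache name="loginFailures" owners="2"/>
</cache-container>
```

Note that `owners="2"` only has an effect once the two nodes have actually formed a single cluster. Until then, each node keeps its own copy of every cache.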
Setting JGroups over TCP, hoping that the nodes become aware of each other and share tokens
<protocol type="TCPPING">
  <property name="initial_hosts">10.100.2.138,10.100.2.139</property>
  <property name="port_range">10</property>
  <property name="timeout">3000</property>
  <property name="num_initial_members">2</property>
</protocol>
<protocol type="MERGE3"/>
<protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
<protocol type="FD_SOCK"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="MFC"/>
<protocol type="FRAG2"/>
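For comparison, here is a minimal sketch of how a TCP stack like this is usually wired into the JGroups subsystem, assuming the stock standalone-ha.xml names (`ee` channel, `tcp` stack, `jgroups-tcp` socket binding). The `[7600]` port suffix is an assumption based on the default `jgroups-tcp` binding. The remaining protocols are left out for brevity.

```xml
<!-- Sketch only: fragments of the JGroups subsystem, not a complete configuration. -->
<channels default="ee">
    <!-- The channel must reference the tcp stack; the stock file points it at udp -->
    <channel name="ee" stack="tcp" cluster="ejb"/>
</channels>
<stacks>
    <stack name="tcp">
        <transport type="TCP" socket-binding="jgroups-tcp"/>
        <protocol type="TCPPING">
            <!-- host[port] form; 7600 is assumed to match the jgroups-tcp socket binding -->
            <property name="initial_hosts">10.100.2.138[7600],10.100.2.139[7600]</property>
            <property name="port_range">0</property>
        </protocol>
        <protocol type="MERGE3"/>
        <!-- failure detection, NAKACK2, UNICAST3, STABLE, GMS, MFC, FRAG2 follow as in the list above -->
    </stack>
</stacks>
```

If the channel still points at the `udp` stack, changes made to the `tcp` stack are never used, which would be consistent with the edits appearing to do nothing.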
None of this is doing anything. I’m lost.
Logins, sessions, and token validation work only when a single node is up. As soon as we bring the second node up, the load balancer distributes connections in round-robin fashion and everything breaks.
Any suggestions or help would be greatly appreciated.