Infinispan and Session Cache

If my Infinispan cache is working properly and I have the session cache and client session cache set to 2 owners, what should happen when one instance in a two-node Keycloak cluster reboots?

My understanding is that the load balancer will redirect connections to the remaining instance. Since that instance is the other owner in the session caches, the login should remain active. And when the rebooted Keycloak instance comes back up, it will sync its cache with the surviving owner of the session caches, correct?

I am seeing successful cluster initiation and have all session and client session caches set to 2 owners. However, when I reboot one server, any sessions that were started on that server are lost and those users have to log in again.
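For reference, this is roughly how the owners setting was applied via jboss-cli against standalone-ha.xml. Treat it as a sketch assuming the default keycloak cache container and default cache names, not necessarily my exact configuration:

    /subsystem=infinispan/cache-container=keycloak/distributed-cache=sessions:write-attribute(name=owners, value=2)
    /subsystem=infinispan/cache-container=keycloak/distributed-cache=clientSessions:write-attribute(name=owners, value=2)
    /subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineSessions:write-attribute(name=owners, value=2)
    /subsystem=infinispan/cache-container=keycloak/distributed-cache=offlineClientSessions:write-attribute(name=owners, value=2)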

I wanted to confirm that my understanding of how the Infinispan cache should work is correct, and if so, what would cause this behavior.

Your understanding is correct.
Without seeing any configuration or logs, it's hard to guess what is not working properly.

You might be affected by a bug in Keycloak prior to 22 where, in very rare cases, killing a single instance could wreak havoc in the cluster (as far as I observed in our heaviest-use deployment, it happens roughly 1 time in 10).

However, if your behavior is consistent and reproducible, then that is definitely not the case. If your sessions are lost, then your Keycloak instances are not forming a cluster between themselves, so your initial assumption "If my Infinispan cache is working properly" must be false :slight_smile: You can set the logging level of the relevant Keycloak / Infinispan classes to DEBUG and check for yourself what happens when those instances come up; somewhere in there you will find hints that the cluster is not forming, and possibly the cause.

TIPS: please always specify which Keycloak version you are using and what kind of clustering setup (e.g. JDBC_PING / MPING / etc.), otherwise it is hard to pinpoint the problem.


Yes, it is an assumption that it is working, partly to make sure my expectations about how the cluster should behave were on point before going further down the rabbit hole. In the default logging output I see the node detect the other instance and what appears to be the detection and initialization of the caches, for lack of a better phrase.

It is an older install of RH-SSO, based on Keycloak 15.0.2 I believe. We are currently working to improve the deployment (upgrade to a newer version, containerization, using an external Infinispan cache, etc.). With the current implementation, losing cached sessions when Keycloak instances are shut down or rebooted for maintenance or auto-scaling events is a major pain point. The near-term plan is to dig deeper into our current configuration and get it working as a quick fix, with the long-term solution being the move to an external Infinispan cache cluster.

It is Keycloak 15.0.2 and we are using JDBC_PING because we are on AWS EC2 instances in an auto-scaling group behind an ALB.
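For context, on the WildFly distribution JDBC_PING is typically wired into the TCP stack through a jboss-cli script. The snippet below is only a sketch of that approach, assuming the default KeycloakDS datasource and the stock tcp stack; it is not our exact configuration:

    # sketch only: replace the default discovery protocol on the tcp stack with JDBC_PING
    /subsystem=jgroups/stack=tcp/protocol=MPING:remove()
    /subsystem=jgroups/stack=tcp/protocol=JDBC_PING:add(add-index=0)
    # JDBC_PING stores member information in the database reached through the Keycloak datasource
    /subsystem=jgroups/stack=tcp/protocol=JDBC_PING/property=datasource_jndi_name:add(value="java:jboss/datasources/KeycloakDS")
    # make the ee channel use the tcp stack instead of udp
    /subsystem=jgroups/channel=ee:write-attribute(name=stack, value=tcp)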

What portions of the logging do I need to bump up and to what level to get a deeper look at the functionality of the clustering?

OK, since you are on 15 I assume you are on the WildFly distribution, so you need to set the debug levels using a startup script that you place inside the WildFly-based image. By the way, I had the same setup as you, with JDBC_PING in AWS.

Adding the startup script is a simple matter of adding something like this to your Keycloak Dockerfile: COPY startup-scripts/ /opt/jboss/startup-scripts

And of course in that folder you can have as many scripts as you wish. Then in the script you can do something like this:

#!/bin/bash
# Make the container environment available to jboss-cli as properties
printenv > keycloak-env.properties

# Run the CLI against an embedded server so the changes land in standalone-ha.xml before boot
${JBOSS_HOME}/bin/jboss-cli.sh --properties=keycloak-env.properties <<EOF
embed-server --server-config=standalone-ha.xml --std-out=echo

    # Keycloak's own cluster / Infinispan integration
    /subsystem=logging/logger=org.keycloak.cluster.infinispan:add()
    /subsystem=logging/logger=org.keycloak.cluster.infinispan:write-attribute(name=level, value=DEBUG)
    /subsystem=logging/logger=org.keycloak.connections.infinispan:add()
    /subsystem=logging/logger=org.keycloak.connections.infinispan:write-attribute(name=level, value=DEBUG)
    # JGroups discovery and membership (this is where JDBC_PING problems show up)
    /subsystem=logging/logger=org.jgroups.protocols:add()
    /subsystem=logging/logger=org.jgroups.protocols:write-attribute(name=level, value=DEBUG)

stop-embedded-server
exit
EOF

For me that was enough to understand what was happening with the cache, the cluster formation, and the JDBC_PING behaviour.

My only configuration related to this was in the same startup script:

        /system-property=jboss.tx.node.id:add(value=$HOST_ADDR)

        echo "External address added to JGROUPS"
        /subsystem=jgroups/stack=tcp/transport=TCP/property=external_addr:add(value="$HOST_ADDR")
        /subsystem=jgroups/stack=tcp/protocol=FD_SOCK/property=external_addr:add(value="$HOST_ADDR")

        # Check  https://issues.redhat.com/browse/KEYCLOAK-13310 for more info on this.
        # It should solve a part of deployment problems and make the Infinispan clusters behave nicely when members are killed
        /subsystem=jgroups/stack=tcp/transport=TCP/property=bundler_type:add(value="no-bundler")

With that external_addr set to the internal IP of the AWS instance, the cluster formed correctly in my case. The last line was needed to work around a very annoying bug which technically should have been fixed in Keycloak 12, but I did not remove the workaround anyway. As for the first line (jboss.tx.node.id), I am not sure whether it is still needed; it has been there for a long time and I did not bother to test without it.
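Not from the snippet above, just an assumption of how HOST_ADDR could be populated: on EC2 the instance's private IP is available from the instance metadata service, so the startup script could set it with something like this before running the CLI commands:

    # hypothetical: resolve the instance's private IP from the EC2 instance metadata service
    HOST_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)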