Problems deploying 20.0.3 alongside our production instance

Greetings all,
I’m trying to deploy 20.0.3 alongside our production Keycloak instance (an older version); we deploy everything in Kubernetes. I’ve got a separate DB configured, with all the authentication/access worked out for that, and I’m using different ports from our production instance. However, the deployment fails and goes into CrashLoopBackOff (CLBO).
The logs from the deployment show the following:

2023-02-13 14:26:37,418 WARN  [org.infinispan.statetransfer.InboundTransferTask] (jgroups-9,devkeycloak-66f67f6c9d-7slwg-2994) ISPN000210: Failed to request state of cache authenticationSessions from node keycloak-6944c8bf7b-fn5d4-16218, segments {0-255}: org.infinispan.commons.CacheException: ExceptionResponse(java.io.EOFException)
	at org.infinispan.statetransfer.InboundTransferTask.lambda$startTransfer$3(InboundTransferTask.java:167)
	at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
	at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
	at org.infinispan.remoting.transport.AbstractRequest.complete(AbstractRequest.java:67)
	at org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:46)
	at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:51)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1496)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1398)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access$300(JGroupsTransport.java:146)
	at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1586)
	at org.jgroups.JChannel.up(JChannel.java:780)
	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:913)
	at org.jgroups.protocols.FRAG3.up(FRAG3.java:165)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:347)
	at org.jgroups.protocols.FlowControl.up(FlowControl.java:347)
	at org.jgroups.protocols.pbcast.GMS.up(GMS.java:876)
	at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:254)
	at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1048)
	at org.jgroups.protocols.UNICAST3.addMessage(UNICAST3.java:771)
	at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:752)
	at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:405)
	at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:592)
	at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:132)
	at org.jgroups.protocols.FailureDetection.up(FailureDetection.java:186)
	at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:254)
	at org.jgroups.protocols.MERGE3.up(MERGE3.java:281)
	at org.jgroups.protocols.Discovery.up(Discovery.java:300)
	at org.jgroups.protocols.TP.passMessageUp(TP.java:1400)
	at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:98)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

2023-02-13 14:29:36,458 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000080: Disconnecting JGroups channel `ISPN`
2023-02-13 14:29:36,645 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (production) mode
2023-02-13 14:29:36,645 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start caches
2023-02-13 14:29:36,645 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) For more details run the same command passing the '--verbose' option. Also you can use '--help' to see the details about the usage of the particular command.

Has anyone experienced this, or have I created a unicorn?

Thanks!

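It sounds like an incompatibility between Infinispan versions. Your log shows the new pod (devkeycloak-66f67f6c9d-7slwg) requesting cache state from a pod named keycloak-6944c8bf7b-fn5d4, which looks like your production instance, so the two deployments appear to have joined the same cluster. I see things like this often when upgrading. I would recommend isolating the networks between the clusters so that you don’t get JGroups discovery and Infinispan cache syncing between them.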

This is a known problem. One example: "Keycloak 18.0.0 - Upgrade to 19.0.2 - ISPN Cache error" (Issue #14657, keycloak/keycloak on GitHub).

Zero-downtime upgrades are not supported at the moment.

A workaround would be to have two Infinispan clusters, but this basically defeats the purpose of keeping user sessions.

Also, the older version is not expected to work with a database from a future version: the new version runs a migration of the database, and you’ll end up with older code running against a new database, which is not supported.

But if you want to, you need to configure KC_CACHE_STACK=kubernetes, label your new Deployment (or StatefulSet) with “version: new”, create a headless Service pointing to the “version: new” pods, and configure the kubernetes stack to use it. Check the documentation about the kubernetes cache transport: Configuring distributed caches - Keycloak

That way you’ll have a new infinispan cache cluster with only the new instances.
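The setup described above could look roughly like the manifests below. This is only a sketch under assumptions, not a tested configuration: the Service name `devkeycloak-headless`, the `app` label, the `default` namespace and port 7800 are placeholders I picked for illustration. The `KC_CACHE_STACK` and `JAVA_OPTS_APPEND` / `jgroups.dns.query` settings are how I understand the distributed caches guide, so check them against the documentation for your version.

```yaml
# Headless Service that selects ONLY the new pods, so DNS-based
# discovery returns only their IPs (name and namespace are hypothetical).
apiVersion: v1
kind: Service
metadata:
  name: devkeycloak-headless
  namespace: default
spec:
  clusterIP: None            # headless: DNS resolves straight to pod IPs
  selector:
    app: devkeycloak         # hypothetical label
    version: new
  ports:
    - name: jgroups
      port: 7800             # assumed JGroups TCP port
---
# Fragment of the new Deployment; DB, hostname and other settings omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devkeycloak
  namespace: default
spec:
  selector:
    matchLabels:
      app: devkeycloak
      version: new
  template:
    metadata:
      labels:
        app: devkeycloak
        version: new         # the label the headless Service selects on
    spec:
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:20.0.3
          args: ["start"]
          env:
            - name: KC_CACHE_STACK
              value: kubernetes
            # Point discovery at the headless Service above.
            - name: JAVA_OPTS_APPEND
              value: "-Djgroups.dns.query=devkeycloak-headless.default.svc.cluster.local"
```

The separation only holds if the production instance’s own discovery does not match the new pods either, so it is worth checking which Service or discovery mechanism the old deployment uses.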

Be careful: this solution will end user sessions.
