Keycloak fails when it’s deployed, scaled in and/or scaled out

I’m running keycloak:9.0.3 on Docker Swarm with mssql and JDBC_PING as the discovery protocol. I noticed that Infinispan starts to fail with the exception below whenever a container is shut down.

Steps to reproduce

  1. Run 3 replicas (on different nodes) in Docker Swarm with mssql and JDBC_PING configured with:

JGROUPS_DISCOVERY_PROPERTIES=datasource_jndi_name=java:jboss/datasources/KeycloakDS,remove_all_data_on_view_change=true,info_writer_sleep_time=500

  2. Start a script that keeps making login requests to Keycloak.

  3. Then start killing containers and watch the logs; the errors should appear there.
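For completeness, the service is created roughly like this (a sketch of my setup; the jboss/keycloak image name, network, and DB values are placeholders):

    # Sketch only: 3 Keycloak replicas on Swarm, mssql database, JDBC_PING discovery.
    docker service create \
      --name keycloak \
      --replicas 3 \
      --network keycloak-net \
      -e DB_VENDOR=mssql \
      -e DB_ADDR=mssql \
      -e DB_DATABASE=keycloak \
      -e DB_USER=keycloak \
      -e DB_PASSWORD=change-me \
      -e JGROUPS_DISCOVERY_PROTOCOL=JDBC_PING \
      -e JGROUPS_DISCOVERY_PROPERTIES="datasource_jndi_name=java:jboss/datasources/KeycloakDS,remove_all_data_on_view_change=true,info_writer_sleep_time=500" \
      jboss/keycloak:9.0.3

The exception that shows up when a container goes down: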

    : java.lang.NullPointerException
    at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.clearTable(JDBC_PING.java:362)
    at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.removeAll(JDBC_PING.java:190)
    at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.stop(JDBC_PING.java:119)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at org.jgroups@4.1.4.Final//org.jgroups.stack.ProtocolStack.stopStack(ProtocolStack.java:906)
    at org.jgroups@4.1.4.Final//org.jgroups.JChannel.stopStack(JChannel.java:1076)
    at org.jgroups@4.1.4.Final//org.jgroups.JChannel._close(JChannel.java:1063)
    at org.jgroups@4.1.4.Final//org.jgroups.JChannel.close(JChannel.java:454)
    at org.jboss.as.clustering.jgroups@18.0.1.Final//org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.accept(ChannelServiceConfigurator.java:132)
    at org.jboss.as.clustering.jgroups@18.0.1.Final//org.jboss.as.clustering.jgroups.subsystem.ChannelServiceConfigurator.accept(ChannelServiceConfigurator.java:58)
    at org.wildfly.clustering.service@18.0.1.Final//org.wildfly.clustering.service.FunctionalService.stop(FunctionalService.java:77)
    at org.wildfly.clustering.service@18.0.1.Final//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$stop$1(AsyncServiceConfigurator.java:142)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
    at java.base/java.lang.Thread.run(Thread.java:834)
    at org.jboss.threads@2.3.3.Final//org.jboss.threads.JBossThread.run(JBossThread.java:485)

Also, in some cases there are a lot of errors like these (I think they’re related):

 Error executing command PutKeyValueCommand on Cache 'authenticationSessions', writing keys [5efd900f-05b2-4aac-9c6c-f9cc74490f01]
Error executing command PutKeyValueCommand on Cache 'clientSessions', writing keys [ca20afb1-53bb-4d59-8914-12b6c1962c15]
... similar messages ...

Those errors make my system unstable, and clients start receiving a lot of errors (400s and 500s).

Another related exception:

: java.sql.SQLException: javax.resource.ResourceException: IJ000470: You are trying to use a connection factory that has been shut down: java:jboss/datasources/KeycloakDS
at org.jboss.ironjacamar.jdbcadapters@1.4.17.Final//org.jboss.jca.adapters.jdbc.WrapperDataSource.getConnection(WrapperDataSource.java:159)
at org.jboss.as.connector@18.0.1.Final//org.jboss.as.connector.subsystems.datasources.WildFlyDataSource.getConnection(WildFlyDataSource.java:64)
at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.getConnection(JDBC_PING.java:310)
at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.clearTable(JDBC_PING.java:361)
at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.removeAll(JDBC_PING.java:190)
at org.jgroups@4.1.4.Final//org.jgroups.protocols.JDBC_PING.stop(JDBC_PING.java:119)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
.....

Hi,
If it’s helpful, we are using KUBE_PING with a Kubernetes cluster. Our DevOps engineer created a service account with RBAC and the pods could see each other.
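Roughly, the RBAC side just gives the pods’ service account permission to list pods in the namespace, something like this (the names and namespace here are examples, not our exact manifests):

    # Example names/namespace only. KUBE_PING needs to get/list pods in the
    # namespace so the Keycloak members can discover each other.
    kubectl create serviceaccount keycloak -n keycloak
    kubectl create role keycloak-pod-reader -n keycloak --verb=get --verb=list --resource=pods
    kubectl create rolebinding keycloak-pod-reader -n keycloak \
      --role=keycloak-pod-reader \
      --serviceaccount=keycloak:keycloak

The Keycloak pods then simply run under that service account.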

We finally managed to merge one of my fixes for JDBC_PING: https://github.com/keycloak/keycloak-containers/pull/255

It should get better.

Hey, thanks for your reply!

Unfortunately, Kubernetes is not an option since the company uses Docker Swarm :confused:

Hey,

Thanks for your suggestion, I’ll give it a try.

Regarding 15c5b97:
Basically, you removed the PING and MPING protocols from the configuration. Is that OK? Since I’m using 9.0.3, I’ll create a custom startup script, something like the sketch below.
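Something along these lines is what I have in mind (an untested sketch; the stack and protocol names may differ in the standalone-ha.xml that ships with 9.0.3):

    # Untested sketch: drop PING/MPING so the JDBC_PING added via
    # JGROUPS_DISCOVERY_PROTOCOL stays the only discovery protocol in the stack,
    # mirroring what 15c5b97 does. Run it before the server starts.
    $JBOSS_HOME/bin/jboss-cli.sh <<'EOF'
    embed-server --server-config=standalone-ha.xml --std-out=echo
    batch
    /subsystem=jgroups/stack=udp/protocol=PING:remove()
    /subsystem=jgroups/stack=tcp/protocol=MPING:remove()
    run-batch
    stop-embedded-server
    EOF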

Yeah, sure. You need to have (at least) one discovery protocol in your stack. MPING/PING/JDBC_PING are all different implementations of the discovery protocol.


Hi @slaskawi,

After some tests, the errors about deleting the ping data disappeared, but downtime is still a concern: clients still get 500 errors.

Is there a way to avoid these kinds of errors?

I’m thinking about lowering the discovery timeout, or maybe configuring the cache to be async?

Any help would be great :confused:

It seems to be working with Keycloak 11.0.2 + the JDBC_PING patch.

I’ll keep you posted after a while

Thanks!