Keycloak Randomly Crashing in Docker Swarm with SocketTimeoutException and Blocked Threads in Infinispan Operations

Hello Keycloak Community,

I am experiencing an issue with my Keycloak setup deployed in a Docker Swarm environment. The Keycloak instances crash at random, and the logs point to problems with remote Infinispan (Hot Rod) operations: SocketTimeoutException errors and blocked Vert.x event-loop threads.

Environment Details:

  • Keycloak Version: [Provide your Keycloak version]
  • Deployment: Docker Swarm
  • Database: [Specify your database, e.g., PostgreSQL, MySQL]
  • Operating System: [Specify the OS, e.g., Ubuntu 20.04]

Issue Description:

Keycloak crashes randomly with the following stack traces:

Stack Trace:

2024-06-19 12:00:51,712 WARN  [org.infinispan.HOTROD] (Thread-0) ISPN004098: Closing connection [id: 0xc10ae209, L:/10.0.3.139:39214 - R:10.0.3.110/10.0.3.110:11222] due to transport error: java.net.SocketTimeoutException: ReplaceIfUnmodifiedOperation{offlineSessions, key=[B0x033E2466653061613936352D65376465..[39], value=[B0x03040B000000446F72672E6B6579636C..[1141], flags=0, connection=10.0.3.110/10.0.3.110:11222} timed out after 60000 ms
    at org.infinispan.client.hotrod.impl.operations.HotRodOperation.run(HotRodOperation.java:182)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
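One detail worth noting: the `timed out after 60000 ms` in the message above matches the Hot Rod client's default socket timeout. If the remote cache is genuinely slow under load, raising that timeout (and capping retries) can turn hard failures into recoverable slowdowns while the root cause is investigated. A sketch, assuming Keycloak picks up a standard `hotrod-client.properties` for its remote-cache client; the values are illustrative, not recommendations:

```properties
# hotrod-client.properties — Hot Rod client tuning (illustrative values)
# socket_timeout: how long the client waits for a response
# (default 60000 ms, matching the timeout in the log above)
infinispan.client.hotrod.socket_timeout = 120000
# connect_timeout: how long the client waits to establish a connection
infinispan.client.hotrod.connect_timeout = 20000
# max_retries: retries against other servers before the operation fails
infinispan.client.hotrod.max_retries = 3
```

Note that this only treats the symptom; if operations routinely take anywhere near 60 s, the real problem is on the server or network side.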

2024-06-19 12:00:55,227 WARN  [io.vertx.core.impl.BlockedThreadChecker] (vertx-blocked-thread-checker) Thread Thread[vert.x-eventloop-thread-5,5,main] has been blocked for 3548 ms, time limit is 2000 ms: io.vertx.core.VertxException: Thread blocked
    at io.vertx.core.net.impl.ConnectionBase.lambda$handleException$4(ConnectionBase.java:357)
    at io.vertx.core.net.impl.ConnectionBase$$Lambda$1733/0x0000000841090840.handle(Unknown Source)
    at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:50)
    at io.vertx.core.impl.ContextImpl.emit(ContextImpl.java:274)
    at io.vertx.core.impl.EventLoopContext.emit(EventLoopContext.java:22)
    at io.vertx.core.net.impl.ConnectionBase.handleException(ConnectionBase.java:354)
    at io.vertx.core.http.impl.Http1xServerConnection.handleException(Http1xServerConnection.java:466)
    at io.vertx.core.net.impl.VertxHandler.exceptionCaught(VertxHandler.java:136)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:273)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.exceptionCaught(DefaultChannelPipeline.java:1377)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:281)
    at io.netty.channel.DefaultChannelPipeline.fireExceptionCaught(DefaultChannelPipeline.java:907)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.handleReadException(AbstractNioByteChannel.java:125)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:177)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)
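The blocked-thread warning looks like a symptom of the same stall: the event loop is stuck waiting on the Infinispan call, so the 2000 ms Vert.x limit trips. Since Keycloak runs on Quarkus, the checker's threshold can be relaxed to reduce log noise while debugging — a sketch, assuming your Keycloak distribution honors a `conf/quarkus.properties` file for raw Quarkus options:

```properties
# conf/quarkus.properties — relax the Vert.x blocked-thread checker
# (default max-event-loop-execute-time is 2s, matching "time limit is 2000 ms")
# This only quiets the warning; the underlying blocking call still needs fixing.
quarkus.vertx.max-event-loop-execute-time = 5s
```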

Attempted Fixes:

  • [List any configuration changes already tried, e.g., increased timeouts or adjusted resource limits]

Despite these changes, the issue persists.

Additional Information:

  • The issue appears to occur randomly under load.
  • Network connectivity between Keycloak and Infinispan nodes has been verified as stable.
  • Resource allocation (CPU and memory) for both Keycloak and Infinispan nodes appears sufficient according to current monitoring.
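Given that plain connectivity checks pass, one Swarm-specific cause worth ruling out is the IPVS idle-connection timeout on overlay networks: services published in the default VIP mode sit behind IPVS, which is commonly reported to silently drop TCP connections that stay idle for roughly 900 seconds, and a long-lived Hot Rod connection that goes quiet can then fail exactly like the trace above. A hedged stack-file sketch of the usual workaround, bypassing the VIP with DNS round-robin (the service name `infinispan` is a placeholder for your actual service):

```yaml
# docker-stack.yml (fragment) — avoid the IPVS VIP for the Infinispan service
# so long-lived Hot Rod connections are not subject to the IPVS idle timeout.
# Alternative: keep connections alive below ~900 s via TCP keepalive sysctls.
services:
  infinispan:
    deploy:
      endpoint_mode: dnsrr   # DNS round-robin instead of the default VIP
```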

Request for Assistance:

I would appreciate any guidance on resolving these issues. Specifically, I’m looking for recommendations on further configuration adjustments or insights into potential underlying causes.

Thank you in advance for your assistance!