Persisted Cache Sync Timeouts to Remote Infinispan Cluster on Rebuild From Keycloak Embedded Cache

I’m working on switching keycloak (v.3.4.0 final) from using embedded infinispan to a dedicated remote infinispan cluster (v8.2.8.final). I’ve gone through the upgrade process to use infinispan cluster as remote-store in lower environments without issues. In my production setting I am running into a timeout exception on InfinispanCacheInitializer

Where error is happening on Keycloak: https://github.com/keycloak/keycloak/blob/3.4.2.Final/model/infinispan/src/main/java/org/keycloak/models/sessions/infinispan/initializer/InfinispanCacheInitializer.java#L117

ERROR [org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer] (ServerService Thread Pool -- 54) ExecutionException when computed future. Errors: 13: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutExc
eption                                                                                                                                                                                                                                                                      
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:850)                                                                                                                                                          
        at org.keycloak.models.sessions.infinispan.initializer.InfinispanCacheInitializer.startLoading(InfinispanCacheInitializer.java:102)                                                                                                                                 
        at org.keycloak.models.sessions.infinispan.initializer.DBLockBasedCacheInitializer.startLoading(DBLockBasedCacheInitializer.java:75)                                                                                                                                
        at org.keycloak.models.sessions.infinispan.initializer.CacheInitializer.loadSessions(CacheInitializer.java:41)                                                                                                                                                      
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$2.run(InfinispanUserSessionProviderFactory.java:150)                                                                                                                                
        at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)                                                                                                                                                                    
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory.loadPersistentSessions(InfinispanUserSessionProviderFactory.java:137)                                                                                                               
        at org.keycloak.models.sessions.infinispan.InfinispanUserSessionProviderFactory$1.onEvent(InfinispanUserSessionProviderFactory.java:108)                                                                                                                            
        at org.keycloak.services.DefaultKeycloakSessionFactory.publish(DefaultKeycloakSessionFactory.java:68)                                                                                                                                                               
        at org.keycloak.services.resources.KeycloakApplication$2.run(KeycloakApplication.java:165)                                                                                                                                                                          
        at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:227)                                                                                                                                                                    
        at org.keycloak.services.resources.KeycloakApplication.<init>(KeycloakApplication.java:158)                                                                                                                                                                         
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)                                                                                                                                                                                            
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)                                                                                                                                                                     
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)                                                                                                                                                             
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)                                                                                                                                                                                                  
        at org.jboss.resteasy.core.ConstructorInjectorImpl.construct(ConstructorInjectorImpl.java:150)                                                                                                                                                                      
        at org.jboss.resteasy.spi.ResteasyProviderFactory.createProviderInstance(ResteasyProviderFactory.java:2298)                                                                                                                                                         
        at org.jboss.resteasy.spi.ResteasyDeployment.createApplication(ResteasyDeployment.java:340)                                                                                                                                                                         
        at org.jboss.resteasy.spi.ResteasyDeployment.start(ResteasyDeployment.java:253)                                                                                                                                                                                     
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.init(ServletContainerDispatcher.java:120)                                                                                                                                                   
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.init(HttpServletDispatcher.java:36)                                                                                                                                                              
        at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:117)                                                                                                                                                           
        at org.wildfly.extension.undertow.security.RunAsLifecycleInterceptor.init(RunAsLifecycleInterceptor.java:78)                                                                                                                                                        
        at io.undertow.servlet.core.LifecyleInterceptorInvocation.proceed(LifecyleInterceptorInvocation.java:103)                                                                                                                                                           
        at io.undertow.servlet.core.ManagedServlet$DefaultInstanceStrategy.start(ManagedServlet.java:250)                                                                                                                                                                   
        at io.undertow.servlet.core.ManagedServlet.createServlet(ManagedServlet.java:133)                                                                                                                                                                                   
        at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:565)                                                                                                                                                                            
        at io.undertow.servlet.core.DeploymentManagerImpl$2.call(DeploymentManagerImpl.java:536)                                                                                                                                                                            
        at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:42)                                                                                                                                           
        at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)                                                                                                                                                             
        at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)                                                                                                                              
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)                                                                                                       
        at io.undertow.servlet.core.DeploymentManagerImpl.start(DeploymentManagerImpl.java:578)                                                                                                                                                                             
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentService.startContext(UndertowDeploymentService.java:100)                                                                                                                                             
        at org.wildfly.extension.undertow.deployment.UndertowDeploymentService$1.run(UndertowDeploymentService.java:81)                                                                                                                                                     
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)                                                                                                                                                                                          
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)                                                                                                                                                                                                         
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)                                                                                                                                                                                  
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                                                                                                                  
        at java.lang.Thread.run(Thread.java:748)                                                                                                                                                                                                                            
        at org.jboss.threads.JBossThread.run(JBossThread.java:320)                                                                                                                                                                                                          
Caused by: java.util.concurrent.TimeoutException                                                                                                                                                                                                                            
        at java.util.concurrent.FutureTask.get(FutureTask.java:205)                                                                                                                                                                                                         
        at org.infinispan.commons.util.concurrent.NotifyingFutureImpl.get(NotifyingFutureImpl.java:88)                                                                                                                                                                      
        at org.infinispan.distexec.DefaultExecutorService$LocalDistributedTaskPart.getResult(DefaultExecutorService.java:1083)                                                                                                                                              
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.innerGet(DefaultExecutorService.java:868)                                                                                                                                                     
        at org.infinispan.distexec.DefaultExecutorService$DistributedTaskPart.get(DefaultExecutorService.java:848)                                                                                                                                                          
        ... 44 more

Overview

  • Keycloak version: 3.4.0.final (aware this is an older version - and in use with custom implementation - not easy to upgrade)
    • startup script: ExecStart={{ keycloak_jboss_home }}/bin/standalone.sh -b using standalone.xml
  • Infinispan version: 8.2.8.final
  • Switching from embedded local cache to remote store configuration for the following caches:
    • users (distributed)
    • sessions (replicated)
    • authenticationSessions (replicated)
    • offlineSessions (replicated)
    • loginFailures (replicated)
    • authorization (replicated)
  • offline_user_session count: ~3 million

In an attempt to test out this sized cache sync from database, I’ve updated the configuration for timeouts in standalone.xml and standalone.conf files

keycloak/keycloak-3.4.0.Final/standalone/configuration/standalone.xml Updated coordinator timeout to be 3 hours and commented out query-timeout

<subsystem xmlns="urn:jboss:domain:transactions:4.0">                                                                                                                                                                                             
    <core-environment>                                                                                                                                                                                                                            
        <process-id>                                                                                                                                                                                                                              
            <uuid/>                                                                                                                                                                                                                               
         </process-id>                                                                                                                                                                                                                             
    </core-environment>                                                                                                                                                                                                                           
    <recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>                                                                                                                                  
    <object-store path="tx-object-store" relative-to="jboss.server.data.dir"/>                                                                                                                                                                    
    <coordinator-environment default-timeout="10800"/>                                                                                                                                                                                            
 </subsystem> 

....
....
..

<!-- <timeout>                                                                                                                                                                                                                        
    <query-timeout>15</query-timeout>                                                                                                                                                                                                 
</timeout>-->   

/keycloak/keycloak-3.4.0.Final/bin/standalone.conf add blocking timeout to JAVA_OPTS

JAVA_OPTS="$JAVA_OPTS -Djboss.modules.system.pkgs=$JBOSS_MODULES_SYSTEM_PKGS -Djava.awt.headless=true -Djboss.as.management.blocking.timeout=10800"  

I’d like to note that after the first few attempts at getting this to work, when reverting back to using the embedded cache on the keycloak nodes, the data synced fine without any timeout errors in around 1.5 hours.

After startup of keycloak it takes about 60 minutes before it starts to sync offline user sessions. Taking a look at the queries being run by keycloak, I can see that the timeout errors occur about 5-10 minutes after it starts syncing offline_user_session records to offlineSessions cache

The queries being run before it times out are:

  1. delete from OFFLINE_CLIENT_SESSION where not (exists (select persistent1_.USER_SESSION_ID from OFFLINE_USER_SESSION persistent1_ where persistent1_.USER_SESSION_ID=OFFLINE_CLIENT_SESSION.USER_SESSION_ID))
  2. update OFFLINE_USER_SESSION set LAST_SESSION_REFRESH=$1
  3. DELETE FROM JGROUPSPING WHERE own_addr=$1 AND cluster_name=$2
  4. select count(persistent0_.OFFLINE_FLAG) as col_0_0_ from OFFLINE_USER_SESSION persistent0_ where persistent0_.OFFLINE_FLAG=$1
  5. select userrolema0_.ROLE_ID as col_0_0_ from USER_ROLE_MAPPING userrolema0_ where userrolema0_.USER_ID=$1
  6. select userentity0_.ID as ID1_76_, userentity0_.CREATED_TIMESTAMP as CREATED_2_76_, userentity0_.EMAIL as EMAIL3_76_, userentity0_.EMAIL_CONSTRAINT as EMAIL_CO4_76_, userentity0_.EMAIL_VERIFIED as EMAIL_VE5_76_, userentity0_.ENABLED as ENABLED6_76_, userentity0_.FEDERATION_LINK as FEDERATI7_76_, userentity0_.FIRST_NAME as FIRST_NA8_76_, userentity0_.LAST_NAME as LAST_NAM9_76_, userentity0_.NOT_BEFORE as NOT_BEF10_76_, userentity0_.REALM_ID as REALM_I11_76_, userentity0_.SERVICE_ACCOUNT_CLIENT_LI
  7. select attributes0_.USER_ID as USER_ID4_72_0_, attributes0_.ID as ID1_72_0_, attributes0_.ID as ID1_72_1_, attributes0_.NAME as NAME2_72_1_, attributes0_.USER_ID as USER_ID4_72_1_, attributes0_.VALUE as VALUE3_72_1_ from USER_ATTRIBUTE attributes0_ where attributes0_.USER_ID=$1

I setup Infinispan WebConsole UI so I can see the progress of the cache sync. Every time it gets 15k entries in (out of the ~3 million)

I am not positive the issue here since the syncing of offline sessions from database works fine for the embedded version, but for the remote infinispan startup it seems to have an issue either with the batching of the queries or there’s another configuration I am missing either on keycloak of infinispan side.