Experiencing Infinispan Timeouts

eloyot · August 24, 2020, 10:46pm

We have been running Keycloak on AWS as an ECS application for about 3 months now. Under load, we occasionally see Infinispan timeouts (shown below), which cause a spate of login errors until the system recovers. Our Infinispan stack is also shown below. Does anyone have thoughts on how we might debug this? We have reviewed other posts in this forum, but nothing has helped so far. Thanks!

Error:

2020-08-09 01:44:13,170 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (default task-1957) env=dev node_ip=10.116.53.26 ecs_cluster_name=keycloak-service-cluster ecs_service_name=keycloak-service Uncaught server error: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 30113426 from ip-10-116-52-251
	at org.infinispan@9.4.16.Final//org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:259)
	at org.infinispan@9.4.16.Final//org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1918)
	at org.infinispan@9.4.16.Final//org.infinispan.cache.impl.CacheImpl.putIfAbsent(CacheImpl.java:1474)
	at org.infinispan@9.4.16.Final//org.infinispan.cache.impl.DecoratedCache.putIfAbsent(DecoratedCache.java:695)

Infinispan stack configuration:

                <stack name="tcp">
                    <transport type="TCP" socket-binding="jgroups-tcp"/>
                    <protocol type="JDBC_PING">
                        <property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS
                        </property>
                        <property name="remove_old_coords_on_view_change">true</property>
                        <property name="remove_all_data_on_view_change">true</property>
                        <property name="initialize_sql">
                            CREATE TABLE IF NOT EXISTS JGROUPSPING (
                            own_addr varchar(200) NOT NULL,
                            bind_addr varchar(200) NOT NULL,
                            created timestamp NOT NULL,
                            cluster_name varchar(200) NOT NULL,
                            ping_data BYTEA,
                            constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name)
                            )
                        </property>
                        <property name="insert_single_sql">INSERT INTO JGROUPSPING (own_addr, bind_addr, created, cluster_name, ping_data) values (?, '${jboss.bind.address.private:UNKNOWN}', NOW(), ?, ?)</property>
                        <property name="delete_single_sql">DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=?</property>
                        <property name="select_all_pingdata_sql">SELECT ping_data, own_addr, cluster_name FROM JGROUPSPING WHERE cluster_name=?</property>
                    </protocol>
                    <protocol type="MERGE3"/>
                    <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
                    <protocol type="FD_ALL"/>
                    <protocol type="VERIFY_SUSPECT"/>
                    <protocol type="pbcast.NAKACK2"/>
                    <protocol type="UNICAST3"/>
                    <protocol type="pbcast.STABLE"/>
                    <protocol type="pbcast.GMS"/>
                    <protocol type="MFC"/>
                    <protocol type="FRAG3"/>
                </stack>
            </stacks>

sre.ian · July 28, 2021, 4:19pm

Are you set up to use a cross-datacenter remote Infinispan deployment? That is, do you have Keycloak nodes and separate Infinispan nodes?

The configuration you posted is the JGroups TCP stack with JDBC_PING for cluster discovery. It lets new Keycloak nodes discover the existing nodes in the cluster so they can sync cache state from them.

There is also configuration for state-transfer timeouts on the caches:

https://github.com/wildfly/wildfly/blob/master/clustering/infinispan/extension/src/main/resources/schema/jboss-as-infinispan_4_0.xsd#L739

    
      
          </xs:complexType>
          
          
<xs:complexType name="partition-handling">
              <xs:attribute name="enabled" type="xs:boolean" default="false">
                  <xs:annotation>
                      <xs:documentation>If enabled, the cache will enter degraded mode upon detecting a network partition that threatens the integrity of the cache.</xs:documentation>
                  </xs:annotation>
              </xs:attribute>
          </xs:complexType>
          
          
<xs:complexType name="state-transfer">
              <xs:attribute name="timeout" type="xs:long" default="240000">
                  <xs:annotation>
                      <xs:documentation>The maximum amount of time (ms) to wait for state from neighboring caches, before throwing an exception and aborting startup.</xs:documentation>
                  </xs:annotation>
              </xs:attribute>
              <xs:attribute name="chunk-size" type="xs:integer" default="512">
                  <xs:annotation>
                      <xs:documentation>The number of cache entries to batch in each transfer.</xs:documentation>
                  </xs:annotation>
              </xs:attribute>

<replicated-cache name="offlineSessions">
    <!-- default settings -->
    <state-transfer timeout="240000" chunk-size="512"/>
</replicated-cache>
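
Note that the state-transfer timeout applies when a node joins and pulls state from its neighbors. The trace in the original post fails inside a putIfAbsent call that timed out waiting for a response from another node, which is a synchronous remote call rather than a state transfer. In the WildFly Infinispan subsystem that wait is controlled by the cache's remote-timeout attribute (in ms). As a rough sketch only, assuming the affected cache is the sessions cache (the trace does not say which one) and picking an arbitrary value, raising it might look like:

```xml
<!-- Sketch under assumptions: the cache name, owners count and the 30000 ms value
     are illustrative and are not taken from the poster's configuration. -->
<distributed-cache name="sessions" owners="1" remote-timeout="30000"/>
```

A higher timeout only buys headroom. If the remote node is slow to answer under load, the reason for that still needs to be tracked down separately.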

williamye · December 21, 2021, 7:24am

Hi Eloyot,

I'm hitting the same issue. How did you fix it?
