"Expired_Code" error in Keycloak HA 26.0.1 with distributed cache enabled

Hi,
I’m running 3 Keycloak nodes in Azure Container Apps with an Infinispan distributed cache in front.

Yesterday I caught the following WARN in my container’s logs, and at that moment I was not able to log in to the web admin console. The same warning was logged every time I tried to log in.

2025-02-18T13:10:59.3949172Z stdout F 2025-02-18 13:10:59,394 WARN  [org.keycloak.events] (executor-thread-10) type="LOGIN_ERROR", realmId="a1d39a9f-95b4-4fd6-9538-171ed94bead6", realmName="master", clientId="security-admin-console", userId="null", ipAddress="xx.xx.xx.xx", error="expired_code", restart_after_timeout="true"

Restarting my Azure Container App solved it. I suspect the distributed cache could be the problem. Is that possible?

My Infinispan cache configuration was the following (default):

<cache-container name="keycloak">
        <transport lock-timeout="60000" stack="jdbc-ping"/>
        
        <local-cache name="realms" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="users" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="authorization" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="keys" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <expiration max-idle="3600000"/>
            <memory max-count="1000"/>
        </local-cache>
                
        <distributed-cache name="sessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="authenticationSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="offlineSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="clientSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="offlineClientSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="loginFailures" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>                      
        <distributed-cache name="actionTokens" owners="2">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <expiration max-idle="-1" lifespan="-1" interval="300000"/>
            <memory max-count="-1"/>
        </distributed-cache>
        
        <replicated-cache name="work">
            <expiration lifespan="-1"/>
        </replicated-cache>
        
    </cache-container>

Thinking about it, this default configuration is not good and I’ll update it, but apart from that, can you help me understand the problem I had?

I also got this error a few days later:

2025-02-24T07:31:47.2906574Z stdout F 2025-02-24 07:31:47,290 WARN  [org.keycloak.events] (executor-thread-30) type="CODE_TO_TOKEN_ERROR", realmId="5f626112-b788-4c16-8e15-c3be1a3910b6", realmName="TEST", clientId="TestWeb", userId="null", sessionId="cee35b2b-cc88-44cd-81cc-792708b48ec2", ipAddress="xx.xx.xx.xx", error="invalid_code", grant_type="authorization_code", code_id="cee35b2b-cc88-44cd-81cc-792708b48ec2", client_auth_method="client-secret"

I solved it just by restarting the 3 container instances, because I suppose they reset the distributed cache at startup.

Thanks in advance

Maybe check whether Keycloak is using sticky sessions. If it isn’t, one pod may generate the code while another consumes it, and that can cause something like this. I think we had a similar issue, but as I said, I’m not sure. It’s worth checking.

Hi,
Do you mean session affinity on Azure?

I’m not using it.

If you mean a Keycloak feature called sticky sessions, how can I check it? Is it enabled by default?

Thanks

Check this:

Hi djordje,
Ah ok, Keycloak’s sticky sessions are the same thing as Azure session affinity, and I read that sticky sessions are enabled by default.

Is that a problem with multiple KC nodes in high availability with a distributed cache?

I’m not an Infinispan expert, and I always thought that if I have 2 or more cache nodes, they always sync data across all nodes. Am I wrong?
If the data is synced on all nodes, sticky sessions could limit scalability, couldn’t they?

Thanks

Hi, Enrico

Now that you put it that way, I think this may not be the case. Back when I faced a similar issue we weren’t using a remote cache but an embedded one, so that was my situation.
I just wanted to point it out so you can check in detail, as it might be useful, but my understanding is also that the cache will sync between nodes…

regards!

An expired code is most often a sign that the communication (data sync/balancing) between your cluster nodes is not working properly.
How did you set up / configure clustering?
Are the ports (default: 7800 & 57800) open between the nodes?
Are there proper messages in the logs that the nodes see each other and data is being balanced?
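
For the port question, something like the following could be run from inside each container against the other nodes. It is only a rough sketch: the function names are made up for illustration, and it only tests whether a TCP connection can be opened on the default ports 7800 and 57800. An open port does not prove that the nodes actually formed a cluster, so the logs still need to be checked.

```python
import socket

# Default JGroups ports mentioned above; adjust if your stack uses others.
JGROUPS_PORTS = (7800, 57800)


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_peers(peers, ports=JGROUPS_PORTS, timeout: float = 2.0):
    """Map every (host, port) pair to True/False for reachability."""
    return {
        (host, port): port_is_open(host, port, timeout)
        for host in peers
        for port in ports
    }
```

Calling check_peers with the addresses of the other two nodes (for example check_peers(["10.0.0.11", "10.0.0.12"]), where both IPs are placeholders) returns a dict, and any pair that maps to False points to a port that is blocked between the containers.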

Hi dasniko,

The ports are configured like this on Azure:

I checked the logs of all 3 containers and everything seems to work correctly (sync is OK, etc.).

About the cluster setup, do you mean the env vars I set? These are the environment variables related to the cache.

KC_CACHE=ispn
KC_CACHE_METRICS_HISTOGRAMS_ENABLED=true
KC_CACHE_CONFIG_FILE=cache-ispn.xml

The “cache-ispn.xml” content is the following (the latest version):

<jgroups>
  <stack name="jdbc-ping" extends="tcp">
    <JDBC_PING connection_driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
               connection_username="xxxxxxxxxxx"
               connection_password="xxxxxxxxxxx"
               connection_url="jdbc:sqlserver://xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
               initialize_sql="IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='JGROUPSPING' AND xtype='U') BEGIN CREATE TABLE JGROUPSPING (own_addr VARCHAR(200) NOT NULL, cluster_name VARCHAR(200) NOT NULL, ping_data VARBINARY(MAX), CONSTRAINT PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));END;"
               info_writer_sleep_time="500"
               remove_all_data_on_view_change="true"
               stack.combine="REPLACE"
               stack.position="MPING" />
  </stack>
</jgroups>
<cache-container name="keycloak">
        <transport lock-timeout="60000" stack="jdbc-ping"/>
		
        <local-cache name="realms" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="users" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="authorization" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="keys" simple-cache="true">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>
            <expiration max-idle="3600000"/>
            <memory max-count="1000"/>
        </local-cache>
				
        <distributed-cache name="sessions" owners="3">
            <expiration lifespan="43200000" max-idle="50400000"/>
        </distributed-cache>
        <distributed-cache name="authenticationSessions" owners="3">
            <expiration lifespan="120000"/>
        </distributed-cache>
        <distributed-cache name="offlineSessions" owners="3">
            <expiration lifespan="86400000"/>
        </distributed-cache>
        <distributed-cache name="clientSessions" owners="3">
            <expiration lifespan="3600000"/>
        </distributed-cache>
        <distributed-cache name="offlineClientSessions" owners="3">
            <expiration lifespan="86400000"/>
        </distributed-cache>
        <distributed-cache name="loginFailures" owners="3">
            <expiration lifespan="86400000" max-idle="3600000" interval="60000"/> 
        </distributed-cache>                      
        <distributed-cache name="actionTokens" owners="3">
            <encoding>
                <key media-type="application/x-java-object"/>
                <value media-type="application/x-java-object"/>
            </encoding>            
            <expiration lifespan="1800000" max-idle="1800000" interval="120000"/>
            <memory max-count="10000" when-full="REMOVE"/>
        </distributed-cache>

        <replicated-cache name="work">
            <expiration lifespan="86400000"/>
            <memory max-count="10000" when-full="REMOVE"/>
        </replicated-cache>

    </cache-container>

Thanks in advance,
Enrico

Can anyone help me?
@dasniko Do you know where I can find a working sample of Keycloak HA with Infinispan for Azure (ideally Azure Container Apps)?

Thanks in advance