Keycloak operator-created cluster not working with multiple Keycloak pods

We have a Keycloak cluster managed by the keycloak-operator that has ended up in a state where setting the instance count above 1 breaks the login flow. Logins get stuck in a loop of failures. The first attempt produces this error:

21:07:36,645 WARN [org.keycloak.events] (default task-1) type=CODE_TO_TOKEN_ERROR, realmId=master, clientId=security-admin-console, userId=null, ipAddress=10.240.0.5, error=invalid_code, grant_type=authorization_code, code_id=23594b9a-f98f-4ad6-b8b2-437838708843, client_auth_method=client-secret

On a second login attempt, the flow loops indefinitely and logs this error over and over:

21:08:17,217 WARN [org.keycloak.events] (default task-11) type=LOGIN_ERROR, realmId=master, clientId=null, userId=null, ipAddress=10.240.0.5, error=expired_code, restart_after_timeout=true, authSessionParentId=f8d19861-6fb7-4425-88df-c183caaa2b11, authSessionTabId=Z-9DszaYy4A

With a single instance, everything works fine.

With multiple instances, going through the ingress does not work, but going through a port-forward to the service endpoint works fine.

I suspect the port-forward works because it sends every request to a single pod, whereas the ingress load-balances requests across the pods.

Setting session affinity with the nginx ingress fixes the issue, but I suspect that’s just a band-aid on some broken functionality. I suspect the AuthenticationSessions replication is not working correctly, but I don’t see any indication that it’s failing, and I’m not sure where to look or how to confirm it’s the root issue.

Here are the nginx affinity settings I used on the ingress:

nginx.ingress.kubernetes.io/affinity: cookie
nginx.ingress.kubernetes.io/affinity-mode: persistent
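
For context, this is roughly where those annotations sit on the Ingress object. It is only a sketch: the resource name, host, ingress class, backend service name, and port are placeholders I made up for illustration, not values from our cluster, and only the two affinity annotations come from our actual setup.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak                      # placeholder name
  annotations:
    # The two annotations quoted above: pin each client to one pod via a cookie
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
spec:
  ingressClassName: nginx             # assumes the ingress-nginx controller
  rules:
    - host: keycloak.example.com      # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak        # placeholder service name
                port:
                  number: 8443        # placeholder port
```

With this in place every request from a given browser lands on the same pod, which is why it hides the problem rather than explaining it.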

We have another Keycloak cluster that is configured almost identically, and it’s working fine.

I’m at a complete loss as to what’s going on here. This cluster was previously working fine, and it only broke when we attempted to update a theme we’re using with it. We’ve tried restarting all the Keycloak pods, but the problem persists.

If anyone could provide some guidance on how to figure out what is going on, it would be much appreciated, thank you!