I’ve had a pair of AWS Tasks running Keycloak 24 for a month, and today requests to them started returning 504 Gateway Timeout. The Tasks coordinate in a cluster with JDBC_PING. Before I got the 504, I noticed that the browser was redirected to /admin.
The Tasks looked healthy to me. CPU and RAM were low. There’s a custom SPI that checks DB connections and that was operating ok. The database supporting KC was up too.
I was wondering if anyone knew where I might start looking. Source code line numbers would be much appreciated.
Unfortunately, there are no errors except the 504 that the load balancer is getting. I’m figuring it might be the distributed cache taking too long to look up a realm. The DB supporting Keycloak looked good.
Thanks for sticking with me and my vague question.
A restart fixed everything.
There weren’t any errors in the logs. Also, (this is on AWS) there weren’t any bad health checks involving the ALB, cluster, services, or tasks besides the 504s.
I’ve been looking at org/keycloak/services/resources/admin/AdminRoot.java, since the browser did redirect to /admin when I tried to hit the console. I had checked the DB supporting Keycloak, but maybe the distributed cache or another component never returned from this call:
    RealmModel master = new RealmManager(session)
            .getKeycloakAdminstrationRealm();
I have expanded logging to include more org.keycloak messages. I’ll report back if I find anything.
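For anyone following along, here is a rough sketch of the kind of logging change I mean. It assumes the Quarkus-based Keycloak distribution configured through environment variables on the Task, where KC_LOG_LEVEL takes a root level followed by comma-separated category:level pairs. The exact categories and levels are my assumption, not a recommendation, so adjust them to taste.

```shell
# Assumed setup: Keycloak 24 (Quarkus distribution) reading its config from
# environment variables in the Task definition.
# Root logger stays at INFO; everything under org.keycloak is raised to debug.
export KC_LOG_LEVEL="INFO,org.keycloak:debug"

# Optional assumption: if the distributed cache is the suspect, the
# org.infinispan category could be raised the same way, for example:
# export KC_LOG_LEVEL="INFO,org.keycloak:debug,org.infinispan:debug"
```

Debug output across all of org.keycloak is noisy, so it is probably worth narrowing to a few sub-packages once there is a lead.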