I’m running Keycloak clustered with a PostgreSQL JDBC driver (both DB and Keycloak VM are running on Azure).
After a couple of days of uptime, Keycloak eventually stops working. According to the log file, the originating problem each time is an I/O error during a scheduled task (“ClearExpiredEvents”):
2020-03-27 23:27:44,272 ERROR [org.keycloak.services] (Timer-2) KC-SERVICES0089: Failed to run scheduled task ClearExpiredEvents: javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: could not extract ResultSet
at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1515)
at org.hibernate.query.Query.getResultList(Query.java:132)
at org.keycloak.models.jpa.JpaRealmProvider.getRealms(JpaRealmProvider.java:117)
at org.keycloak.models.jpa.JpaRealmProvider.getRealms(JpaRealmProvider.java:113)
at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealms(RealmCacheSession.java:466)
at org.keycloak.services.scheduled.ClearExpiredEvents.run(ClearExpiredEvents.java:34)
[ ... ]
Caused by the underlying exception:
Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:358)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:159)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:109)
at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:504)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:60)
... 28 more
Caused by: java.net.SocketException: Connection reset by peer: socket write error
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.net.SocketOutputStream.write(Unknown Source)
[ ... ]
One error like this wouldn’t be a problem by itself. After all, the connection to the database might break once in a while. Unfortunately, Keycloak seems to stop working after this. Subsequent error messages look like this (here from another scheduled task, “ClearExpiredUserSessions”):
> 2020-03-27 23:27:45,336 WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-2) SQL Error: 0, SQLState: 08003
> 2020-03-27 23:27:45,336 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-2) This connection has been closed.
> 2020-03-27 23:27:45,336 ERROR [org.keycloak.services] (Timer-2) KC-SERVICES0089: Failed to run scheduled task ClearExpiredUserSessions: javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: could not prepare statement
> at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
> at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1515)
> at org.hibernate.query.Query.getResultList(Query.java:132)
> at org.keycloak.models.jpa.JpaRealmProvider.getRealms(JpaRealmProvider.java:117)
> at org.keycloak.models.jpa.JpaRealmProvider.getRealms(JpaRealmProvider.java:113)
> at org.keycloak.models.cache.infinispan.RealmCacheSession.getRealms(RealmCacheSession.java:466)
> at org.keycloak.services.scheduled.ClearExpiredUserSessions.run(ClearExpiredUserSessions.java:36)
> [ ... ]
Shouldn’t some part of Keycloak re-open the connection? As it stands, every database access after the first failure seems to fail.
As I mentioned, both the Keycloak VM and the database run on Azure. Our current workaround is to run Keycloak against an MSSQL database instead of PostgreSQL. With MSSQL, the problem has not happened again so far.
So from my point of view, the issue in Keycloak is not resolved at all. When a connection is lost, Keycloak does not appear to recover correctly. This should still be addressed in order to build a reliable system.
Update: As suspected, the above-mentioned workaround did not last forever. Keycloak ran for about 2 weeks without issues, until the SQL Server connection was disrupted once again.
After that, it again stopped working until it was manually restarted.
Hibernate seems to recognize that the connection is closed:
WARN [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-2) SQL Error: 0, SQLState: null
ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (Timer-2) The connection is closed.
But instead of reopening the connection, all future SQL statements fail.
Thank you for your input. I appreciate you helping me out.
I had not included the validation block that you posted. I have added that now and will have to see how stability develops.
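For anyone finding this thread later: the validation block itself is not quoted in this part of the discussion. As a rough sketch, assuming a WildFly-based Keycloak whose datasource is defined in standalone-ha.xml and a PostgreSQL backend, a block of this kind sits inside the `<datasource>` element and looks something like the following. The checker and sorter classes are the ones that ship with the WildFly JDBC adapter; the specific settings shown are an assumption and not necessarily what was posted.

```xml
<!-- Sketch only: goes inside the existing <datasource> element for Keycloak. -->
<!-- The values below are assumptions for illustration, not the original post. -->
<validation>
    <!-- Checks that a pooled connection is still alive -->
    <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLValidConnectionChecker"/>
    <!-- Validate each connection when it is taken from the pool -->
    <validate-on-match>true</validate-on-match>
    <!-- Background validation is the alternative to validate-on-match; use one or the other -->
    <background-validation>false</background-validation>
    <!-- Lets the pool recognise fatal errors and destroy the affected connection -->
    <exception-sorter class-name="org.jboss.jca.adapters.jdbc.extensions.postgres.PostgreSQLExceptionSorter"/>
</validation>
```

With validate-on-match enabled, a connection that was reset by the peer should be discarded and replaced when it is next requested, instead of being handed out again and failing every later statement. For SQL Server, the corresponding checker is `MSSQLValidConnectionChecker` in the `org.jboss.jca.adapters.jdbc.extensions.mssql` package.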
The same problem happened to us following a network switch restart. Keycloak was stuck in this state until I restarted the instance. Thanks for this thread and the information. I have updated the configuration as suggested.