Stopgap remediation of KEYCLOAK-13340 (Perf issues with many offline sessions) in v16.1.1?


My company is currently on Keycloak v16.1.1 and has been hit (we think) by KEYCLOAK-13340: Performance Issues with many offline sessions. In extreme cases this causes servers to fail liveness checks and be restarted, and it seems to be the root of user logouts. This behavior seems to have been introduced with lazy offline session loading in v15.

A fix for this has been merged into main at b104dc7, but has not been merged into a release branch. As such, I assume that it’s considered unstable for now. In any case, unless there are no alternatives, it’s not feasible for us to migrate to v20 at this time.

I have a few follow-up questions:

  1. We have a relatively small number of accounts with large numbers of offline sessions. Why this is the case is still under internal investigation, but given that, is there a recommended database-level method for manually cleaning up sessions? The theory here is that even if this adversely affects a small group of users, they would be able to log in again as necessary.
  2. Mostly a mirror of the GH issue discussion comment: While my search has been non-exhaustive and centered specifically on changes to the query findClientSessionsOrderedById, I haven’t found changes associated with this issue between 16.1.1 and the proposed fix mentioned above. I’ve created a patch for 16.1.1 based on the original fix, but have yet to attempt to build it. Regardless, how horrible an idea is this?
  3. Are there any other short-term remediations that come to mind?