PersistenceException / HibernateException: Transaction rolled back in a different thread

Dear all,
In order to figure out whether Keycloak can handle the load we foresee at some point, I’ve been doing a stress test with a single local instance (v.10.0.1). For the database I use PostgreSQL 12; the JRE is 11.0.6 and the OS is Mac OS 10.13.6.

Our first tests with Keycloak were with Thomas Darimont’s Spring-Boot packaged Keycloak v.4.8.3 (because we want to deploy KC on IBM Cloud), and there we found that KC had a bit of trouble handling ~2500 clients: when, after startup, you first tried to list the Clients in the GUI, it froze for several minutes. After that, it worked fine.

It turned out that it’s not that hard to deploy the Keycloak Docker image directly on Cloud Foundry, which also made it a lot easier to upgrade to a more recent version of Keycloak. When I migrated a v.4.8.3 version to v.10.0.1, however, it would no longer start up, so I started with a much smaller set of clients, and that worked OK.

With v.10 up and running, I then added 10K users and 10K clients with JMeter, to see how it would hold up. Adding all that to Keycloak went pretty smoothly, on the order of about 2000 / minute.
And indeed, the “list clients” freezing seemed to be fully resolved in v.10, and listing that many users was also immediate.

But … then I tried adding one more client and one user, manually via the Keycloak UI. The client was no problem at all; but adding the user does not work (any more).

The configuration is as follows:

  • standalone.xml
  • defined the datasource according to the setup guide:
standalone.xml: datasource config
<datasource jndi-name="java:jboss/datasources/KeycloakDS" 
pool-name="KeycloakDS" enabled="true" use-java-context="true" 
statistics-enabled="${wildfly.datasources.statistics-enabled:${wildfly.statistics-enabled:false}}">
   <connection-url>jdbc:postgresql://localhost:5432/<DB>?currentSchema=keycloakprod-v10</connection-url>
  • added Postgres driver to modules, added the config to standalone.xml
  • changed the ExampleDS in default-bindings to KeycloakDS, and removed the ExampleDS references
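
For comparison, a complete datasource definition in standalone.xml usually looks roughly like the sketch below. Everything beyond the two elements shown above is an assumption for illustration: `DB_NAME` stands in for the elided database name, and the driver name `postgresql`, the module name `org.postgresql`, and the credentials are hypothetical values, not the ones from this setup.

```xml
<!-- Sketch only: DB_NAME, driver/module names and credentials are placeholders. -->
<datasources>
    <datasource jndi-name="java:jboss/datasources/KeycloakDS"
                pool-name="KeycloakDS" enabled="true" use-java-context="true"
                statistics-enabled="${wildfly.datasources.statistics-enabled:${wildfly.statistics-enabled:false}}">
        <connection-url>jdbc:postgresql://localhost:5432/DB_NAME?currentSchema=keycloakprod-v10</connection-url>
        <!-- must match the name attribute of the <driver> element below -->
        <driver>postgresql</driver>
        <security>
            <user-name>KEYCLOAK_DB_USER</user-name>
            <password>KEYCLOAK_DB_PASSWORD</password>
        </security>
    </datasource>
    <drivers>
        <!-- module must match the directory the PostgreSQL JDBC jar was installed under -->
        <driver name="postgresql" module="org.postgresql">
            <xa-datasource-class>org.postgresql.xa.PGXADataSource</xa-datasource-class>
        </driver>
    </drivers>
</datasources>
```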

We also added our own BCrypt hashing implementation as a .jar deployment, plus the libraries it needs (commons-codec, spring-security-crypto) as two other modules (which worked fine).

When I add another user in the Keycloak UI, the interface freezes. For about 5 minutes nothing happens at all, and then the operation fails (stack trace abbreviated for readability):

warn & error stacktrace
WARN  [com.arjuna.ats.arjuna] (Transaction Reaper) ARJUNA012117: TransactionReaper::check timeout for TX 0:ffff7f000001:-67f70a68:5ee0fb65:c0 in state  RUN
(...)
ERROR [org.keycloak.services.error.KeycloakErrorHandler] (default task-1) Uncaught server error: javax.persistence.PersistenceException: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
at org.hibernate@5.3.15.Final//org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
at org.hibernate@5.3.15.Final//org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1515)
at org.hibernate@5.3.15.Final//org.hibernate.query.Query.getResultList(Query.java:132)
at org.keycloak.keycloak-model-jpa@10.0.1//org.keycloak.models.jpa.ClientAdapter.getClientScopes(ClientAdapter.java:381)
(...)
Caused by: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
at org.hibernate@5.3.15.Final//org.hibernate.resource.transaction.backend.jta.internal.synchronization.SynchronizationCallbackCoordinatorTrackingImpl.processAnyDelayedAfterCompletion(SynchronizationCallbackCoordinatorTrackingImpl.java:90)
at org.hibernate@5.3.15.Final//org.hibernate.internal.SessionImpl.delayedAfterCompletion(SessionImpl.java:658)
(...)
[com.arjuna.ats.arjuna] (default task-1) ARJUNA012077: Abort called on already aborted atomic action 0:ffff7f000001:-67f70a68:5ee0fb65:c0

… no user was created (I also checked in the user_entity table directly).

I searched for a solution and found some hints suggesting that this might be a configuration issue. One was a recommendation to set the transaction timeout (default 300 sec.) to a longer value. I thought that unlikely to help in this case, because creating those 10000 initial users with JMeter took maybe 20 milliseconds per user, so you’d think that 300 seconds should be more than enough to create one more.

Still, I tried adding it to this section of standalone.xml, like this:

standalone.xml: subsystem xmlns="urn:jboss:domain:transactions:5.0"
<subsystem xmlns="urn:jboss:domain:transactions:5.0">
    <core-environment node-identifier="${jboss.tx.node.id:1}">
        <process-id>
            <uuid/>
        </process-id>
    </core-environment>
    <recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>
    <coordinator-environment default-timeout="600" <--- there

though the only result was that I had to wait twice as long for adding a user to fail with the above error.
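
For clarity, the element with the longer timeout might look like the minimal sketch below. The value of `default-timeout` is in seconds. The original line is cut off, so any further attributes it may have carried are not reproduced here; the self-closing form is an assumption.

```xml
<!-- Sketch only: goes inside the transactions subsystem; other attributes of the original element are unknown. -->
<!-- default-timeout is in seconds: 600 doubles the 300-second default mentioned above. -->
<coordinator-environment default-timeout="600"/>
```

This appears to be the timeout that the Transaction Reaper warning (ARJUNA012117) in the log refers to, which would fit with the failure simply arriving later after the change.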

Another suggestion was to add an idle timeout to the datasource, like this:

<timeout>
<idle-timeout-minutes>1</idle-timeout-minutes>
</timeout>

and yet another one was:

<validation>
<check-valid-connection-sql>select 1</check-valid-connection-sql>
<background-validation>true</background-validation>
<background-validation-millis>15000</background-validation-millis>
</validation>

… neither of which made a difference.
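
As a sketch of where those two fragments would sit, both are child elements of the datasource itself. The `DB_NAME` placeholder and the driver name `postgresql` are assumptions for illustration, and the security section is left out for brevity.

```xml
<!-- Sketch only: DB_NAME and the driver name are placeholders; <security> omitted. -->
<datasource jndi-name="java:jboss/datasources/KeycloakDS"
            pool-name="KeycloakDS" enabled="true" use-java-context="true">
    <connection-url>jdbc:postgresql://localhost:5432/DB_NAME?currentSchema=keycloakprod-v10</connection-url>
    <driver>postgresql</driver>
    <validation>
        <!-- query used to test whether a pooled connection is still usable -->
        <check-valid-connection-sql>select 1</check-valid-connection-sql>
        <!-- validate idle connections in the background every 15 seconds -->
        <background-validation>true</background-validation>
        <background-validation-millis>15000</background-validation-millis>
    </validation>
    <timeout>
        <!-- close connections that have been idle in the pool for 1 minute -->
        <idle-timeout-minutes>1</idle-timeout-minutes>
    </timeout>
</datasource>
```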
Then I found one other suggestion: to add jta="false" to the <datasource> element, like this:

<datasource jta="false" jndi-name="java:jboss/datasources/KeycloakDS" ...

which had a most curious effect: this time, a formidable spike appeared in the database’s connections monitor, which then tapered off asymptotically (much like the function 1/x). Looking in the database, I could see that the user had now indeed been created, though the email address I had filled in appeared as the username, and both first_name and last_name were left empty.

That was pretty much the point where I decided that I had better ask here on the forum :)

So, if anyone could point me towards something I haven’t yet tried, that would be great!

Thanks,
Lúthien

Hi @Luthien

Did you fix the problem, though?