Keycloak Upgrade Docker Container From 18.0.2 to 21.0.2 With +200 Realms

Hi dear community, I am reaching out because we need some help upgrading our KC version from 18.0.2 to 21.0.2. We run a multi-tenancy approach where each organisation with users is represented by a realm. Since we serve multiple organisations, we have more than 300 realms on our KC. I am aware that at around 300 realms KC suffers significant performance issues, and we do have a plan to move away from this approach and use groups instead. However, we decided to upgrade KC first.

We tried to upgrade our dev and integration environments. This worked seamlessly, and we experienced no issues. However, production is a different story: it did not work. The only difference is that on production we have 434 realms. I asked Ops for a list of inactive businesses (that is, abandoned realms) so we can clean up and reduce the number, but I reckon we would still end up with between 150 and 200 realms. We do this cleanup from time to time, when we experience performance issues. Anyway, there is still a lot to optimize here. I took this over from a senior dev who set everything up. The more important issue now is upgrading while preserving the realms, of course.

Further info: We run our KC as an image in a Docker container configured with env vars.

Has anyone faced a similar challenge? What do we need to do to make this migration work?

I can post more information, configs and error logs.

Thanks in advance!!!

Hey Boris!

Very nice challenge you have here. Can you share what the symptoms of the upgrade failure in production are? Did you get any errors?

Are multiple docker instances trying to update the DB at the same time?

(Figuring one instance in dev, multiple in prod)

@Carl Thanks for getting back. No, we only have one KC instance running on our server. Hence, it is just this one trying to update the DB. But the issue has been solved :). Thank you very much, anyways!

@Carl @gmolaire I managed to upgrade it. I went first for 19.0 then from there to 20.0.5. Had the same errors from 19.0.3 to 20.0.5. The upgrade from 18.0.2 to 19.0.3 worked easily.

This helped me to upgrade from 19.0.2 to 20.0.5.
I found the solution for upgrading with a large number of realms here: Align `quarkus.transaction-manager.default-transaction-timeout` with storage lock timeouts · Issue #19453 · keycloak/keycloak · GitHub.

The reporter there faced a similar issue with 600+ realms.

The solution is to add this to quarkus.properties:

quarkus.transaction-manager.default-transaction-timeout=35M

I am now trying the same from 20.0.5 to 21.0.2, meaning that I just change the image version and bring the compose stack up again.
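For readers following along, one hop of this looks roughly like the compose fragment below. It is only a sketch: the service name, the image registry and the command are assumptions for illustration, not our actual file. Only the idea of changing the tag one hop at a time comes from this thread.

```yaml
# docker-compose.yml (fragment)
# Assumed for illustration: service name "keycloak", the quay.io image, "start".
services:
  keycloak:
    # Change only the tag for each hop, bring the stack up again, and let the
    # migration finish before moving to the next tag:
    #   18.0.2 -> 19.0.3 -> 20.0.5 -> 21.0.2
    image: quay.io/keycloak/keycloak:20.0.5
    command: start
```

Keeping everything else in the file unchanged between hops makes it easier to tell whether a failure comes from the migration itself or from a config change.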


UPDATE: The same solution worked for upgrading from 20.0.5 to 21.0.2.

Configure your quarkus.properties with this property:
quarkus.transaction-manager.default-transaction-timeout=35M

The value can be adjusted based on the number of realms. 35M worked seamlessly with 400+ realms, as well as with more than 600, see: Align `quarkus.transaction-manager.default-transaction-timeout` with storage lock timeouts · Issue #19453 · keycloak/keycloak · GitHub
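For anyone adjusting the value, here is a small sketch of how it can be written. It assumes the usual Quarkus duration syntax, where a bare number is read as seconds and a value such as 35M is treated as the ISO-8601 duration PT35M (35 minutes). Please double-check this against the Quarkus version bundled with your Keycloak release.

```properties
# conf/quarkus.properties
# 35 minutes, the value that worked in this thread with 400+ realms
quarkus.transaction-manager.default-transaction-timeout=35M

# Assumed equivalent, written as plain seconds (35 * 60 = 2100):
# quarkus.transaction-manager.default-transaction-timeout=2100
```

Only one of the two lines should be active at a time. The value only needs to be long enough for the slowest migration step to finish.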

We would have been able to help you here if you had answered @gmolaire's question about which error message you were confronted with. By just writing "it does not work", one is hardly able to tell you anything about possible root causes!
Being more precise in your problem description could have led you to the proper solution faster!

Hi @dasniko,

That would have been my next step. I did not want to clutter the problem description with a long stack trace. It's not that I didn't think to provide the error message, but I was initially seeking some form of feedback on my problem, such as a question about the error or the environment.

I had indeed planned to answer @gmolaire's questions. However, before I returned to this thread, I had already found the solution after spending about two hours researching various forums and documentation. Fortunately, I wasn't the only one facing this issue. If I hadn't found a solution, I would have provided more error messages. :)

Regardless, you are absolutely right, and I will definitely provide the error message immediately in the future.

Best,
Boris

Hi @gmolaire,

thanks for getting back to this so quickly. I have already figured out the problem and the upgrade has been successfully completed. Nevertheless, here is the error I received:

2024-07-08 10:54:39,220 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012108: CheckedAction::check - atomic action 0:ffffac145016:9a89:668bc443:0 aborting with 1 threads active!
2024-07-08 10:54:39,223 WARN  [io.agroal.pool] (Transaction Reaper Worker 0) Datasource '<default>': JDBC resources leaked: 1 ResultSet(s) and 1 Statement(s)
2024-07-08 10:54:39,228 WARN  [org.hibernate.resource.transaction.backend.jta.internal.synchronization.SynchronizationCallbackCoordinatorTrackingImpl] (Transaction Reaper Worker 0) HHH000451: Transaction afterCompletion called by a background thread; delaying afterCompletion processing until the original thread can handle it. [status=4]
2024-07-08 10:54:39,229 WARN  [com.arjuna.ats.arjuna] (Transaction Reaper Worker 0) ARJUNA012121: TransactionReaper::doCancellations worker Thread[Transaction Reaper Worker 0,5,main] successfully canceled TX 0:ffffac145016:9a89:668bc443:0
2024-07-08 10:54:39,291 WARN  [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (main) SQL Error: 0, SQLState: null
2024-07-08 10:54:39,292 ERROR [org.hibernate.engine.jdbc.spi.SqlExceptionHelper] (main) Connection is closed
2024-07-08 10:54:39,359 WARN  [com.arjuna.ats.arjuna] (main) ARJUNA012077: Abort called on already aborted atomic action 0:ffffac145016:9a89:668bc443:0
2024-07-08 10:54:39,522 INFO  [org.infinispan.CLUSTER] (main) ISPN000080: Disconnecting JGroups channel ISPN
2024-07-08 10:54:39,758 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (production) mode
2024-07-08 10:54:39,758 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: org.hibernate.exception.GenericJDBCException: could not prepare statement
2024-07-08 10:54:39,759 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: could not prepare statement
2024-07-08 10:54:39,759 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Connection is closed
2024-07-08 10:54:39,759 ERROR 

@borisnu can you please share how you overcame this issue?
I am having something similar when migrating the DB from 18 to 21.1.2.

@Carl @gmolaire
I'm getting the error stack below.
As far as you both know, is there an easier solution than going through the process from version 18 to 19?

I'm moving from version 18 to 25, and it seems that I need to go through 21.1.2 as an intermediate step to fix the Liquibase issue mentioned here.

my stack trace:

ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) Error details:: javax.persistence.PersistenceException: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
at org.hibernate.internal.ExceptionConverterImpl.convert(ExceptionConverterImpl.java:154)
at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1626)
at org.hibernate.query.Query.getResultList(Query.java:165)
at org.keycloak.models.jpa.JpaRealmProvider.getClientRole(JpaRealmProvider.java:281)
at org.keycloak.models.jpa.JpaRealmProvider.addClientRole(JpaRealmProvider.java:253)
at org.keycloak.storage.RoleStorageManager.addClientRole(RoleStorageManager.java:205)
at org.keycloak.models.cache.infinispan.RealmCacheSession.addClientRole(RealmCacheSession.java:735)
at org.keycloak.models.cache.infinispan.RealmCacheSession.addClientRole(RealmCacheSession.java:730)
at org.keycloak.models.cache.infinispan.ClientAdapter.addRole(ClientAdapter.java:570)
at org.keycloak.migration.migrators.MigrateTo20_0_0.addViewGroupsRole(MigrateTo20_0_0.java:30)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1845)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.hibernate.query.spi.StreamDecorator.forEach(StreamDecorator.java:153)
at org.keycloak.utils.ClosingStream.forEach(ClosingStream.java:128)
at org.keycloak.migration.migrators.MigrateTo20_0_0.migrate(MigrateTo20_0_0.java:19)
at org.keycloak.storage.datastore.LegacyMigrationManager.migrate(LegacyMigrationManager.java:135)
at org.keycloak.migration.MigrationModelManager.migrate(MigrationModelManager.java:33)
at org.keycloak.quarkus.runtime.storage.legacy.database.LegacyJpaConnectionProviderFactory.migrateModel(LegacyJpaConnectionProviderFactory.java:216)
at org.keycloak.quarkus.runtime.storage.legacy.database.LegacyJpaConnectionProviderFactory.initSchema(LegacyJpaConnectionProviderFactory.java:210)
at org.keycloak.models.utils.KeycloakModelUtils.lambda$runJobInTransaction$1(KeycloakModelUtils.java:256)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransactionWithResult(KeycloakModelUtils.java:269)
at org.keycloak.models.utils.KeycloakModelUtils.runJobInTransaction(KeycloakModelUtils.java:255)
at org.keycloak.quarkus.runtime.storage.legacy.database.LegacyJpaConnectionProviderFactory.postInit(LegacyJpaConnectionProviderFactory.java:135)
at org.keycloak.quarkus.runtime.integration.QuarkusKeycloakSessionFactory.init(QuarkusKeycloakSessionFactory.java:105)
at org.keycloak.quarkus.runtime.integration.jaxrs.QuarkusKeycloakApplication.createSessionFactory(QuarkusKeycloakApplication.java:41)
at org.keycloak.services.resources.KeycloakApplication.startup(KeycloakApplication.java:125)
at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver.onStartupEvent(QuarkusLifecycleObserver.java:37)
at org.keycloak.quarkus.runtime.integration.QuarkusLifecycleObserver_Observer_onStartupEvent_b0e82415b143738dc1f986a5fa4668e83d0a5dea.notify(Unknown Source)
at io.quarkus.arc.impl.EventImpl$Notifier.notifyObservers(EventImpl.java:326)
at io.quarkus.arc.impl.EventImpl$Notifier.notify(EventImpl.java:308)
at io.quarkus.arc.impl.EventImpl.fire(EventImpl.java:76)
at io.quarkus.arc.runtime.ArcRecorder.fireLifecycleEvent(ArcRecorder.java:131)
at io.quarkus.arc.runtime.ArcRecorder.handleLifecycleEvents(ArcRecorder.java:100)
at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy_0(Unknown Source)
at io.quarkus.deployment.steps.LifecycleEventsBuildStep$startupEvent1144526294.deploy(Unknown Source)
at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)
at io.quarkus.runtime.Application.start(Application.java:101)
at io.quarkus.runtime.ApplicationLifecycleManager.run(ApplicationLifecycleManager.java:110)
at io.quarkus.runtime.Quarkus.run(Quarkus.java:70)
at org.keycloak.quarkus.runtime.KeycloakMain.start(KeycloakMain.java:98)
at org.keycloak.quarkus.runtime.cli.command.AbstractStartCommand.run(AbstractStartCommand.java:37)
at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.keycloak.quarkus.runtime.cli.Picocli.parseAndRun(Picocli.java:94)
at org.keycloak.quarkus.runtime.KeycloakMain.main(KeycloakMain.java:88)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at io.quarkus.bootstrap.runner.QuarkusEntryPoint.doRun(QuarkusEntryPoint.java:61)
at io.quarkus.bootstrap.runner.QuarkusEntryPoint.main(QuarkusEntryPoint.java:32)
Caused by: org.hibernate.HibernateException: Transaction was rolled back in a different thread!
at org.hibernate.resource.transaction.backend.jta.internal.synchronization.SynchronizationCallbackCoordinatorTrackingImpl.processAnyDelayedAfterCompletion(SynchronizationCallbackCoordinatorTrackingImpl.java:90)
at org.hibernate.internal.SessionImpl.delayedAfterCompletion(SessionImpl.java:632)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1465)
at org.hibernate.query.internal.AbstractProducedQuery.doList(AbstractProducedQuery.java:1649)
at org.hibernate.query.internal.AbstractProducedQuery.list(AbstractProducedQuery.java:1617)
... 66 more
ERROR: Failed to start server in (production) mode

Hi Amir, see my post above. In this same thread.

So what did the trick for us was editing the quarkus.properties file, which resides in the KC config folder. There should be two files there: keycloak.conf and quarkus.properties. We simply had this folder mounted in the compose file:
- ./keycloak_config/:/opt/keycloak/conf

Inside the quarkus.properties file we simply configured:

quarkus.transaction-manager.default-transaction-timeout=35M

Then we basically ran the container again. I recommend upgrading incrementally; this is what we did when migrating from 18 to 21.0.2.
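Put together, the relevant part of a compose file could look roughly like this. Treat it as a sketch: the service name, image tag and command are assumptions for illustration, and only the volume mapping is taken from our setup.

```yaml
# docker-compose.yml (fragment)
# Assumed for illustration: service name, image tag and command.
services:
  keycloak:
    image: quay.io/keycloak/keycloak:21.0.2
    command: start
    volumes:
      # ./keycloak_config/ on the host holds keycloak.conf and
      # quarkus.properties (with the extended transaction timeout)
      - ./keycloak_config/:/opt/keycloak/conf
```

Because the whole conf folder is mounted, the timeout can be changed on the host and takes effect the next time the container starts.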

I have not looked at the stack trace you are facing in detail, but from a shallow look, this might help with your problem. I'd suggest you try it, but I can't promise that it solves your issue. This is what helped us in our particular situation. The others might give you a better answer; I am light-years from being a KC expert.

One question though: Do you also handle a large number of realms in your current KC setup?

Hi @borisnu,
yes, a couple of hundred realms.
So the only solution that you found is to extend the default timeout?

As far as I understand, the default timeout is 5 min, and that is the reason for the stack trace that we saw in the Keycloak logs...

In one of your messages you said that you upgraded from 18 to 19 and the upgrade went well.
Did you try a two-hop upgrade, meaning 18->19->21?

Hi @amir4895,

so I have used the upgrading guide from Keycloak, see here: Upgrading Guide. I have also hopped versions, though I did this on our staging environment. Whether that works depends on the changes introduced between the versions. If I remember correctly, I jumped from 18 to 20, so two major versions at once.

Yes, extending the timeout did the job for us. If you have a staging or dev environment mirroring production, you can try upgrading there first. This is what I did.

Before I found the solution on GitHub, I tried configuring some other values in our compose file for our Keycloak:

      - "JAVA_OPTS_APPEND=-server -Xms4096m -Xmx8192m" # override the default heap sizes of 64m (initial) and 512m (max)
      - "KC_SPI_CONNECTIONS_JPA_CONNECTION_POOL_SIZE=100"
      - "KC_SPI_CONNECTIONS_JPA_IDLE_TIMEOUT=600000"  # 10 minutes
      - "KC_SPI_CONNECTIONS_JPA_MAX_LIFETIME=3600000" # 1 hour
      - "KC_SPI_DBLOCK_JPA_LOCK_WAIT_TIMEOUT=120000"
      - "QUARKUS_DATASOURCE_JDBC_IDLE_REMOVAL_INTERVAL=600000" # 10 minutes
      - "QUARKUS_DATASOURCE_JDBC_MAX_LIFETIME=3600000" # 1 hour
      - "QUARKUS_DATASOURCE_JDBC_MAX_SIZE=100"

However, I did this before touching the quarkus.properties file, and I still faced the issue. Hence, extending the timeout is what did the trick for us, as already mentioned.

Unfortunately, I can't find my stack trace to compare it with yours.

Note: We also have microservices that rely on the KC java library, for which we also needed to consider the changes resulting from upgrading. If you also use it, make sure to especially pay attention to the changes introduced, and keep them in sync.

I experienced what was essentially a database timeout when upgrading from Keycloak 6 to Keycloak 12. I was able to resolve the issue using these parameters:

-Dquarkus.transaction-manager.default-transaction-timeout=3600

-Dkeycloak.migration.batch-enabled=true

-Dkeycloak.migration.batch-size=1000

@amir4895 did you try following the suggestions from @jeffvictor? Were you able to resolve your upgrade problem?

Hi @borisnu, what's up?
It seems that setting the timeout using the env "quarkus.transaction-manager.default-transaction-timeout=35M" did the trick. However, now that I have finished both hops and am on the target version 25, I see many ARJUNA012108 errors (along with some others: ARJUNA012404, ARJUNA012095, ARJUNA012121, ARJUNA012381, ARJUNA012117) every few minutes.

Any idea why, and is it related to a bad import of realms?