Keycloak in production

Hello, I would like some suggestions for a large-scale production deployment of Keycloak: one million users spread across 177 distinct realms. My customer calls each realm a "site"; the sites are segregated from each other, and each has its own Keycloak URL. As far as I understand, the most scalable way to deploy this is on OpenShift Container Platform running on AWS. My questions are as follows:

  1. How many clusters do I need for this project?
  2. Is it safe to assume 1 cluster = 1 site?
  3. Classic OpenShift on AWS is priced per cluster with 9 worker nodes. Can such a cluster be shared across multiple sites? If yes, how many distinct sites can I run on one 9-worker-node OpenShift cluster on AWS?

Any guidance that helps me get the correct pricing details is appreciated. I am on a tight budget, so I would love the most economical but stable architecture.

Just curious: will you use whatever advice you receive on this topic directly as the proposal, or will you do a POC first? I really don't want to sound condescending; that is not my intention. If anything, I would love to read a detailed analysis of this problem and see how Keycloak handles such a deployment.

However, I really think that whatever answer you receive will not be enough to base an architecture proposal on unless you actually build something: either a scaled-down version (but then how will you know it handles the expected load?) or a POC with Keycloak running on the kind of cluster you are interested in. This matters especially on a tight budget. What if someone says "just choose that cluster", but it turns out to be far too big and half would have done? Or what if someone says "that will not be enough, take at least 3 clusters"? What will you do then?

I would suggest starting with the Keycloak Benchmark documentation: download the code, understand what it does and how it works, and try to run the benchmark yourself. This will give you a lot more confidence in finding the right architecture. Once you have the benchmark up and running, you can tweak it to suit your use cases (x million users, x hundred realms, x operations happening, etc.).
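Before tweaking the benchmark, it can help to turn the numbers from the question into rough load targets. Here is a minimal back-of-envelope sketch in Python. The user and realm counts come from the original post; the logins per user per day and the peak factor are pure assumptions for illustration, and the `Workload` class is hypothetical and not part of Keycloak or Keycloak Benchmark. Replace the assumed values with whatever your customer's traffic actually looks like.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical helper for rough load targets; not a Keycloak API."""
    total_users: int                 # from the question: 1,000,000
    realms: int                      # from the question: 177
    logins_per_user_per_day: float   # ASSUMPTION: measure or estimate this
    peak_factor: float               # ASSUMPTION: peak rate / daily average rate

    def users_per_realm(self) -> int:
        # Even split across realms, rounded up; real sites will be uneven.
        return math.ceil(self.total_users / self.realms)

    def average_logins_per_second(self) -> float:
        # Daily login volume spread evenly over 86,400 seconds.
        return self.total_users * self.logins_per_user_per_day / 86_400

    def peak_logins_per_second(self) -> float:
        # Scale the daily average by the assumed peak factor.
        return self.average_logins_per_second() * self.peak_factor


workload = Workload(
    total_users=1_000_000,
    realms=177,
    logins_per_user_per_day=2.0,  # assumed
    peak_factor=10.0,             # assumed
)

print(f"users per realm (even split): {workload.users_per_realm()}")
print(f"average logins/s: {workload.average_logins_per_second():.1f}")
print(f"peak logins/s:    {workload.peak_logins_per_second():.1f}")
```

The peak logins per second is the kind of figure you would then try to reproduce in a benchmark run against a candidate cluster. It only tells you what to test for, not how many nodes or clusters you need.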

And I would love to hear back about what happened and what the results were, either as a direct reply here or as a blog post once the architecture is ready and proven to handle the expected load.
