Keycloak Cluster JGroups/Infinispan on EC2

Hello, we are running Keycloak on EC2 VMs and currently want to run it as a cluster of 2 nodes.

We are installing Keycloak 19 and attempting to set up clustering using S3.

I have in my keycloak.conf file the following entries in relation to the cache:

cache=ispn
cache-config-file=cache-ec2.xml
cache-stack=ec2

But I’m struggling to configure the cache-ec2.xml file. These are its current contents:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:11.0 http://www.infinispan.org/schemas/infinispan-config-11.0.xsd"
    xmlns="urn:infinispan:config:11.0">

  <!-- custom stack goes into the jgroups element -->
  <jgroups>
	<stack name="s3"> 
		<transport type="TCP" socket-binding="jgroups-tcp"/> 
		<protocol type="S3_PING" location="pauls-test-keycloak-bucket" /> 
		<protocol type="MERGE3"/> 
		<protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/> 
		<protocol type="FD_ALL"/> 
		<protocol type="VERIFY_SUSPECT"/> 
		<protocol type="pbcast.NAKACK2" use_mcast_xmit="false" />
		<protocol type="UNICAST3"/> 
		<protocol type="pbcast.STABLE"/> 
		<protocol type="pbcast.GMS"/> 
		<protocol type="MFC"/> 
		<protocol type="FRAG2"/> 
	</stack> 
  </jgroups>

  <cache-container name="keycloak">
    <!-- custom stack must be referenced by name in the stack attribute of the transport element -->
    <transport lock-timeout="60000" stack="s3"/>
    <local-cache name="realms">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <local-cache name="users">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <distributed-cache name="sessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="authenticationSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="offlineSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="clientSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="offlineClientSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="loginFailures" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <local-cache name="authorization">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <replicated-cache name="work">
      <expiration lifespan="-1"/>
    </replicated-cache>
    <local-cache name="keys">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <expiration max-idle="3600000"/>
      <memory max-count="1000"/>
    </local-cache>
    <distributed-cache name="actionTokens" owners="2">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <expiration max-idle="-1" lifespan="-1" interval="300000"/>
      <memory max-count="-1"/>
    </distributed-cache>
  </cache-container>
</infinispan>

I’m sure I have something configured incorrectly there, but I get the following error:

2022-11-01 17:16:28,764 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start server in (production) mode
2022-11-01 17:16:28,765 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Failed to start caches
2022-11-01 17:16:28,765 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: org.infinispan.manager.EmbeddedCacheManagerStartupException: org.infinispan.commons.CacheConfigurationException: ISPN000085: Error while trying to create a channel using the specified configuration file: default-configs/default-jgroups-ec2.xml
2022-11-01 17:16:28,765 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: org.infinispan.commons.CacheConfigurationException: ISPN000085: Error while trying to create a channel using the specified configuration file: default-configs/default-jgroups-ec2.xml
2022-11-01 17:16:28,765 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: ISPN000085: Error while trying to create a channel using the specified configuration file: default-configs/default-jgroups-ec2.xml
2022-11-01 17:16:28,765 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: JGRP000002: unable to load protocol org.jgroups.aws.s3.NATIVE_S3_PING (either with relative - org.jgroups.aws.s3.NATIVE_S3_PING - or absolute - org.jgroups.protocols.org.jgroups.aws.s3.NATIVE_S3_PING - class name)
2022-11-01 17:16:28,765 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) For more details run the same command passing the '--verbose' option. Also you can use '--help' to see the details about the usage of the particular command.

I’m struggling to get this working and have spent all day trying to get to the bottom of it. Does anyone have a working configuration they are willing to share, or can anyone point me in the right direction?

Any help is appreciated, thanks.
Paul

I don’t know how to configure the EC2 stack, BUT…
It has been said multiple times here in the forum, in various threads about cluster configuration, that if you configure both cache-stack AND cache-config-file, the former takes precedence over the latter and your custom file will be ignored. So, if you want to use one of the built-in stacks, use cache-stack; if you want to use a custom stack defined in an XML file, specify it with cache-config-file.
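A quick way to catch this situation is to scan keycloak.conf for both options. The following is only a sketch: the function name check_cache_options is made up for illustration, and it only inspects the file, not options passed on the command line.

```shell
# Report a conflict when both cache-stack and cache-config-file are set
# in the given keycloak.conf (hypothetical helper, not part of Keycloak).
check_cache_options() {
  conf="$1"
  has_stack=0
  has_file=0
  # Match uncommented "key=value" lines, allowing spaces around the key.
  grep -Eq '^[[:space:]]*cache-stack[[:space:]]*=' "$conf" && has_stack=1
  grep -Eq '^[[:space:]]*cache-config-file[[:space:]]*=' "$conf" && has_file=1
  if [ "$has_stack" -eq 1 ] && [ "$has_file" -eq 1 ]; then
    echo "conflict: both cache-stack and cache-config-file are set"
    return 1
  fi
  echo "ok"
}
```

Running check_cache_options conf/keycloak.conf against the configuration from the question would report the conflict, because both options are present there.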

How does that work with S3 where you need to specify the bucket?
Or do you specify it with ENV variables?

It would be good if there were a Keycloak example XML file for each of the Infinispan stacks, as there is obviously some wrapping around the default Infinispan XML, at least judging from JDBC_PING, which didn’t work for me in AWS, as others have reported as well.

Maybe this helps (and maybe not only the linked chapter, but other resources of the Infinispan docs):

https://infinispan.org/docs/dev/titles/embedding/embedding.html#jgroups-extras-properties_cluster-transport

Based on this and this you need to select the ec2 stack and provide AWS credentials for the underlying library used by it.

I suppose you can just put these lines in your keycloak.conf (and remove cache-config-file, as stated in the previous answers; you can use one or the other):

cache=ispn
cache-stack=ec2

Run keycloak with those environment variables set (replace with your own values):

JAVA_OPTS_APPEND="-Djgroups.s3.bucket=MY_BUCKET_NAME -Djgroups.s3.access_key=MY_KEY_ID -Djgroups.s3.secret_access_key=MY_KEY_SECRET"

I am away from the office today, so I’m unable to test it… but does anyone know if you can skip providing the AWS credentials and use an IAM role in their place? I will be unable to create credentials due to security requirements, but I can use an IAM role attached to the EC2 instance (with the equivalent bucket policy applied for that role).

Looking at the code, it seems aws.S3_PING just uses the default AWS library with a call to

DefaultAWSCredentialsProviderChain creds=DefaultAWSCredentialsProviderChain.getInstance();

So, the default methods used to configure the aws client will apply here.

For how to configure the client to use a role, take a look at the Stack Overflow question "How to use IAM role with AWS Java SDK".

If the IAM role is attached to the EC2 instance itself, I suppose you can just set the bucket name, and the credentials will be handled for you by the Java SDK under the hood.
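To make "under the hood" a bit more concrete: the default chain in version 1 of the AWS Java SDK tries several credential sources in order and only falls back to the container or instance role when the earlier ones yield nothing. The sketch below mimics a simplified subset of that order (it skips Java system properties and web identity tokens, and only checks the primary environment variable names); aws_credential_source is a hypothetical helper, not an AWS tool.

```shell
# Guess which credential source the default chain would pick up first.
# Simplified illustration only; the real chain has more steps.
aws_credential_source() {
  home_dir="${1:-$HOME}"
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "environment variables"
  elif [ -f "$home_dir/.aws/credentials" ]; then
    echo "shared credentials file"
  else
    echo "container or instance profile (IAM role)"
  fi
}
```

If this prints anything other than the last option on the Keycloak host, static credentials found there would likely be used before the role attached to the instance.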

Keycloak: 20.0.3
To use the default “ec2” Infinispan stack you should:

  1. Put these files in the ./providers dir:
  • jgroups-aws-2.0.1.Final.jar (stack protocol)
  • aws-java-sdk-core-1.12.410.jar (access to AWS creds, etc.)
  • aws-java-sdk-s3-1.12.410.jar (access to S3, etc.)
  • joda-time-2.12.2.jar (required dependency)
  2. Set JAVA_OPTS_APPEND='-Djgroups.s3.region_name=us-east-1 -Djgroups.s3.bucket_name=<bucket_name>'
  • An IAM instance profile role should be attached to the EC2 instance (for AWS creds), and the S3 bucket must already be created
  3. Build Keycloak with the option --cache-stack=ec2 (no --cache-config-file option!)
  • bin/kc.[sh|bat] build --cache-stack=ec2

It works for me. Good luck and have fun!
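The steps above could be scripted roughly as follows. This is a sketch under assumptions: the jar names and system properties are taken from the post above, while the helper names missing_jars and ec2_java_opts, the KC_HOME variable, and the bucket name are placeholders introduced here.

```shell
# Jar names as listed in the post above; adjust versions as needed.
REQUIRED_JARS="jgroups-aws-2.0.1.Final.jar aws-java-sdk-core-1.12.410.jar aws-java-sdk-s3-1.12.410.jar joda-time-2.12.2.jar"

# Print every required jar that is absent from the given providers directory.
missing_jars() {
  providers_dir="$1"
  for jar in $REQUIRED_JARS; do
    [ -f "$providers_dir/$jar" ] || echo "$jar"
  done
}

# Compose the JVM options for the ec2 stack. No static credentials are
# included: the IAM role attached to the instance is expected to supply them.
ec2_java_opts() {
  region="$1"
  bucket="$2"
  echo "-Djgroups.s3.region_name=$region -Djgroups.s3.bucket_name=$bucket"
}

# Example usage (KC_HOME and my-bucket are placeholders):
#   missing="$(missing_jars "$KC_HOME/providers")"
#   [ -z "$missing" ] || { echo "missing jars: $missing"; exit 1; }
#   export JAVA_OPTS_APPEND="$(ec2_java_opts us-east-1 my-bucket)"
#   "$KC_HOME/bin/kc.sh" build --cache-stack=ec2
```

Checking for the jars first is useful because a missing jgroups-aws jar produces the "unable to load protocol org.jgroups.aws.s3.NATIVE_S3_PING" error shown in the original question.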

Logs:

2023-02-21 09:47:28,062 INFO  [org.infinispan.server.core.transport.EPollAvailable] (keycloak-cache-init) ISPN005028: Native Epoll transport not available, using NIO instead: java.lang.ExceptionInInitializerError
2023-02-21 09:47:28,453 WARN  [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000569: Unable to persist Infinispan internal caches as no global state enabled
2023-02-21 09:47:28,472 WARN  [org.infinispan.PERSISTENCE] (keycloak-cache-init) ISPN000554: jboss-marshalling is deprecated and planned for removal
2023-02-21 09:47:28,506 INFO  [org.infinispan.CONTAINER] (keycloak-cache-init) ISPN000556: Starting user marshaller 'org.infinispan.jboss.marshalling.core.JBossUserMarshaller'
2023-02-21 09:47:28,987 INFO  [org.keycloak.broker.provider.AbstractIdentityProviderMapper] (main) Registering class org.keycloak.broker.provider.mappersync.ConfigSyncEventListener
2023-02-21 09:47:29,086 INFO  [org.infinispan.CONTAINER] (keycloak-cache-init) ISPN000128: Infinispan version: Infinispan 'Triskaidekaphobia' 13.0.10.Final
2023-02-21 09:47:29,270 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000078: Starting JGroups channel `ISPN`
2023-02-21 09:47:29,290 WARN  [org.jgroups.stack.Configurator] (keycloak-cache-init) NATIVE_S3_PING has been deprecated; please upgrade to a newer version of the protocol
2023-02-21 09:47:30,181 INFO  [org.jgroups.aws.s3.NATIVE_S3_PING] (keycloak-cache-init) using Amazon S3 ping in region us-east-1 with bucket 'my-jgroups-s3-bucket-test' and prefix ''
2023-02-21 09:47:30,937 INFO  [org.jgroups.aws.s3.NATIVE_S3_PING] (keycloak-cache-init) found bucket my-jgroups-s3-bucket-test
2023-02-21 09:48:04,101 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000094: Received new cluster view for channel ISPN: [ip-10-68-49-170-40943|3] (2) [ip-10-68-49-170-40943, ip-10-68-49-190-31671]
2023-02-21 09:48:04,111 INFO  [org.infinispan.CLUSTER] (keycloak-cache-init) ISPN000079: Channel `ISPN` local address is `ip-10-68-49-190-31671`, physical addresses are `[10.68.49.190:7800]`

2023-02-21 09:48:07,087 INFO  [io.quarkus] (main) Keycloak 20.0.3 on JVM (powered by Quarkus 2.13.6.Final) started in 78.156s. Listening on: http://0.0.0.0:8080 and https://0.0.0.0:8443
2023-02-21 09:48:07,087 INFO  [io.quarkus] (main) Profile prod activated.

I ended up going with jdbc_ping but it also relies on jgroups-aws-2.0.1.Final.jar.

Or you could do something like this:

    <jgroups>
        <stack name="cluster">
            <TCP bind_port="7600"/>
            <TCPPING
                    initial_hosts="{{groups['tag_group_' + ec2_tag_environment + '_iam_infinispan'] | join('[7600], ')}}[7600]"
                     port_range="0"/>
            <MERGE3 min_interval="10000" max_interval="30000"/>
            <FD_SOCK client_bind_port="57600" start_port="57600"/>
            <!-- Suspect node `timeout` to `timeout + timeout_check_interval` millis after the last heartbeat -->
            <FD_ALL timeout="10000" interval="2000" timeout_check_interval="1000"/>
            <VERIFY_SUSPECT timeout="1000"/>

            <pbcast.NAKACK2 use_mcast_xmit="false" xmit_interval="100" xmit_table_num_rows="50" xmit_table_msgs_per_row="1024"
                            xmit_table_max_compaction_time="30000" resend_last_seqno="true"/>
            <UNICAST3 xmit_interval="100" xmit_table_num_rows="50" xmit_table_msgs_per_row="1024" xmit_table_max_compaction_time="30000"/>
            <pbcast.STABLE stability_delay="500" desired_avg_gossip="5000" max_bytes="1M"/>
            <pbcast.GMS print_local_addr="false" join_timeout="5000"/>
            <UFC_NB max_credits="3m" min_threshold="0.40"/>
            <MFC_NB max_credits="3m" min_threshold="0.40"/>
            <FRAG3/>
        </stack>
    </jgroups>

The tag_group inventory groups and ec2_tag_environment are provided via Ansible during deployment.
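For anyone not using Ansible, the initial_hosts template above simply expands to a comma-separated list of host[port] entries. A minimal sketch of the same expansion in plain shell, where the function name initial_hosts and the example addresses are made up for illustration:

```shell
# Build a TCPPING initial_hosts value: "host1[port], host2[port], ...".
initial_hosts() {
  port="$1"
  shift
  out=""
  for host in "$@"; do
    # Separate entries with ", " once the first one has been added.
    if [ -n "$out" ]; then
      out="${out}, "
    fi
    out="${out}${host}[${port}]"
  done
  printf '%s\n' "$out"
}

# Example: initial_hosts 7600 10.0.0.1 10.0.0.2
# prints:  10.0.0.1[7600], 10.0.0.2[7600]
```

The resulting string can be pasted into the initial_hosts attribute of TCPPING when the node addresses are static and known in advance.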