KC Quarkus cluster mode on AWS ECS Fargate

Hi everyone,
I want to run Keycloak v17 (Quarkus) on ECS, but I'm having trouble configuring the distributed cache. Since multicast is not supported on AWS EC2, I have to use one of the other discovery providers from Infinispan. The ec2 stack seems to be the solution to my problem, but I can't get it working due to an error related to authentication.

2022-03-04 15:40:25,211 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: Error starting component org.infinispan.remoting.transport.Transport
2022-03-04 15:40:25,212 ERROR [org.keycloak.quarkus.runtime.cli.ExecutionExceptionHandler] (main) ERROR: com/amazonaws/auth/AWSCredentialsProvider

Since NATIVE_S3_PING uses the AWS SDK to connect to the S3 bucket, it must obtain credentials through one of the methods supported by the SDK. I'm running my ECS task with a service role, so the temporary credentials should be available inside the task (I don't know how to verify whether the credentials are really there).
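A minimal sketch (Python, stdlib only) for checking this from inside a running task, for example through ECS Exec. When a task role is attached, ECS is expected to inject the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, and the AWS SDK's default credential chain uses it to fetch temporary credentials from the link-local endpoint 169.254.170.2. The function name is illustrative only.

```python
import os
from typing import Mapping, Optional

# Link-local address from which the ECS agent serves task-role credentials.
ECS_CREDENTIALS_ENDPOINT = "http://169.254.170.2"


def task_role_credentials_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the URL the SDK's container credentials provider would call,
    or None when AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is missing
    (for example because no task role is attached to the task definition)."""
    relative_uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        return None
    return ECS_CREDENTIALS_ENDPOINT + relative_uri


if __name__ == "__main__":
    url = task_role_credentials_url()
    if url is None:
        print("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is not set: no task role credentials")
    else:
        # Fetching this URL from inside the container (e.g. with curl) should
        # return JSON with AccessKeyId, SecretAccessKey, Token and Expiration.
        print("Task role credentials endpoint:", url)
```

If the variable is missing, the SDK cannot pick up the task role, whatever IAM policy is attached to it.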

The other option I'm working on to get cluster mode working is to build the container with the kubernetes cache stack, which uses DNS_PING, but that gets more complicated because I have to configure ECS Service Discovery and the Route 53 service registry.
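DNS_PING discovers members by querying DNS, so with ECS Service Discovery the record registered for the service should resolve to one private IP per running task. A minimal sketch for checking what a name resolves to from inside a task; the service name in the usage note below is hypothetical, and port 7800 is only used to shape the lookup.

```python
import socket
from typing import List


def resolve_ipv4(name: str, port: int = 7800) -> List[str]:
    """Return the distinct IPv4 addresses a DNS name resolves to, roughly
    the set of peers a DNS-based discovery protocol would see."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

For example, resolve_ipv4("keycloak.example.local") (a hypothetical Cloud Map name) should list as many addresses as there are running Keycloak tasks; a single address or a resolution error points at the service discovery setup rather than at Keycloak.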

Some information I didn't mention above: I'm running the tasks in different private subnets. This decision was made to guarantee service availability and security. Also, the tasks are behind an Application Load Balancer and KC starts with proxy mode edge.

Not sure if it will help for your use case, but I went through getting JDBC_PING working with v17, and documented the process here: Use of JDBC_PING with Keycloak 17 (Quarkus distro)

I did this initially for ECS/Fargate, and have now used it in a few other situations where multicast was not available or allowed.

We’ve been able to set it up using both S3_PING and DNS_PING on ECS Fargate.

With S3_PING, you first have to add the jar dependency to your Keycloak Dockerfile and run kc.sh build.

When starting Keycloak, you need to set the jgroups.s3.region_name and jgroups.s3.bucket_name system properties, for example via the JAVA_OPTS_APPEND environment variable.

The task role needs read and write permissions on the bucket.
If you're using CDK to set up your infrastructure, jgroupsbucket.GrantReadWrite(taskDefinition.TaskRole()) is sufficient.

Hopefully that helps.

Unfortunately, we have an issue where Keycloak is extremely slow for a few minutes after startup or after a task is stopped. Please let us know whether you also experience this once your setup is working.

Thanks for your response, I'll use this option if I can't get a working solution based on S3_PING or DNS_PING.

I followed the same steps you describe above, but I'm getting the authentication error… I added the plugin configuration to my Dockerfile as follows:

ENV JAVA_OPTS_APPEND="-Djgroups.s3.region_name=eu-west-1,jgroups.s3.bucket_name=my-bucket"
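Note that the JVM takes everything after the first = of a -D flag as the value, so with the line above jgroups.s3.region_name becomes "eu-west-1,jgroups.s3.bucket_name=my-bucket" and the bucket name is never set. Each property needs its own -D flag, separated by spaces. A tiny illustrative helper (hypothetical name) showing the expected form:

```python
from typing import Mapping


def java_opts(props: Mapping[str, str]) -> str:
    """Render system properties as separate, space-separated -Dkey=value flags.
    Values containing spaces would additionally need shell quoting."""
    return " ".join(f"-D{key}={value}" for key, value in props.items())


opts = java_opts({
    "jgroups.s3.region_name": "eu-west-1",
    "jgroups.s3.bucket_name": "my-bucket",
})
print(opts)
```

The resulting string is what goes inside the quotes of JAVA_OPTS_APPEND.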

I already created a policy that allows read and write operations on the bucket and attached it to the service role.

I'll update you about the KC startup and stop times once I get cluster mode working.

Hi,
I'm facing the exact same issue.

I'm using https://repo1.maven.org/maven2/org/jgroups/aws/s3/native-s3-ping/1.0.0.Final/native-s3-ping-1.0.0.Final.jar

and putting it in /opt/keycloak/providers/

/opt/keycloak/bin/kc.sh build --cache-stack=ec2 --db mysql

JAVA_OPTS_APPEND="-Djgroups.s3.region_name=eu-west-1 -Djgroups.s3.bucket_name=my-bucket"

and … same error in the end.

Still trying, I'll share anything I find on my side.

This is the Dockerfile we're using:

FROM public.ecr.aws/docker/library/maven:3.8.4-openjdk-17-slim as maven-builder

COPY native-s3-ping/pom.xml ./

RUN mvn package

FROM quay.io/keycloak/keycloak:17.0.0 as keycloak-builder

COPY --from=maven-builder --chown=keycloak target/s3-native-ping-bundle-*-jar-with-dependencies.jar /opt/keycloak/providers/

ENV KC_METRICS_ENABLED=true \
    KC_DB=postgres \
    KC_CACHE=ispn \
    KC_CACHE_STACK=ec2
    
RUN /opt/keycloak/bin/kc.sh build

FROM quay.io/keycloak/keycloak:17.0.0
COPY --from=keycloak-builder /opt/keycloak/lib/quarkus/ /opt/keycloak/lib/quarkus/
COPY --from=keycloak-builder /opt/keycloak/providers/* /opt/keycloak/providers/


ENTRYPOINT ["/opt/keycloak/bin/kc.sh", "start"]

This is the pom.xml we’re copying into the first build container (to build a fat jar with all dependencies).

<!--
    This pom is used to build a fat jar with the native S3 ping used by Infinispan to connect the nodes in the cluster.
-->
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.group</groupId>
    <artifactId>s3-native-ping-bundle</artifactId>
    <version>1</version>

    <properties>
        <aws.sdk.version>1.12.167</aws.sdk.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.amazonaws</groupId>
                <artifactId>aws-java-sdk-core</artifactId>
                <version>${aws.sdk.version}</version>
            </dependency>
            <dependency>
                <groupId>com.amazonaws</groupId>
                <artifactId>aws-java-sdk-s3</artifactId>
                <version>${aws.sdk.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.jgroups.aws.s3</groupId>
            <artifactId>native-s3-ping</artifactId>
            <version>1.0.0.Final</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.1</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>3.1.1</version>
                <executions>
                    <execution>
                        <id>copy-deps</id>
                        <phase>process-sources</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

We then run the container with the following environment variable:

"JAVA_OPTS": "-Xms64m -Xmx2048m -XX:MetaspaceSize=96M -XX:MaxMetaspaceSize=256m -Djava.net.preferIPv4Stack=true -Djgroups.s3.region_name=eu-west-1  -Djgroups.s3.bucket_name=bucket-name"

But it sounds like the role being used doesn't have the correct permissions. A Fargate task has two roles, an execution role and a task role. I believe we attach the policy to the (default) task role.

  • Regarding the slowness: it's not really slow to start, it starts quickly. It's just very slow to respond for a couple of minutes when you interact with it.

These are the actions the task role needs to be allowed:

"s3:GetObject*",
"s3:GetBucket*",
"s3:List*",
"s3:DeleteObject*",
"s3:PutObject",
"s3:PutObjectLegalHold",
"s3:PutObjectRetention",
"s3:PutObjectTagging",
"s3:PutObjectVersionTagging",
"s3:Abort*"

Some of them apply to the bucket and some to the bucket objects. If you'd like, I can share the CDK code.
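A minimal sketch of a policy document carrying the actions listed above. Bucket-level actions (such as s3:ListBucket) match the bucket ARN, while object-level actions (such as s3:GetObject and s3:PutObject) match the "<bucket>/*" ARN, which is why both resources are listed. The bucket name and function name are placeholders, not taken from the thread.

```python
import json

S3_PING_ACTIONS = [
    "s3:GetObject*", "s3:GetBucket*", "s3:List*", "s3:DeleteObject*",
    "s3:PutObject", "s3:PutObjectLegalHold", "s3:PutObjectRetention",
    "s3:PutObjectTagging", "s3:PutObjectVersionTagging", "s3:Abort*",
]


def s3_ping_policy(bucket: str) -> dict:
    """Build an IAM policy allowing the actions above on the bucket itself
    (bucket-level actions) and on every object in it (object-level actions)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": S3_PING_ACTIONS,
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


print(json.dumps(s3_ping_policy("my-bucket"), indent=2))
```

The policy has to end up on the task role, not the execution role, because the task role is what the SDK inside the container uses.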

Thank you for the suggestions. As you said, there are two kinds of role: the execution role and the task role :sweat_smile:. The task role is the one that provides the AWS credentials used to call AWS APIs (for reference, see IAM Roles for Tasks - Amazon Elastic Container Service).

About the slowness, I'll run some tests and get back to you.

In my case, the issue was indeed the missing AWS SDK jar dependency.
I'll now keep an eye on the slowness etc.

Thank you so much !

@squalou Would you mind elaborating on what your final solution was? Did you need to add a Maven build step as in the Dockerfile above, or did you just need to download and provide the AWS SDK jar, similar to how the native_s3_ping jar is provided (and if so, where is that AWS SDK jar)? I'm trying to get distributed caches working on EC2 (bare metal) and am not very familiar with the Java development ecosystem.

Hi,
I'm no Java expert either, but in my ecosystem, which is Maven-based, it ended up as a Java project built with this dependency in the pom.xml:

        <dependency>
            <groupId>org.jgroups.aws.s3</groupId>
            <artifactId>native-s3-ping</artifactId>
            <version>1.0.0.Final</version>
        </dependency>

No Java sources or anything else, just a pom.xml (see the complete, anonymized file below)
and mvn clean package to produce a jar: s3-native-ping-bundle-jar-with-dependencies.jar

Maven will pull the required dependencies;
you'll need to ADD the jar to /opt/keycloak/providers/ in the Dockerfile.

(I run the mvn command first, then docker build; you could probably do it all in a more elaborate multi-stage Dockerfile too.)

And it should work.

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mywonderfulcompany</groupId>
    <artifactId>s3-native-ping-bundle</artifactId>
    <version>HEAD</version>
    <properties>
        <maven-assembly-plugin.version>3.1.1</maven-assembly-plugin.version>
        <maven-dependency-plugin.version>3.1.1</maven-dependency-plugin.version>
        <native-s3-ping.version>1.0.0.Final</native-s3-ping.version>
        <aws.sdk.version>1.12.167</aws.sdk.version>
    </properties>
    <build>
        <finalName>${project.artifactId}</finalName>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>${maven-assembly-plugin.version}</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>${maven-dependency-plugin.version}</version>
                <executions>
                    <execution>
                        <id>copy-deps</id>
                        <phase>process-sources</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.amazonaws</groupId>
                <artifactId>aws-java-sdk-core</artifactId>
                <version>${aws.sdk.version}</version>
            </dependency>
            <dependency>
                <groupId>com.amazonaws</groupId>
                <artifactId>aws-java-sdk-s3</artifactId>
                <version>${aws.sdk.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.jgroups.aws.s3</groupId>
            <artifactId>native-s3-ping</artifactId>
            <version>${native-s3-ping.version}</version>
        </dependency>
    </dependencies>
</project>
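The startup error at the top of the thread names com/amazonaws/auth/AWSCredentialsProvider in slash notation, which is how Java reports a class it could not load; that fits the missing AWS SDK jar identified above. A minimal sketch for checking that the built bundle actually contains that class before copying it into /opt/keycloak/providers/ (a jar is a zip archive; the helper name is illustrative):

```python
import zipfile

# Class named in the Keycloak startup error; it belongs to aws-java-sdk-core.
REQUIRED_CLASS = "com/amazonaws/auth/AWSCredentialsProvider.class"


def jar_contains(jar_path: str, entry: str = REQUIRED_CLASS) -> bool:
    """Return True if the jar (a zip archive) contains the given entry."""
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

For example, jar_contains("target/s3-native-ping-bundle-jar-with-dependencies.jar") should return True for the fat jar built from the pom above; the plain native-s3-ping-1.0.0.Final.jar from Maven Central on its own would not be expected to contain it.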

We got this working with the setup @squalou and @vdahlberg mentioned. Thanks again!
We added the policy below to our ECS task IAM role.

{
  "Sid": "VisualEditor4",
  "Effect": "Allow",
  "Action": [
    "s3:GetObject*",
    "s3:GetBucket*",
    "s3:List*",
    "s3:DeleteObject*",
    "s3:PutObject",
    "s3:PutObjectLegalHold",
    "s3:PutObjectRetention",
    "s3:PutObjectTagging",
    "s3:PutObjectVersionTagging",
    "s3:Abort*"
  ],
  "Resource": [
    "arn:aws:s3:::<YOUR_BUCKET_NAME>",
    "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
  ]
}

@saguntumkar
I have a question for you now.
I do have Keycloak writing to S3 and all, and it's nice, BUT clustering itself cannot work: the S3 files contain the Docker IP address of the Keycloak service (172.x.x.x), which is pretty useless on ECS.

Did you manage to get things working?
Maybe with a non-default network mode in the task definition?

@squalou Yes, I did. In my case the problem was that TCP port 7800 was not open in my EC2 instances' security group. You need to open it in both the inbound and outbound rules so your nodes (EC2 instances) can talk to each other.
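A minimal sketch for checking this from one node against another node's private IP. It assumes the default JGroups TCP port 7800 and that Keycloak is already running on the peer; a refused or timed-out connection usually points at security group rules (or at a missing port mapping in bridge mode). The function name is illustrative.

```python
import socket


def tcp_port_open(host: str, port: int = 7800, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Refused connections and timeouts both raise OSError subclasses,
    which are reported as False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running tcp_port_open("<peer private IP>") in both directions exercises the inbound rule on one side and the outbound rule on the other.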

@saguntumkar
Hmm, interesting.
Could you confirm some details for me?

  • You have several EC2 instances in your ECS cluster?
  • Does each Keycloak run on a separate EC2 instance, or all on the same one?
  • Could you kindly look at the content of an S3 file and tell me what the IP address looks like? (172. or something else)

Thanks!

  • You have several EC2 instances in your ECS cluster?
    [Sagun] => Correct, we have 2 EC2 instances running right now.
  • Does each Keycloak run on a separate EC2 instance, or all on the same one?
    [Sagun] => Correct, each KC container runs on a separate EC2 instance.
  • Could you kindly look at the content of an S3 file and tell me what the IP address looks like? (172. or something else)
    [Sagun] => These are private IPs. The content of the S3 file should look something like this:

<FIRST EC2 IP ADDRESS SEPARATED BY HYPHEN> <SOME UUID> <FIRST EC2 IP>:7800 T
<SECOND EC2 IP ADDRESS SEPARATED BY HYPHEN> <SOME UUID> <SECOND EC2 IP>:7800 F
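A minimal sketch for reading such a file and spotting members that advertise an unreachable Docker address. It assumes the four-column layout shown above (logical name, UUID, ip:port, T/F coordinator flag) and Docker's default bridge range 172.17.0.0/16; a wider 172.16.0.0/12 check would wrongly flag VPCs such as the default 172.31.0.0/16. All names are illustrative.

```python
import ipaddress
from typing import List, NamedTuple

# Docker's default bridge network (assumption: custom bridges may differ).
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


class Member(NamedTuple):
    name: str
    uuid: str
    ip: str
    port: int
    coordinator: bool


def parse_ping_file(text: str) -> List[Member]:
    """Parse lines of the form '<name> <uuid> <ip>:<port> <T|F>'."""
    members = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        name, uuid, address, coord = fields
        ip, _, port = address.rpartition(":")
        members.append(Member(name, uuid, ip, int(port), coord == "T"))
    return members


def unreachable_members(members: List[Member]) -> List[Member]:
    """Members advertising an address on the default Docker bridge."""
    return [m for m in members if ipaddress.ip_address(m.ip) in DOCKER_BRIDGE]
```

A healthy file should list one private VPC address per node and exactly one T entry; any member returned by unreachable_members matches the "docker IP" symptom described above.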

If the cluster formation is successful, you should also see the following message in the logs:

Received new cluster view for channel ISPN: [<FIRST EC2 IP ADDRESS SEPARATED BY HYPHEN>] (2) [<FIRST EC2 IP ADDRESS SEPARATED BY HYPHEN>, <SECOND EC2 IP ADDRESS SEPARATED BY HYPHEN>]
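When checking many task logs, a small parser for this message can compare the reported view size with the number of running tasks. A minimal sketch, assuming the line ends with "(<size>) [member, member, ...]" as in the example above; the helper name is illustrative.

```python
import re
from typing import List, Tuple

# Matches the tail of the log line: "(<size>) [member, member, ...]".
VIEW_RE = re.compile(r"\((\d+)\)\s*\[([^\]]*)\]\s*$")


def parse_cluster_view(line: str) -> Tuple[int, List[str]]:
    """Return (view size, member names) from a 'Received new cluster view' line."""
    match = VIEW_RE.search(line)
    if match is None:
        raise ValueError("not a cluster view line")
    members = [m.strip() for m in match.group(2).split(",") if m.strip()]
    return int(match.group(1)), members
```

With two tasks running, a size of 1 on either node means the nodes found the S3 entries but could not reach each other.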

Thank you for the details @saguntumkar.
There is still something wrong or different in my setup.
The S3 file contains the Docker network IP address, not the private IP of the EC2 instance.

Maybe it's due to the network mode?

I use “Bridge”, which is the default, but maybe I should use “Host”?
(And even then, I don't really understand how the container could figure out the private IP of the EC2 instance.)

Finally, in your task definition, I assume you map host ports to container ports (7800 and 8080)?

We use a custom network mode. I'm not very familiar with how it is set up, as our cloud ops team did it. And no, we don't have a port mapping for 7800 in the task. I believe the default behavior (if no port mapping is set) is to map the same port?

Okay!
I understand better now, thank you again for the clarification.

Yes, with this network mode the mapping should just work, and that explains how you get the right IP.

Unfortunately, I can't use this mode,
but good news for anyone stumbling upon such issues: it is possible to use the good old JDBC_PING.

See also here:

(This requires a tweak in the entrypoint to get the proper IP address from the ECS metadata before running kc.sh.)
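The linked post's exact entrypoint is not reproduced here, but the idea can be sketched: read the container metadata document (task metadata endpoint v4, whose URL ECS provides in ECS_CONTAINER_METADATA_URI_V4) and take the IPv4 address listed under "Networks". This is an assumption-laden sketch: in awsvpc mode that address should be the task's private VPC IP, while in bridge mode it would presumably still be the Docker address. Which JGroups property the entrypoint then passes to kc.sh depends on the stack configuration and is not shown.

```python
import json
import os
import urllib.request
from typing import Optional


def first_ipv4(metadata_json: str) -> Optional[str]:
    """Return the first IPv4 address listed under "Networks" in an ECS
    container metadata (v4) document, or None if there is none."""
    metadata = json.loads(metadata_json)
    for network in metadata.get("Networks", []):
        addresses = network.get("IPv4Addresses") or []
        if addresses:
            return addresses[0]
    return None


if __name__ == "__main__":
    # Only set inside ECS containers; outside ECS this block does nothing.
    uri = os.environ.get("ECS_CONTAINER_METADATA_URI_V4")
    if uri:
        with urllib.request.urlopen(uri, timeout=2) as response:
            print(first_ipv4(response.read().decode("utf-8")))
```

An entrypoint script could capture the printed address in a variable and append it to JAVA_OPTS_APPEND before exec'ing kc.sh start.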