Hi everyone,
I want to run Keycloak v17 (Quarkus) on ECS, but I’m having trouble configuring the distributed cache. Since multicast is not supported on EC2 in AWS, I have to use one of the other discovery providers from Infinispan. The ec2 stack seems to be the solution to my problem, but I can’t get it working due to an error related to authentication.
Since NATIVE_S3_PING uses the AWS SDK to connect to the S3 bucket, it must obtain credentials through one of the methods the SDK supports. I’m running my ECS task with a service role, so the temporary credentials should be available inside the task (I don’t know how to verify that the credentials are really there).
The other option I’m working on to get cluster mode working is to build the container with the kubernetes cache stack, which uses DNS_PING, but this is getting more complicated because I have to configure ECS Service Discovery and the Route 53 service registry.
Some information I didn’t mention above: I’m running the tasks in different private subnets. This decision was made to guarantee service availability and security. The tasks are also behind an Application Load Balancer, and KC starts in proxy mode edge.
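One way to check whether task role credentials are really there is sketched below. On ECS, the agent sets the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable when a task role is attached, and the credentials are served from 169.254.170.2. The function names are made up for illustration, and the sketch assumes Python is available in a container of the same task (it is probably not in the stock Keycloak image, so a debug container may be needed).

```python
import json
import os
import urllib.request

# Link-local address of the ECS credentials endpoint, reachable only from inside a task.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def credentials_url(env=os.environ):
    """Return the task-role credentials URL, or None if the variable is not set."""
    relative_uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        return None
    return ECS_CREDENTIALS_HOST + relative_uri


def describe_task_credentials(env=os.environ, timeout=2.0):
    """Fetch the temporary credentials and return only the non-secret fields."""
    url = credentials_url(env)
    if url is None:
        return None
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    # Deliberately leave out the access key, secret key and session token.
    return {"RoleArn": payload.get("RoleArn"), "Expiration": payload.get("Expiration")}


if __name__ == "__main__":
    print(describe_task_credentials() or "no task role credentials found")
```

If this prints a RoleArn, the role shown there is the one that needs access to the bucket.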
We’ve been able to set it up using both S3_PING and DNS_PING on ECS Fargate.
With S3_PING, you first have to add the jar dependency to your Keycloak Dockerfile and build with kc build.
When starting Keycloak, you need to set jgroups.s3.region_name and jgroups.s3.bucket_name, for example through the JAVA_OPTS_APPEND environment variable.
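As a rough illustration of what that environment variable could look like, here is a small helper that assembles the value from the two property names above. The helper name, region and bucket name are placeholders, not values from our setup.

```python
def jgroups_s3_java_opts(region_name, bucket_name):
    """Build a JAVA_OPTS_APPEND value that sets the two jgroups.s3.* system properties."""
    properties = {
        "jgroups.s3.region_name": region_name,
        "jgroups.s3.bucket_name": bucket_name,
    }
    return " ".join(f"-D{key}={value}" for key, value in properties.items())


if __name__ == "__main__":
    # Placeholder values; substitute your own region and bucket.
    print("JAVA_OPTS_APPEND=" + jgroups_s3_java_opts("eu-west-1", "my-jgroups-bucket"))
```

The resulting string would go into the environment section of the container definition.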
The task role needs read and write permissions on the bucket.
If you’re using CDK to set up your infrastructure, jgroupsbucket.GrantReadWrite(taskDefinition.TaskRole()) is sufficient.
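If you are not using CDK, a hand-written policy attached to the task role could look roughly like the one generated below. The exact action list is an assumption on my part (list, get, put and delete seem like the minimum for a discovery bucket), not something copied from what GrantReadWrite produces, and the bucket name is a placeholder.

```python
import json


def jgroups_bucket_policy(bucket_name):
    """Return an IAM policy document granting read/write access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": bucket_arn},
            # Object-level actions apply to the keys inside the bucket.
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(jgroups_bucket_policy("my-jgroups-bucket"), indent=2))
```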
Hopefully that helps.
Unfortunately, we have an issue where Keycloak is extremely slow for a few minutes after you stop a task or after startup. Please let us know if you also experience this once your setup is working.
But it sounds like the role that’s used doesn’t have the correct permissions. A Fargate task has 2 roles, an executionRole and a taskRole. I believe we append the policy to the default taskRole.
Regarding slowness, it’s not really slow to start, it starts quickly. It’s just very slow to respond when interacting with it, for a couple of minutes.
Thank you for the suggestions. As you said, there are 2 kinds of role: the execution role and the task role. The task role is responsible for providing the AWS credentials used to access the AWS APIs (for reference, see IAM Roles for Tasks - Amazon Elastic Container Service).
About the slowness, I’ll do some tests and get back to you.
@squalou Would you mind elaborating on what your final solution was? Did you need to add a Maven build step as in the above Dockerfile, or did you just need to download and provide the AWS SDK jar, similar to how the native_s3_ping jar is provided (and if so, where is that AWS SDK jar)? I’m trying to get distributed caches working on EC2 (on bare metal) and am not very familiar with the Java development ecosystem.
No Java sources or anything, just a pom.xml (see below for the complete, anonymized file)
and mvn clean package to produce a jar: s3-native-ping-bundle-jar-with-dependencies.jar
The required dependencies will be pulled by Maven;
you’ll need to ADD the jar to /opt/keycloak/providers/ in the Dockerfile.
(I run the mvn command first and then docker build; you could probably do it all in a more elaborate Dockerfile with ‘build’ steps too.)
@saguntumkar
I have a question myself for you now.
I do have Keycloak writing to S3, and that’s nice, BUT clustering itself cannot work: the S3 files contain the “docker IP address” of the Keycloak service (172.x.x.x), which is pretty useless on ECS.
Did you manage to get things working?
Maybe with a non-default “network mode” in the task definition?
@squalou yes I did. In my case the problem was that port 7800 was not open for TCP in my EC2 instances’ security group. You need to open it in both the inbound and outbound rules so your nodes (EC2 instances) can talk to each other.
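A quick way to confirm the security group change is to try a plain TCP connection to port 7800 on the other node. This is only a sketch: the function name is made up, the address in the example is a placeholder, and it only proves the port is reachable, not that the cluster has formed.

```python
import socket


def can_connect(host, port=7800, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; use the private IP of the other node.
    print(can_connect("10.0.1.23"))
```

Run it from one node against the private IP of the other; a False result while Keycloak is running on the target usually points at the security group or the network mode.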
Do you have several EC2 instances in your ECS cluster?
[Sagun] => Correct, we have 2 EC2 instances running right now.
Does each Keycloak run on a separate EC2 instance, or all on the same one?
[Sagun] => Correct, each KC container runs on a separate EC2 instance.
Could you kindly have a look at the content of an S3 file and tell me what the IP address looks like? (172. or something else)
[Sagun] => These are private IPs. The content of the S3 file should look something like this:
<FIRST EC2 IP ADDRESS SEPARATED BY HYPHEN> <SOME UUID> <FIRST EC2 IP>:7800 T
<SECOND EC2 IP ADDRESS SEPARATED BY HYPHEN> <SOME UUID> <SECOND EC2 IP>:7800 F
If the cluster formation is successful, you should also see the following message in the logs:
Received new cluster view for channel ISPN: [<FIRST EC2 IP ADDRESS SEPARATED BY HYPHEN>] (2) [<FIRST EC2 IP ADDRESS SEPARATED BY HYPHEN>, <SECOND EC2 IP ADDRESS SEPARATED BY HYPHEN>]
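To check the S3 file contents automatically, something like the following sketch could help. It assumes each line has the four whitespace-separated fields shown above (name, UUID, ip:port, and a T/F flag that I take to mean coordinator), and it flags addresses in 172.17.0.0/16, which is Docker’s default bridge subnet; your Docker network may use a different range. The names and sample values are invented for illustration.

```python
import ipaddress
from typing import NamedTuple

# Docker's default bridge subnet; an assumption, adjust if your daemon uses another range.
DOCKER_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


class PingEntry(NamedTuple):
    logical_name: str
    uuid: str
    ip: str
    port: int
    is_coordinator: bool


def parse_ping_line(line):
    """Split one line of the discovery file into its four fields."""
    name, uuid, address, flag = line.split()
    ip, _, port = address.rpartition(":")
    return PingEntry(name, uuid, ip, int(port), flag == "T")


def looks_like_docker_bridge(entry):
    """Return True if the advertised address is inside the default Docker bridge subnet."""
    return ipaddress.ip_address(entry.ip) in DOCKER_BRIDGE


if __name__ == "__main__":
    sample = "ip-10-0-1-23-4242 00000000-0000-0000-0000-000000000001 10.0.1.23:7800 T"
    entry = parse_ping_line(sample)
    print(entry, looks_like_docker_bridge(entry))
```

If the check returns True for your entries, the nodes are advertising container-local addresses that other instances cannot reach.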
Thank you for the details @saguntumkar
There is still something wrong or different in my setup.
The S3 file contains a “docker network” IP address, not the private IP of the EC2 instance.
Maybe it’s due to the network mode?
I use “Bridge”, which is the default, but maybe I should use “Host”?
(And even then, I don’t really understand how the private IP of the EC2 instance can be guessed by the container.)
Finally, in your task definition, I assume you map host ports to container ports? (7800 and 8080?)
We have a custom network mode. I’m not sure how it is set up, as our cloud ops team did it. And no, we don’t have a port mapping in the task for 7800. I believe the default behavior (if no port mapping is set) is to map the same port?