I need help deploying keycloak:latest on AWS Fargate.

The main problem: with a single Fargate task Keycloak works fine, but with multiple tasks it does not.

I am using a Postgres RDS database for the connection and a custom cache config file for cache distribution.

Can anyone please help?

@xgp Please see my case.

If you post 1) what you have tried, and 2) what specific problems you are having (including full logs and configurations), then it might be possible to help you. “the keycloak is not working” is neither informative nor helpful.

I have the following Dockerfile:

FROM quay.io/keycloak/keycloak:latest

ENV KC_CACHE_CONFIG_FILE=cache-ispn-jdbc-ping.xml
ENV KC_FEATURES=token-exchange,preview
ENV KC_DB=postgres
# COPY conf/keycloak.conf /opt/keycloak/conf/keycloak.conf
COPY ./conf/cache-ispn-jdbc-ping.xml /opt/keycloak/conf/cache-ispn-jdbc-ping.xml
# COPY ./themes/ /opt/keycloak/themes/ # only applies if you have customized themes
RUN rm -f /opt/keycloak/conf/cache-ispn.xml
RUN /opt/keycloak/bin/kc.sh build --cache-config-file=cache-ispn-jdbc-ping.xml
WORKDIR /opt/keycloak
ENTRYPOINT [ "/opt/keycloak/bin/kc.sh" ]

and custom cache config file:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:infinispan:config:11.0 http://www.infinispan.org/schemas/infinispan-config-11.0.xsd"
        xmlns="urn:infinispan:config:11.0">

    <jgroups>
        <stack name="jdbc-ping-tcp" extends="tcp">
            <!-- https://keycloak.discourse.group/t/use-of-jdbc-ping-with-keycloak-17-quarkus-distro/13571/4 -->
            <JDBC_PING connection_driver="org.postgresql.Driver"
                       connection_username="${env.KC_DB_USERNAME}" connection_password="${env.KC_DB_PASSWORD}"
                       initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, bind_addr VARCHAR(200) NOT NULL, created timestamp NOT NULL, cluster_name varchar(200) NOT NULL, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
                       insert_single_sql="INSERT INTO JGROUPSPING (own_addr, bind_addr, created, cluster_name, ping_data) values (?,'${jboss.bind.address:}',NOW(), ?, ?);"
                       delete_single_sql="DELETE FROM JGROUPSPING WHERE own_addr=? AND cluster_name=?;"
                       select_all_pingdata_sql="SELECT ping_data FROM JGROUPSPING WHERE cluster_name=?;"/>
        </stack>
    </jgroups>

    <cache-container name="keycloak">
        <transport lock-timeout="60000" stack="jdbc-ping-tcp"/>
        <local-cache name="realms">
            <key media-type="application/x-java-object"/>
            <value media-type="application/x-java-object"/>
            <memory max-count="10000"/>
        </local-cache>
        <local-cache name="users">
            <key media-type="application/x-java-object"/>
            <value media-type="application/x-java-object"/>
            <memory max-count="10000"/>
        </local-cache>
        <distributed-cache name="sessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="authenticationSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="offlineSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="clientSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="offlineClientSessions" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <distributed-cache name="loginFailures" owners="2">
            <expiration lifespan="-1"/>
        </distributed-cache>
        <local-cache name="authorization">
            <key media-type="application/x-java-object"/>
            <value media-type="application/x-java-object"/>
            <memory max-count="10000"/>
        </local-cache>
        <replicated-cache name="work">
            <expiration lifespan="-1"/>
        </replicated-cache>
        <local-cache name="keys">
            <key media-type="application/x-java-object"/>
            <value media-type="application/x-java-object"/>
            <expiration max-idle="3600000"/>
            <memory max-count="1000"/>
        </local-cache>
        <distributed-cache name="actionTokens" owners="2">
            <key media-type="application/x-java-object"/>
            <value media-type="application/x-java-object"/>
            <expiration max-idle="-1" lifespan="-1" interval="300000"/>
            <memory max-count="-1"/>
        </distributed-cache>
    </cache-container>
</infinispan>
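For what it's worth, the JDBC_PING mechanics in the SQL above are easy to simulate outside Keycloak: each node inserts its own row into JGROUPSPING, then selects all rows for the cluster name to discover its peers. A minimal sketch using an in-memory SQLite database (BYTEA and NOW() swapped for their SQLite equivalents; the task names and addresses are made up):

```python
import sqlite3

# In-memory stand-in for the Postgres JGROUPSPING table (BYTEA -> BLOB, NOW() -> CURRENT_TIMESTAMP).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS JGROUPSPING ("
    "own_addr varchar(200) NOT NULL, bind_addr varchar(200) NOT NULL, "
    "created timestamp NOT NULL, cluster_name varchar(200) NOT NULL, "
    "ping_data BLOB, PRIMARY KEY (own_addr, cluster_name))"
)

def register(own_addr, bind_addr, cluster_name, ping_data):
    # Equivalent of insert_single_sql: each node advertises itself once per cluster.
    conn.execute(
        "INSERT INTO JGROUPSPING VALUES (?, ?, CURRENT_TIMESTAMP, ?, ?)",
        (own_addr, bind_addr, cluster_name, ping_data),
    )

def discover(cluster_name):
    # Equivalent of select_all_pingdata_sql: every node sees every registered peer.
    rows = conn.execute(
        "SELECT ping_data FROM JGROUPSPING WHERE cluster_name=?", (cluster_name,)
    ).fetchall()
    return [r[0] for r in rows]

# Two hypothetical Fargate tasks registering in the same cluster:
register("task-a", "10.0.1.10", "ISPN", b"ping-a")
register("task-b", "10.0.1.11", "ISPN", b"ping-b")
print(discover("ISPN"))  # each task sees the other's ping_data
```

If your tasks really share one database, you can sanity-check discovery by selecting from JGROUPSPING on the RDS instance directly: you should see one row per live node.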

And I get the following warnings in one of the Fargate tasks’ error logs:

WARN [org.infinispan.PERSISTENCE] (keycloak-cache-init) ISPN000554: jboss-marshalling is deprecated and planned for removal

WARN [org.infinispan.CONFIG] (keycloak-cache-init) ISPN000569: Unable to persist Infinispan internal caches as no global state enabled

WARN [io.quarkus.agroal.runtime.DataSources] (main) Datasource <default> enables XA but transaction recovery is not enabled. Please enable transaction recovery by setting quarkus.transaction-manager.enable-recovery=true, otherwise data may be lost if the application is terminated abruptly

WARN [com.arjuna.ats.arjuna] (main) ARJUNA012210: Unable to use InetAddress.getLocalHost() to resolve address.

Also, I used the following env variables in the Fargate task definition:

        { "name" : "KC_LOG_LEVEL", "value" : "INFO" },
        { "name" : "KC_DB_URL", "value" : "jdbc:postgresql://${var.db_endpoint}:5432/keycloak" },
        { "name" : "KC_DB", "value" : "postgres" },
        { "name" : "KC_PROXY", "value" : "edge" },
        { "name" : "KC_HOSTNAME_STRICT", "value" : "false" },
        { "name" : "KC_HOSTNAME_STRICT_BACKCHANNEL", "value" : "true" },
        { "name" : "KC_DB_SCHEMA", "value" : "public" },
        { "name" : "KC_CACHE_CONFIG_FILE", "value" : "/opt/keycloak/conf/cache-ispn-jdbc-ping.xml" },
        { "name" : "KC_HOSTNAME", "value" : "keycloak.<mydomain>.com" },
        { "name" : "KC_DB_USERNAME", "value" : "${jsondecode(data.aws_secretsmanager_secret_version.current_secrets.secret_string)["username"]}" },
        { "name" : "KC_DB_PASSWORD", "value" : "${jsondecode(data.aws_secretsmanager_secret_version.current_secrets.secret_string)["password"]}" },
        { "name" : "KEYCLOAK_ADMIN", "value" : "admin" },
        { "name" : "KEYCLOAK_ADMIN_PASSWORD", "value" : "admin" }

And I get a 401 error.

I need help deploying Keycloak v21.0 across multiple tasks.
Thanks in advance!

Everything looks fine there. Those warnings are normal, and are not consequential. Are you logged in when you get the 401?

Yes, I am logged in, but sometimes the token request in the browser’s network tab returns:

{error: "invalid_grant", error_description: "Code not valid"}

Is this an application error or an infrastructure error?

That happens in the code-to-token flow when the code isn’t valid. By “Sometimes” do you mean it only fails/succeeds intermittently? Are you trying to run 2 separate keycloak instances behind the same hostname and load balancer? This is the kind of thing I see when someone is balancing traffic between two keycloaks that are not actually connected via infinispan.
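For context, the failing step is a plain form POST to the realm’s token endpoint. A sketch of the request body (the code, client ID, and redirect URI here are hypothetical):

```python
# Sketch of the authorization-code token request that yields
# {"error": "invalid_grant", "error_description": "Code not valid"} when the
# code was issued by a Keycloak node the serving node cannot see.
# All concrete values (code, client, redirect URI) are made up.

def token_request_form(code, client_id, redirect_uri):
    # Form body for POST /realms/<realm>/protocol/openid-connect/token
    return {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    }

form = token_request_form("abc123", "my-client", "https://app.example.com/callback")
# The code is single-use and held in Keycloak's session caches; if this request
# lands on a node whose Infinispan is not clustered with the issuing node, the
# code lookup fails and Keycloak answers "Code not valid".
```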

Yes, I run 2 Keycloak Fargate tasks with the same Docker image behind an application load balancer.

Problem: Keycloak works fine as a single Fargate task, but not as multiple Fargate tasks.

I don’t understand. Are you trying to connect them in the same cluster? Is there any indication from the logs that they are discovering each other via infinispan? If not, you have a problem with your network setup on AWS, and this is not a keycloak question.

Yes, I am trying to connect them in the same cluster.
The log entry related to Infinispan:

2023-03-31 16:02:34,935 DEBUG [org.keycloak.models.sessions.infinispan.changes.sessions.PersisterLastSessionRefreshStore] (Timer-0) Updating 0 userSessions with lastSessionRefresh: 1680278494

I don’t know why these multiple tasks in the same cluster cannot communicate with each other. :frowning:

That’s an AWS networking setup question, not a keycloak issue. Figure that out first.

The network configuration was flawless, and Keycloak worked as expected when I ran the ECS service with only one task. It did not work while Keycloak was running in multiple ECS tasks.

Great. How did you validate that the network configuration was flawless? What did you do in order to verify that both tasks could communicate over the Infinispan ports?
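One way to answer that is to probe the other task’s JGroups port from inside a task (e.g. via ECS Exec). A minimal reachability check, assuming the peer’s private IP (10.0.1.11 here is made up) and port 7800, the default for the Infinispan “tcp” stack:

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# From task A, probe task B's JGroups port. The IP and port are assumptions:
# the peer task's private IP, and the default port of the "tcp" JGroups stack.
# print(can_reach("10.0.1.11", 7800))
```

If this returns False between tasks, Infinispan cannot form a cluster no matter what the JDBC_PING table says, and the security groups or subnets are the place to look.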

Good day guys.
Any solution? I’m facing the exact same issue. @saugat, have you resolved this and care to share your knowledge?

What’s the config I need to be looking out for?

My guess was it was an infinispan problem because it can’t communicate over the network. If you can validate those ports are open between containers, that’s a good first step.

Thanks for the pointer. After looking deeper into all the security groups, we found that the one the Fargate service uses was missing an inbound rule allowing traffic within the security group. After adding that, it works like a charm.
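For anyone scripting this fix: a self-referencing ingress rule can be built and applied with boto3. This is a sketch of what the fix above amounts to; the security group ID is a placeholder, and the helper only constructs the rule dict:

```python
# Sketch: an inbound rule allowing all traffic from the security group itself.
# The group ID used below is a placeholder, not a real resource.

def self_ingress_permission(group_id):
    # IpProtocol "-1" means all protocols and ports; the UserIdGroupPairs entry
    # makes the rule self-referencing (traffic from members of the same group).
    return {
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }

# Applying it would look like this (needs AWS credentials, so commented out):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.authorize_security_group_ingress(
#     GroupId="sg-0123456789abcdef0",
#     IpPermissions=[self_ingress_permission("sg-0123456789abcdef0")],
# )
```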

Do you know if I can tighten down the inbound rule, e.g. which ports and traffic type to specify, rather than allowing all traffic on all ports as it does at the moment? Thanks, @xgp!

Sorry @isun , but I am not an AWS expert. I’m sure what you’re asking is possible, but probably better answered in an AWS forum.