Keycloak bare metal Quarkus cluster HA - distributed cache not working

Hi everybody,

I have been working for a few days on installing a Keycloak cluster in Azure on Red Hat machines, in preparation for a later installation on corporate bare-metal Red Hat machines.

The installation on a single Azure server with SSL, a PostgreSQL database, and an nginx proxy in production mode works perfectly, but when I try to build a cluster of 2 Red Hat machines with a distributed cache, it doesn't work.

When I check the Keycloak logs, the servers that should form my cluster are never discovered as members; each one ends up reporting itself as coordinator, which is incorrect.

I have created firewalld rules and opened 80/tcp, 443/tcp, and 7800/tcp+udp.

server1 log

2022-10-26 18:54:28,918 INFO [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) redhattest1-35666: no members discovered after 2053 ms: creating cluster as coordinator

server2 log

2022-10-26 18:42:50,695 INFO [org.jgroups.protocols.pbcast.GMS] (keycloak-cache-init) redhattest2-34025: no members discovered after 2003 ms: creating cluster as coordinator

Here is my main configuration…

nginx.conf

server {
    listen 80;
    server_name x.y.z;
    return 301 https://x.y.z$request_uri;
}

server {
    listen 443 ssl;
    server_name x.y.z;

    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_certificate     /etc/ssl/x.y.z.fullchain.pem;
    ssl_certificate_key /etc/ssl/x.y.z.private.key;

                                                                    # https://itnext.io/nginx-as-reverse-proxy-in-front-of-keycloak-21e4b3f8ec53

    proxy_set_header X-Forwarded-For $proxy_protocol_addr;          # To forward the original client's IP address 
    proxy_set_header X-Forwarded-Proto $scheme;                     # to forward the  original protocol (HTTP or HTTPS)
    proxy_set_header Host $host;                                    # to forward the original host requested by the client


    location / {
        proxy_pass  https://10.3.0.4:8443;
    }

}


keycloak.conf

db=postgres
db-url-host=XXXX
db-username=YYYY
db-password=ZZZZ
https-port=8443
https-protocols=TLSv1.3,TLSv1.2
hostname=x.y.z
https-certificate-file=/etc/ssl/certs/x.y.z.fullchain.pem
https-certificate-key-file=/etc/ssl/certs/x.y.z.private.key
proxy=edge
cache=ispn
cache-stack=tcp
cache-config-file=cache-ispn-ha.xml
log=file
log-file=/home/redhat/keycloak-prima/keycloak.out
log-level=INFO,org.infinispan:DEBUG,org.jgroups:DEBUG

cache-ispn-ha.xml

Can you help me please?

Thanks so much,

Xavier.

  • You probably checked that already, but the cache-config-file location is relative to the conf/ directory inside the Keycloak installation. That means your keycloak.conf is looking for the file <keycloak_installation>/conf/cache-ispn-ha.xml.
  • When providing a custom Infinispan configuration, I suppose you shouldn’t set cache-stack, as it takes precedence over the custom file. That also means you are currently using the default tcp stack defined in the Keycloak distribution.

That being said, I see your two instances are in the same subnet. The default configuration (which corresponds to cache=ispn) uses the JGroups UDP transport (take a look at Chapter 7. List of Protocols if you’re curious about how it works), so instances in the same subnet should be able to find each other via IP multicast. Check the OS firewall rules to see whether multicast traffic is being blocked.

Also (see Configuring distributed caches - Keycloak), the tcp stack uses UDP for the discovery phase. That document is fairly complete, but you can also take a look at this issue, as it shows what a working custom JGroups configuration should look like (ignore the fact that it is a bug report; the problem there was the cache-stack option overriding the settings).

Hope this helps.

Most probably your custom cache-ispn-ha.xml configuration isn’t being used because you have cache-stack configured:

When using cache-config-file, DON’T use cache-stack at the same time. If cache-stack is given, cache-config-file is ignored.
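For reference, this is the combination to avoid, sketched in keycloak.conf terms:

```
cache=ispn
cache-config-file=cache-ispn-ha.xml
# cache-stack=tcp   <-- do NOT set this together with cache-config-file,
#                      or the custom file is ignored
```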

Hi @weltonrodrigo and @dasniko , thanks so much for your help.

Yes, I have cache-ispn-ha.xml in the conf directory, and the application reads this configuration correctly.
I have also removed cache-stack=tcp from my configuration.

The final snippet in keycloak.conf is:

cache=ispn
cache-config-file=cache-ispn-ha.xml

The problem is in the discovery of the nodes.
I have this snippet in the cache config file because I simply want plain TCPPING for node discovery.

<jgroups>
    <stack name="tcpping" extends="tcp">
        <TCP     bind_port="7800" />
        <TCPPING initial_hosts="redhattest1[7800],redhattest2[7800]" port_range="0" max_dynamic_hosts="2"/>
    </stack>
</jgroups>

<cache-container name="keycloak">
    <transport cluster="mykeycloak" lock-timeout="60000"  stack="tcpping" node-name="redhattestttt1"/>
    <local-cache name="realms">

I also tested with the internal IP addresses.
It should work but it doesn’t: when I launch the 2 Keycloak instances, both nodes show up as coordinators (not OK).

I think perhaps the problem is using Azure virtual machines.
I have firewalld enabled with these ports open on both servers, and the AllowVnetInBound rule enabled in the network settings for each virtual machine in the Azure Portal.

[redhat@redhattest1 ~]$ sudo firewall-cmd --zone=public --permanent --list-ports
80/tcp 443/tcp 7800/tcp 7800/udp
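Since the firewalld rules only cover the host itself, it may also be worth confirming from each VM that the other’s port 7800 is actually reachable through the Azure network layer. A minimal sketch, equivalent to `nc -zv host 7800` (the hostnames mirror the TCPPING initial_hosts; adjust as needed):

```python
# port_check.py -- quick TCP reachability probe, similar to "nc -zv host 7800".
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, and DNS failures
        return False

if __name__ == "__main__":
    # Hostnames taken from the TCPPING initial_hosts in the thread
    for host in ("redhattest1", "redhattest2"):
        status = "open" if can_connect(host, 7800) else "unreachable"
        print(f"{host}:7800 {status}")
```

If this reports “unreachable” from either node while Keycloak is running on the other, the problem is network-level (NSG, routing, firewall) rather than JGroups configuration.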

I want to define this custom TCP solution (or something similar) for discovery, rather than an Azure-specific one with AZURE_PING, because this is a PoC in Azure but I want to install Keycloak Quarkus on corporate bare-metal infrastructure afterwards.

It should be possible, but maybe I’ll try the AZURE_PING solution anyway and leave the heavy lifting for when I get to the bare-metal machines, on the assumption that the discovery problem is Azure-specific :wink:

I think the documentation of the new Keycloak Quarkus distribution (Guides - Keycloak) is simple and clear, but insufficient for use cases as common as TCPPING on “normal” on-premise servers (not Azure, EC2, Google).

Any idea? :wink:

Thanks for your help.

Xavier.

I’m honestly not that familiar with the Infinispan XML configuration, so I’m not sure what could be going wrong here. Maybe the extends semantics behave differently from what you expect?

I’d suggest turning on trace or debug logging for the JGroups package via Quarkus (Set Quarkus Logging Category Level via Environment Variables - Stack Overflow) and seeing if anything pops up. I suppose org.jgroups should be enough; if not, add org.infinispan too.
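The same category levels can also be set directly in keycloak.conf with the log-level option already used above — a sketch raising JGroups to trace:

```
log-level=INFO,org.jgroups:trace,org.infinispan:debug
```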

Hi, @weltonrodrigo,

Thanks so much. I have activated the debug logs and the issue is clear: it is a JGroups connection problem.

I will create a new post in this group about azure connection.

Thanks so much!!

Xavier.