ERROR: Multicast interface not available

Dear Community,

I am trying to run Keycloak on Kubernetes with multiple instances. For that I have read some documentation and found that, among other things, the standalone clustered operating mode (standalone-ha.xml) can be used.

When I deploy Keycloak, I receive the following error:

ERROR [org.wildfly.extension.mod_cluster] (MSC service thread 1-1) WFLYMODCLS0004: Mod_cluster requires Advertise but Multicast interface is not available

I’ve tried googling this error but still have no idea what it means. I assume it is related to the different Keycloak instances not being able to find/ping each other due to some network-related issue. Does anyone know what this error means? Thanks.

Infinispan (the distributed cache that makes Keycloak clustering possible) requires multicast by default. If you are using the normal standalone-ha.xml configuration, you will need to enable multicast networking between the Kubernetes pods that are running Keycloak.
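If you want to check whether multicast actually works between two of your pods before going further, JGroups ships small test programs that you can run straight from the jgroups jar. This is only a rough sketch: the jar path is a placeholder you need to adjust for your image, and any unused multicast address/port pair will do.

# on pod A: listen for multicast packets
java -cp /path/to/jgroups.jar org.jgroups.tests.McastReceiverTest -mcast_addr 230.0.0.4 -port 45688
# on pod B: send multicast packets; if they show up on pod A, multicast works between the pods
java -cp /path/to/jgroups.jar org.jgroups.tests.McastSenderTest -mcast_addr 230.0.0.4 -port 45688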

If you are not able to enable multicast networking between your Kubernetes pods for some reason, it is also possible to use TCP, but that requires configuration changes. That, and more information about setting up a cluster, can be found in the documentation here: https://www.keycloak.org/docs/latest/server_installation/#_clustering
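For reference, pointing Keycloak at the built-in tcp stack is a one-liner in jboss-cli. This is just a sketch, assuming the stock standalone-ha.xml where the default channel is called ee. Note that the stock tcp stack still uses MPING (multicast) for discovery, so on a network without multicast you also have to swap in a different discovery protocol, as the next reply shows.

# make the default ee channel use the built-in tcp stack instead of udp
/subsystem=jgroups/channel=ee:write-attribute(name=stack, value=tcp)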

The availability of multicast depends on the network your k8s cluster is deployed on and the CNI used. Usually, cloud providers don’t allow multicast, so you need to tweak your jgroups stack and switch the discovery protocol from MPING (PING over multicast) to another ping protocol (KUBE_PING, included in the standard JBoss distribution, seems like a good choice).

It’s pretty easy to set up, but it requires creating a custom Docker image and running a .cli script. Here’s what we’re using:

echo Creating the kube-ping jgroups stack (with KUBE_PING)
/subsystem=jgroups/stack=kube-ping:add()
/subsystem=jgroups/stack=kube-ping/transport=TCP:add(socket-binding="jgroups-tcp")
/subsystem=jgroups/stack=kube-ping/transport=TCP/property=use_ip_addrs:add(value=true)
/subsystem=jgroups/stack=kube-ping/transport=TCP/property=recv_buf_size:add(value=20000000)
/subsystem=jgroups/stack=kube-ping/transport=TCP/property=send_buf_size:add(value=640000)
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING:add()
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=port_range:add(value=${env.kubernetes.KUBE_PING_PORT_RANGE:0})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=connectTimeout:add(value=${env.KUBERNETES_CONNECT_TIMEOUT:5000})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=readTimeout:add(value=${env.KUBERNETES_READ_TIMEOUT:30000})
# WARNING: never set operationAttempts to 0 => the server would crash with an NPE
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=operationAttempts:add(value=${env.KUBERNETES_OPERATION_ATTEMPTS:3})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=operationSleep:add(value=${env.KUBERNETES_OPERATION_SLEEP:1000})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=masterProtocol:add(value=${env.KUBERNETES_MASTER_PROTOCOL:https})
# the following variables should be available in the runtime environment, so there are no defaults.
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=masterHost:add(value=${env.KUBERNETES_SERVICE_HOST})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=masterPort:add(value=${env.KUBERNETES_SERVICE_PORT})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=apiVersion:add(value=${env.KUBERNETES_API_VERSION:v1})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=namespace:add(value=${env.KUBERNETES_NAMESPACE:default})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=labels:add(value=${env.KUBERNETES_LABELS})
#/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=clientCertFile:add(value=${env.KUBERNETES_CLIENT_CERTIFICATE_FILE})
#/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=clientKeyFile:add(value=${env.KUBERNETES_CLIENT_KEY_FILE})
#/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=clientKeyPassword:add(value=${env.KUBERNETES_CLIENT_KEY_PASSWORD})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=clientKeyAlgo:add(value=${env.KUBERNETES_CLIENT_KEY_ALGO:RSA})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=caCertFile:add(value=${env.KUBERNETES_CA_CERTIFICATE_FILE:/var/run/secrets/kubernetes.io/serviceaccount/ca.crt})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=saTokenFile:add(value=${env.SA_TOKEN_FILE:/var/run/secrets/kubernetes.io/serviceaccount/token})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=dump_requests:add(value=${env.KUBERNETES_DUMP_REQUESTS:false})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=split_clusters_during_rolling_upgrades:add(value=${env.KUBERNETES_SPLIT_CLUSTERS_DURING_ROLLING_UPDATE:false})
/subsystem=jgroups/stack=kube-ping/protocol=kubernetes.KUBE_PING/property=useNotReadyAddresses:add(value=${env.KUBERNETES_USE_NOT_READY_ADDRESSES:true})
/subsystem=jgroups/stack=kube-ping/protocol=MERGE3:add()
/subsystem=jgroups/stack=kube-ping/protocol=FD_SOCK:add(socket-binding="jgroups-tcp-fd")
/subsystem=jgroups/stack=kube-ping/protocol=FD_ALL:add()
/subsystem=jgroups/stack=kube-ping/protocol=FD_ALL/property=timeout:add(value=3000)
/subsystem=jgroups/stack=kube-ping/protocol=FD_ALL/property=interval:add(value=1000)
/subsystem=jgroups/stack=kube-ping/protocol=FD_ALL/property=timeout_check_interval:add(value=1000)
/subsystem=jgroups/stack=kube-ping/protocol=VERIFY_SUSPECT:add()
/subsystem=jgroups/stack=kube-ping/protocol=VERIFY_SUSPECT/property=timeout:add(value=1000)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.NAKACK2:add()
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.NAKACK2/property=use_mcast_xmit:add(value=false)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.NAKACK2/property=xmit_interval:add(value=100)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.NAKACK2/property=xmit_table_num_rows:add(value=50)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.NAKACK2/property=xmit_table_msgs_per_row:add(value=1024)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.NAKACK2/property=xmit_table_max_compaction_time:add(value=30000)
/subsystem=jgroups/stack=kube-ping/protocol=UNICAST3:add()
/subsystem=jgroups/stack=kube-ping/protocol=UNICAST3/property=xmit_interval:add(value=100)
/subsystem=jgroups/stack=kube-ping/protocol=UNICAST3/property=xmit_table_num_rows:add(value=50)
/subsystem=jgroups/stack=kube-ping/protocol=UNICAST3/property=xmit_table_msgs_per_row:add(value=1024)
/subsystem=jgroups/stack=kube-ping/protocol=UNICAST3/property=xmit_table_max_compaction_time:add(value=30000)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.STABLE:add()
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.STABLE/property=stability_delay:add(value=200)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.STABLE/property=desired_avg_gossip:add(value=2000)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.STABLE/property=max_bytes:add(value=1M)
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.GMS:add()
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.GMS/property=max_join_attempts:add(value=${env.JGROUPS_MAX_JOIN_ATTEMPTS:0})
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.GMS/property=join_timeout:add(value=${env.JGROUPS_JOIN_TIMEOUT:2000})
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.GMS/property=merge_timeout:add(value=${env.JGROUPS_MERGE_TIMEOUT:1000})
/subsystem=jgroups/stack=kube-ping/protocol=pbcast.GMS/property=view_ack_collection_timeout:add(value=${env.JGROUPS_VIEW_ACK_COLLECTION_TIMEOUT:1000})
/subsystem=jgroups/stack=kube-ping/protocol=MFC:add()
/subsystem=jgroups/stack=kube-ping/protocol=MFC/property=max_credits:add(value=4m)
/subsystem=jgroups/stack=kube-ping/protocol=MFC/property=min_threshold:add(value=0.40)
/subsystem=jgroups/stack=kube-ping/protocol=UFC:add()
/subsystem=jgroups/stack=kube-ping/protocol=UFC/property=max_credits:add(value=4m)
/subsystem=jgroups/stack=kube-ping/protocol=UFC/property=min_threshold:add(value=0.40)
/subsystem=jgroups/stack=kube-ping/protocol=FRAG3:add()
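One prerequisite for the script above: KUBE_PING discovers the other members by asking the Kubernetes API for the pods in the namespace, so the service account your Keycloak pods run under needs permission to list pods. On a cluster with RBAC enabled, something along these lines should be enough (the namespace and binding name here are just placeholders):

# let the default service account in the keycloak namespace read pods via the built-in "view" role
kubectl create rolebinding keycloak-kube-ping-view \
    --clusterrole=view \
    --serviceaccount=keycloak:default \
    --namespace=keycloak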

Our Dockerfile contains something along the lines of:

COPY …/00_kube_ping.cli /opt/jboss/tools/cli/00_kube_ping.cli
...
RUN /opt/jboss/keycloak/bin/jboss-cli.sh --file=/opt/jboss/tools/cli/00_kube_ping.cli &&\
    rm -rf /opt/jboss/keycloak/standalone/configuration/standalone_xml_history

After that, it’s a matter of passing the right property at runtime to make JBoss use the new jgroups stack (-Djgroups.stack, but I’m not sure). As you can see, we have added a lot of environment variables to fine-tune the jgroups stack while trying to fix some issues we had with the cluster failing, but the default values (extracted from the jgroups source code) work…
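Another option, instead of relying on a runtime property, is to point the channel at the new stack from the same CLI script. Again just a sketch, assuming the default ee channel from standalone-ha.xml and that the script runs against that config (e.g. wrapped in embed-server / stop-embedded-server, which the snippet above does not show):

# make the ee channel use the kube-ping stack defined above
/subsystem=jgroups/channel=ee:write-attribute(name=stack, value=kube-ping)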

This response is quite in line with what I’ve read in the documentation. Apparently, the Keycloak blog post “Keycloak Cluster Setup” describes a sort of official way to deploy clustered instances to Kubernetes, using JDBC_PING.

It makes use of a similar CLI script and configures the jgroups discovery protocol via an env variable JGROUPS_DISCOVERY_PROTOCOL=JDBC_PING. However, it does not explicitly describe the RUN .../JDBC_PING.cli addition to the Dockerfile.

From your post, it seems that this is necessary though (how else would the script run on startup?).

Since we are also using Jib, which apparently does not support Docker RUN instructions, I’ll have to figure out another way to do it … thank you.

So my setup looks like this, with 3 Kubernetes namespaces (ns1, ns2, ns3):

k8s ns1: keycloak-69d848fd65-c98b2 (1 pod)
k8s ns2: keycloak-657c95f845-zw7l9 (1 pod)
k8s ns3: keycloak-6d5cdf7848-j4j2p, keycloak-6d5cdf7848-z5f64 (2 pods)

I tried setting JGROUPS_DISCOVERY_PROTOCOL to dns.DNS_PING, and after that I tried JDBC_PING from the blog post. In both cases, Keycloak instances from different Kubernetes namespaces cluster up, which is not what I want:

15:11:04,755 INFO  [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000094: Received new cluster view for channel ejb: [keycloak-69d848fd65-c98b2|28] (3) [keycloak-69d848fd65-c98b2, keycloak-657c95f845-zw7l9, keycloak-6d5cdf7848-j4j2p]

However, the second instance from namespace ns3, keycloak-6d5cdf7848-z5f64, does not appear in this cluster view. What I want is for the two pods in ns3 to form their own cluster; instead I get cross-namespace clustering. Why does that happen?