JGroups TCP timeout in AWS ECS

Hello,

I'm running Keycloak (KC) 12.0.4 from Docker on AWS ECS Fargate. I start 2 instances in a cluster, in different AZs, and I bring up 2 new instances before bringing down the 2 old ones. Intermittently, I see the following on startup:

09:05:23,380 WARN  [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) ip-10-0-53-211: JOIN(ip-10-0-53-211) sent to ip-10-0-46-90 timed out (after 3000 ms), on try 0
09:05:26,383 WARN  [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) ip-10-0-53-211: JOIN(ip-10-0-53-211) sent to ip-10-0-46-90 timed out (after 3000 ms), on try 1
09:05:29,385 WARN  [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) ip-10-0-53-211: JOIN(ip-10-0-53-211) sent to ip-10-0-46-90 timed out (after 3000 ms), on try 2
....

Each node does this 10 times trying to reach the other node, and then reverts to standalone mode. However, sometimes it works perfectly, making me think it’s not an AWS network problem. Has anyone else encountered this problem?

Thanks!

@xgp did you find a solution for this issue?

Hi @atulchauhan01. I found that our specific problem was that Keycloak/JBoss was intermittently binding to an automatic private (link-local) IP address, 169.254.x.x. The cause was the list of IP addresses returned by hostname --all-ip-addresses, which is called in docker-entrypoint.sh in the keycloak/keycloak-containers repository on GitHub.

Because the automatic private IP address wasn’t reachable over the network, we saw the above JGroups problem. We solved it by overriding the docker entrypoint and setting the BIND variable.

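#!/bin/bash

# Collect every address the container reports,
# e.g. it normally returns TOBIND="10.1.1.1 169.254.1.1"
TOBIND=$(hostname --all-ip-addresses)
BIND=""
for IP in $TOBIND
do
    # Skip automatic private (link-local) addresses, 169.254.x.x
    if [[ $IP != 169.254.* ]]
    then
        BIND+=" $IP "
    fi
done

echo "selecting ip addresses $BIND"
# Hand the filtered list to the stock entrypoint (absolute path, so the
# current working directory does not matter) and pass any arguments through
export BIND
exec /opt/jboss/tools/docker-entrypoint.sh "$@"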

I don’t know why we had this problem, and I haven’t heard of it from others who are running Keycloak on AWS/Fargate. We ran several tests with different network setups, and the hostname --all-ip-addresses command always returned a 169.254.x.x address.
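
To check whether a task is affected before overriding the entrypoint, a small helper along these lines could list any link-local addresses the container reports. This is only a sketch and not part of the original fix: link_local_addrs is a hypothetical name, and it only looks at the IPv4 range 169.254.0.0/16.

```shell
#!/bin/sh
# Hypothetical helper: print each argument that falls in the IPv4
# link-local range 169.254.0.0/16, one per line. Prints nothing if
# none of the arguments match.
link_local_addrs() {
    for ip in "$@"; do
        case "$ip" in
            169.254.*) printf '%s\n' "$ip" ;;
        esac
    done
}
```

Inside the container it could be called as link_local_addrs $(hostname --all-ip-addresses). Any output means the unfiltered list contains an address that other nodes probably cannot reach.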

Hope that helps.
