I'm running Keycloak (KC) 12.0.4 from Docker on AWS ECS Fargate, with 2 instances in a cluster, in different AZs. On deploy, I bring up 2 new instances before bringing down the 2 old ones. Intermittently, I see the following on startup:
09:05:23,380 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) ip-10-0-53-211: JOIN(ip-10-0-53-211) sent to ip-10-0-46-90 timed out (after 3000 ms), on try 0
09:05:26,383 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) ip-10-0-53-211: JOIN(ip-10-0-53-211) sent to ip-10-0-46-90 timed out (after 3000 ms), on try 1
09:05:29,385 WARN [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) ip-10-0-53-211: JOIN(ip-10-0-53-211) sent to ip-10-0-46-90 timed out (after 3000 ms), on try 2
....
Each node tries 10 times to reach the other node and then falls back to standalone mode. However, sometimes it works perfectly, which makes me think it's not an AWS network problem. Has anyone else encountered this problem?
We saw the same JGroups problem. In our case the cause was the automatic private IP address (169.254.x.x), which wasn't reachable over the network. We solved it by overriding the Docker entrypoint with a script that filters that address out and sets the BIND variable:
#!/bin/bash
# All addresses reported for this container,
# e.g. normally TOBIND="10.1.1.1 169.254.1.1"
TOBIND=$(hostname --all-ip-addresses)
BIND=""
for IP in $TOBIND
do
    # Skip the link-local 169.254.x.x address
    if [[ $IP != 169.254.* ]]
    then
        BIND+=" $IP "
    fi
done
echo "selecting ip addresses $BIND"
# Hand over to the image's own entrypoint with the filtered list
export BIND
exec /opt/jboss/tools/docker-entrypoint.sh "$@"
I don't know why we had this problem, and I haven't heard of it from others who are running Keycloak on AWS/Fargate. We ran several tests with different network setups, and the hostname.... command always returned a 169.254.x.x address.
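If you want to check whether your own tasks are affected, something like the following might help. It's only a rough sketch: the function names is_link_local and classify_addresses are made up for this example, and it assumes IPv4 addresses passed as arguments. It just labels each address as link-local (169.254.x.x) or other, so you can see what the container reports before deciding whether to filter.

```shell
#!/bin/bash
# Hypothetical helper (not part of Keycloak or the image):
# returns 0 when the IPv4 address is in 169.254.0.0/16, 1 otherwise.
is_link_local() {
    case "$1" in
        169.254.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: prints every address given as an argument,
# one per line, followed by "link-local" or "other".
classify_addresses() {
    for ip in "$@"; do
        if is_link_local "$ip"; then
            echo "$ip link-local"
        else
            echo "$ip other"
        fi
    done
}
```

Inside the container you could run it as classify_addresses $(hostname --all-ip-addresses). If a 169.254.x.x entry shows up as link-local there, the entrypoint override above is probably worth trying.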