Standalone-HA Clustering Configuration Fails!

I am trying to cluster two instances of Keycloak (v8.0.1) across two VMs on DigitalOcean. I am binding each KC instance to its private IP address to enable clustering. Both VMs run Ubuntu 18.04 and OpenJDK 8.

My setup looks as follows:

Node-1:
KC + PostgreSQL DB + NGINX
Private IP: 10.133.87.27

Node-2:
KC (uses PostgreSQL DB on Node1)
Private IP: 10.133.117.155

standalone-ha.xml:

<subsystem xmlns="urn:jboss:domain:jgroups:7.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp"/>
            <!--<protocol type="JDBC_PING">
                <property name="datasource_jndi_name">java:jboss/datasources/KeycloakDS</property>
                <property name="initialize_sql">
                    CREATE TABLE IF NOT EXISTS JGROUPSPING (
                        own_addr VARCHAR(200) NOT NULL,
                        cluster_name VARCHAR(200) NOT NULL,
                        created TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
                        ping_data BYTEA,
                        CONSTRAINT PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name)
                    )
                </property>
            </protocol>-->
            <protocol type="TCPPING">
                <property name="initial_hosts">10.133.117.155[8600],10.133.87.27[8600]</property>
                <property name="port_range">1000</property>
            </protocol>
            <socket-protocol type="MPING" socket-binding="jgroups-mping"/>
            <protocol type="MERGE3"/>
            <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS">
                <property name="join_timeout">30000</property>
            </protocol>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
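
As an aside, if the commented JDBC_PING block above is re-enabled, discovery state can be inspected directly in PostgreSQL, since every live node registers a row in the table created by initialize_sql. A quick check from either node (a sketch; the keycloak database and user names are assumptions based on a typical setup):

# Each live node should appear as one row for cluster "ejb"
psql -h 10.133.87.27 -U keycloak -d keycloak \
    -c "SELECT own_addr, cluster_name, created FROM jgroupsping;"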

 <interfaces>
    <interface name="management">
        <inet-address value="${jboss.bind.address.management:127.0.0.1}"/>
    </interface>
    <interface name="private">
        <inet-address value="${jboss.bind.address.private:127.0.0.1}"/>
    </interface>
    <interface name="public">
        <inet-address value="${jboss.bind.address:127.0.0.1}"/>
    </interface>
    <interface name="eth1">
        <nic name="eth1"/>
    </interface>
</interfaces>

<socket-binding-group name="standard-sockets" default-interface="public" port-offset="${jboss.socket.binding.port-offset:0}">
    <socket-binding name="ajp" port="${jboss.ajp.port:8009}"/>
    <socket-binding name="http" port="${jboss.http.port:8080}"/>
    <socket-binding name="https" port="${jboss.https.port:8443}"/>
    <socket-binding name="proxy-https" port="${jboss.proxy-https.port:443}"/>
    <socket-binding name="jgroups-mping" interface="private" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45700"/>
    <socket-binding name="jgroups-tcp" interface="eth1" port="7600"/>
    <socket-binding name="jgroups-tcp-fd" interface="private" port="57600"/>
    <socket-binding name="jgroups-udp" interface="private" port="55200" multicast-address="${jboss.default.multicast.address:230.0.0.4}" multicast-port="45688"/>
    <socket-binding name="jgroups-udp-fd" interface="private" port="54200"/>
    <socket-binding name="management-http" interface="management" port="${jboss.management.http.port:9990}"/>
    <socket-binding name="management-https" interface="management" port="${jboss.management.https.port:9993}"/>
    <socket-binding name="modcluster" multicast-address="${jboss.modcluster.multicast.address:224.0.1.105}" multicast-port="23364"/>
    <socket-binding name="txn-recovery-environment" port="4712"/>
    <socket-binding name="txn-status-manager" port="4713"/>
    <outbound-socket-binding name="mail-smtp">
        <remote-destination host="localhost" port="25"/>
    </outbound-socket-binding>
</socket-binding-group>
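
One detail worth spelling out: the launch commands below pass -Djboss.socket.binding.port-offset=1000, so the jgroups-tcp binding above (7600) actually listens on 7600 + 1000 = 8600, which is why initial_hosts lists [8600]. The effective port can be confirmed on each node (a sketch, assuming the ss utility from iproute2 is available):

# JGroups TCP should be listening on the offset port (7600 + 1000 = 8600)
ss -ltn | grep ':8600'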

KC Launch Node-1:
keycloak-8.0.1/bin/standalone.sh --server-config=standalone-ha.xml -b 10.133.87.27 -Djboss.bind.address.private=10.133.87.27 -bmanagement 10.133.87.27 -Djboss.socket.binding.port-offset=1000 -Djboss.server.name=kc-node1

KC Launch Node-2:
keycloak-8.0.1/bin/standalone.sh --server-config=standalone-ha.xml -b 10.133.117.155 -Djboss.bind.address.private=10.133.117.155 -bmanagement 10.133.117.155 -Djboss.socket.binding.port-offset=1000 -Djboss.server.name=kc-node2
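
Once both servers are up, the current JGroups view can also be read via the management CLI rather than grepping logs (a sketch; note the management port is offset too, 9990 + 1000 = 10990, and I believe the channel runtime resource exposes a view attribute in this WildFly version):

# From Node-1; should list both kc-node1 and kc-node2 once the cluster forms
keycloak-8.0.1/bin/jboss-cli.sh --connect --controller=10.133.87.27:10990 \
    --command="/subsystem=jgroups/channel=ee:read-attribute(name=view)"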

Logs Node-1:
2019-12-31 11:03:03,214 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) address=kc-node1, cluster=ejb, physical address=10.133.87.27:8600
2019-12-31 11:03:33,218 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) kc-node1: no members discovered after 30003 ms: creating cluster as first member
2019-12-31 11:03:33,220 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) kc-node1: installing view [kc-node1|0] (1) [kc-node1]
2019-12-31 11:03:33,241 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) kc-node1: created cluster (first member). My view is [kc-node1|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
2019-12-31 11:03:33,621 INFO [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread 1-8) ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2' 9.4.16.Final
2019-12-31 11:03:33,853 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:03:33,853 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-8) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:03:33,853 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:03:33,855 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-7) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:03:33,855 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:03:33,861 INFO [org.infinispan.CLUSTER] (MSC service thread 1-7) ISPN000094: Received new cluster view for channel ejb: [kc-node1|0] (1) [kc-node1]
2019-12-31 11:03:33,861 INFO [org.infinispan.CLUSTER] (MSC service thread 1-5) ISPN000094: Received new cluster view for channel ejb: [kc-node1|0] (1) [kc-node1]
2019-12-31 11:03:33,862 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [kc-node1|0] (1) [kc-node1]
2019-12-31 11:03:33,862 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [kc-node1|0] (1) [kc-node1]
2019-12-31 11:03:33,865 INFO [org.infinispan.CLUSTER] (MSC service thread 1-8) ISPN000094: Received new cluster view for channel ejb: [kc-node1|0] (1) [kc-node1]
2019-12-31 11:03:33,872 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000079: Channel ejb local address is kc-node1, physical addresses are [10.133.87.27:8600]
2019-12-31 11:03:33,880 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-7) ISPN000079: Channel ejb local address is kc-node1, physical addresses are [10.133.87.27:8600]
2019-12-31 11:03:33,884 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000079: Channel ejb local address is kc-node1, physical addresses are [10.133.87.27:8600]
2019-12-31 11:03:33,888 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-5) ISPN000079: Channel ejb local address is kc-node1, physical addresses are [10.133.87.27:8600]
2019-12-31 11:03:33,895 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-8) ISPN000079: Channel ejb local address is kc-node1, physical addresses are [10.133.87.27:8600]

Logs Node-2:
2019-12-31 11:00:35,637 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) address=kc-node2, cluster=ejb, physical address=10.133.117.155:8600
2019-12-31 11:01:05,651 INFO [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) kc-node2: no members discovered after 30009 ms: creating cluster as first member
2019-12-31 11:01:05,655 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) kc-node2: installing view [kc-node2|0] (1) [kc-node2]
2019-12-31 11:01:05,679 DEBUG [org.jgroups.protocols.pbcast.GMS] (ServerService Thread Pool -- 60) kc-node2: created cluster (first member). My view is [kc-node2|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
2019-12-31 11:01:06,176 INFO [org.infinispan.factories.GlobalComponentRegistry] (MSC service thread 1-4) ISPN000128: Infinispan version: Infinispan 'Infinity Minus ONE +2' 9.4.16.Final
2019-12-31 11:01:06,463 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:01:06,464 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:01:06,465 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:01:06,481 INFO [org.infinispan.CLUSTER] (MSC service thread 1-3) ISPN000094: Received new cluster view for channel ejb: [kc-node2|0] (1) [kc-node2]
2019-12-31 11:01:06,482 INFO [org.infinispan.CLUSTER] (MSC service thread 1-1) ISPN000094: Received new cluster view for channel ejb: [kc-node2|0] (1) [kc-node2]
2019-12-31 11:01:06,482 INFO [org.infinispan.CLUSTER] (MSC service thread 1-2) ISPN000094: Received new cluster view for channel ejb: [kc-node2|0] (1) [kc-node2]
2019-12-31 11:01:06,495 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:01:06,495 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [kc-node2|0] (1) [kc-node2]
2019-12-31 11:01:06,503 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) ISPN000079: Channel ejb local address is kc-node2, physical addresses are [10.133.117.155:8600]
2019-12-31 11:01:06,520 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-1) ISPN000079: Channel ejb local address is kc-node2, physical addresses are [10.133.117.155:8600]
2019-12-31 11:01:06,530 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-3) ISPN000079: Channel ejb local address is kc-node2, physical addresses are [10.133.117.155:8600]
2019-12-31 11:01:06,536 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000079: Channel ejb local address is kc-node2, physical addresses are [10.133.117.155:8600]
2019-12-31 11:01:06,610 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000078: Starting JGroups channel ejb
2019-12-31 11:01:06,611 INFO [org.infinispan.CLUSTER] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel ejb: [kc-node2|0] (1) [kc-node2]
2019-12-31 11:01:06,613 INFO [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000079: Channel ejb local address is kc-node2, physical addresses are [10.133.117.155:8600]

I’ve tried all the suggestions available on mailing lists, forums, etc.:

  1. Not using UDP-based discovery (cloud environment)
  2. Using TCPPING
  3. Using JDBC_PING (visible in standalone-ha.xml above as a commented block)
  4. Binding jgroups-tcp to the physical address of the Ethernet interface (visible in the interfaces block)
  5. Specifying the actual IP address when binding KC with the -b switch (also tried binding to the public IP address, with the same result)
  6. TELNET works fine both ways over port 8600
  7. An SQL client connection from Node-2 to PostgreSQL on Node-1 works fine
  8. Disabling UFW entirely

Having run out of options, I am presenting this case to the forum to gain some insight into what may be causing the two nodes to fail to discover each other.

Hi,

Just in case somebody wishes to check the netstat output (trimmed to only the ports of interest), here it is:

Node-1

netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 10.133.87.27:8600       0.0.0.0:*               LISTEN     
tcp        0      0 127.0.0.1:5432          0.0.0.0:*               LISTEN     
tcp        0      0 10.133.87.27:5432       0.0.0.0:*               LISTEN     
tcp        0      0 10.133.87.27:57600      0.0.0.0:*               LISTEN     
tcp        0      0 10.133.87.27:5432       10.133.87.27:35492      ESTABLISHED
tcp        0      0 10.133.87.27:35492      10.133.87.27:5432       ESTABLISHED
tcp        0      0 10.133.87.27:5432       10.133.117.155:40566    ESTABLISHED
tcp        0      0 10.133.87.27:5432       10.133.87.27:35496      ESTABLISHED
tcp        0      0 10.133.87.27:35494      10.133.87.27:5432       ESTABLISHED
tcp        0      0 10.133.87.27:5432       10.133.117.155:40570    ESTABLISHED
tcp        0      0 10.133.87.27:5432       10.133.117.155:40568    ESTABLISHED

Node-2

netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 10.133.117.155:8600     0.0.0.0:*               LISTEN     
tcp        0      0 10.133.117.155:57600    0.0.0.0:*               LISTEN     
tcp        0      0 10.133.117.155:40566    10.133.87.27:5432       ESTABLISHED
tcp        0      0 10.133.117.155:40570    10.133.87.27:5432       ESTABLISHED
tcp        0      0 10.133.117.155:40568    10.133.87.27:5432       ESTABLISHED
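
Both nodes are listening on 8600, but note there is no ESTABLISHED connection between them on that port; the only cross-node traffic is to PostgreSQL on 5432. Plain reachability of the JGroups port can be re-confirmed from Node-2 (a sketch, assuming netcat is installed):

# From Node-2: verify the JGroups TCP port on Node-1 is reachable
nc -zv 10.133.87.27 8600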

Hi,

Good news. I was able to resolve the issue. The nodes can now discover and connect to form a cluster!

I had to comment out the MPING protocol in the stack:

<stack name="tcp">
    <transport type="TCP" socket-binding="jgroups-tcp"/>
    <protocol type="TCPPING">
        <property name="initial_hosts">10.133.117.155[8600],10.133.87.27[8600]</property>
        <property name="port_range">0</property>
    </protocol>
    <!--<socket-protocol type="MPING" socket-binding="jgroups-mping"/>-->
    <protocol type="MERGE3"/>
    <socket-protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
    <protocol type="FD_ALL"/>
    <protocol type="VERIFY_SUSPECT"/>
    <protocol type="pbcast.NAKACK2"/>
    <protocol type="UNICAST3"/>
    <protocol type="pbcast.STABLE"/>
    <protocol type="pbcast.GMS">
        <property name="join_timeout">30000</property>
    </protocol>
    <protocol type="MFC"/>
    <protocol type="FRAG3"/>
</stack>

It seems that without this change, MPING prevents TCPPING from operating and discovering the other node(s). This looks like a bug to me: either it should be clearly documented that only one discovery protocol may be present in the config, or JGroups should chain control from one discovery protocol to the next when multiple discovery protocols are configured.
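
For anyone who prefers not to hand-edit the XML, the same change should be achievable with jboss-cli (a sketch; I believe the resource path below matches this stack definition, but verify it against your own model):

# Remove MPING from the tcp stack, then reload so the change takes effect
bin/jboss-cli.sh --connect --command="/subsystem=jgroups/stack=tcp/protocol=MPING:remove()"
bin/jboss-cli.sh --connect --command=":reload"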

I hope this helps somebody some day, thanks.


Hi @sanketd.kc,

I am currently configuring Keycloak for production and want to run it in cluster mode using TCPPING. I have tried the same configuration you described above, but the two Keycloak instances running on different host servers cannot discover each other. I have also opened up all ports between them on AWS. Can you help me with this?

Would this configuration apply to Keycloak 11.0.2 as well?
I have a similar setup: standalone-HA with a physical load balancer and a MySQL datasource.

Help me understand… I’m not really good with clustering. When you say private interface, do your instances have two NICs: one for client and external connections, and a second one for these servers to communicate with each other?

I did the JGroups config, but I used the public interface IP in TCPPING.
Also, my nodes aren’t aware of tokens issued by the other nodes, and I see WARN messages about a datasource connection getting destroyed on one of the nodes:

2020-10-31 21:38:07,979 WARN [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] (Timer-2) IJ000621: Destroying connection that could not be validated: org.jboss.jca.core.connectionmanager.listener.TxConnectionListener@15d28c4e[state=NORMAL managed connection=org.jboss.jca.adapters.jdbc.local.LocalManagedConnection@3d559a6b connection handles=0 lastReturned=1604193789095 lastValidated=1604193788053 lastCheckedOut=1604193789088 trackByTx=false pool=org.jboss.jca.core.connectionmanager.pool.strategy.OnePool@523baa80 mcp=SemaphoreConcurrentLinkedQueueManagedConnectionPool@1c1a2ce9[pool=KeycloakDS] xaResource=LocalXAResourceImpl@36d86162[connectionListener=15d28c4e
connectionManager=2b63566b warned=false currentXid=null productName=MySQL productVersion=8.0.20 jndiName=java:/jboss/datasources/KeycloakDS] txSync=null]
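
From what I can tell, IJ000621 means the pool validated that connection, found it stale, and destroyed it; validation behaviour is configured on the datasource itself. I am considering enabling background validation along these lines (a sketch; the attribute names are taken from the WildFly datasources model and are untested on my setup, KeycloakDS is the pool name from the log above):

# Recycle stale MySQL connections in the background instead of at checkout
bin/jboss-cli.sh --connect --command='/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=check-valid-connection-sql, value="SELECT 1")'
bin/jboss-cli.sh --connect --command='/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=background-validation, value=true)'
bin/jboss-cli.sh --connect --command='/subsystem=datasources/data-source=KeycloakDS:write-attribute(name=background-validation-millis, value=60000)'
bin/jboss-cli.sh --connect --command=':reload'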

Any help is greatly appreciated.