Keycloak sporadically restarts in Kubernetes context

Hello

We have Keycloak with 1-3 replicas running in a k8s cluster, in different namespaces. Sometimes the Keycloak pods get restarted, and this keeps happening over a long period of time.

NAME                                               READY   STATUS        RESTARTS   AGE
srsng-infra-keycloak-0                             1/1     Running       25         5h2m
srsng-infra-keycloak-1                             1/1     Running       48         5h15m

Also when we look at memory, we see something like this:

[screenshot: memory usage graph]

Meanwhile, in some namespaces the Keycloak replicas are stable and memory usage is a flat line.

From the logs we can see just this:

2022-09-01T12:04:53+02:00 10:04:53,525 INFO  [org.jboss.as.server] (Thread-1) WFLYSRV0272: Suspending server
2022-09-01T12:04:53+02:00 *** JBossAS process (606) received TERM signal ***
2022-09-01T12:04:53+02:00 10:04:53,526 INFO  [org.jboss.as.ejb3] (Thread-1) WFLYEJB0493: Jakarta Enterprise Beans subsystem suspension complete
2022-09-01T12:04:53+02:00 10:04:53,528 INFO  [org.jboss.as.server] (Thread-1) WFLYSRV0220: Server shutdown has been requested via an OS signal
2022-09-01T12:04:53+02:00 10:04:53,666 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-1) WFLYJCA0010: Unbound data source [java:jboss/datasources/KeycloakDS]
2022-09-01T12:04:53+02:00 10:04:53,670 INFO  [org.infinispan.manager.DefaultCacheManager] (ServerService Thread Pool -- 66) Stopping cache manager null on srsng-infra-keycloak-0
2022-09-01T12:04:53+02:00 10:04:53,670 INFO  [org.infinispan.manager.DefaultCacheManager] (ServerService Thread Pool -- 76) Stopping cache manager null on srsng-infra-keycloak-0
2022-09-01T12:04:53+02:00 10:04:53,671 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 66) ISPN000080: Disconnecting JGroups channel ejb
2022-09-01T12:04:53+02:00 10:04:53,671 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 76) ISPN000080: Disconnecting JGroups channel ejb
2022-09-01T12:04:53+02:00 10:04:53,673 INFO  [org.infinispan.manager.DefaultCacheManager] (ServerService Thread Pool -- 78) Stopping cache manager null on srsng-infra-keycloak-0
2022-09-01T12:04:53+02:00 10:04:53,673 INFO  [org.jboss.as.mail.extension] (MSC service thread 1-1) WFLYMAIL0002: Unbound mail session [java:jboss/mail/Default]
2022-09-01T12:04:53+02:00 10:04:53,674 INFO  [org.infinispan.CLUSTER] (ServerService Thread Pool -- 78) ISPN000080: Disconnecting JGroups channel ejb
2022-09-01T12:04:53+02:00 10:04:53,675 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0008: Undertow HTTPS listener https suspending
2022-09-01T12:04:53+02:00 10:04:53,676 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-2) WFLYUT0007: Undertow HTTPS listener https stopped, was bound to 0.0.0.0:8443

Any help is welcome, because I don't even know where to start with this issue.

The line "*** JBossAS process (606) received TERM signal ***" gives you a hint that the container is being stopped by Kubernetes. This can be caused by a manual or an automatic process.

Manual would be someone (or a script) deleting the pod.

Automatic can be failing health probes, or pod eviction (in case of a faulty node, which seems unlikely here).
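One quick way to narrow this down is to check how the container exited the last time it was restarted. This is just a sketch using the pod name from your output; the namespace is a placeholder:

# Show the last termination state (reason, exit code, timestamps) of the
# first container in the pod; exit code 143 means it was stopped with SIGTERM,
# which matches the TERM signal in your log
kubectl get pod srsng-infra-keycloak-0 -n <your-namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'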

I’d take a look at the Kubernetes events for that namespace to see whether it is a health check issue. You’d see a line reporting probe failures for that pod (kubectl describe pod will show that information too).
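For example, something along these lines (again, the namespace is a placeholder):

# Recent events in the namespace, newest last; failing probes show up as
# "Liveness probe failed" / "Readiness probe failed" warnings
kubectl get events -n <your-namespace> --sort-by=.lastTimestamp

# Events, restart count and probe configuration for one of the restarting pods
kubectl describe pod srsng-infra-keycloak-0 -n <your-namespace>

# Logs of the previous (killed) container instance, in case the tail you
# pasted was cut off
kubectl logs srsng-infra-keycloak-0 -n <your-namespace> --previous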

If it is, you can try to change the health check parameters (probably the response timeout) to reduce the frequency of the problem.
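As a rough sketch only: these are the standard Kubernetes probe fields you would tune on the Keycloak container in the StatefulSet (or through your Helm chart values, if you deploy it that way). The path, port and numbers below are assumptions, not values from your setup; keep whatever endpoint your deployment already probes and raise the timings gradually:

livenessProbe:
  httpGet:
    path: /auth/realms/master    # use the endpoint your deployment already probes
    port: 8080
  initialDelaySeconds: 120       # give WildFly time to start before the first probe
  periodSeconds: 15              # how often the probe runs
  timeoutSeconds: 5              # the response timeout mentioned above
  failureThreshold: 6            # consecutive failures before the pod is restarted

A failing liveness probe is what actually triggers a restart, so that is the one to relax first; a failing readiness probe only takes the pod out of the Service.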