Sudden 401 Unauthorized Error When Querying Users with API

We are currently running Keycloak 18.0.0 in containers. We have a simple script that exports all users under a realm as a secondary backup measure. The script had been working without problems for seemingly forever, until today.

Today we did a rolling restart of the containers, using the same container image (we build these ourselves and tag them by version). After the restart, the backup script started failing with a 401 Unauthorized error. Nothing else is impacted, only this script and its specific calls. Here is what I’ve tried and verified:

  1. The user is still valid, and so is the password.
  2. The user’s roles haven’t changed; it still has all the query* and view* roles from realm-management assigned. I even added realm-admin and still get 401.
  3. I can impersonate this automation user and see the list of users in the security console web UI.
  4. Authentication itself works (I am able to get an access token).
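
To double-check points 2 and 4 from the token itself, here is a minimal Python sketch that decodes the access token's payload and lists the realm-management roles it carries. It does not verify the signature. The helper names are made up for illustration, and I am assuming the client roles show up under the `resource_access` claim, which depends on the client's scope and mapper settings.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def realm_management_roles(claims: dict) -> list:
    """Client roles granted on realm-management, or [] if the claim is absent."""
    # Assumption: client roles are mapped into the resource_access claim.
    return claims.get("resource_access", {}).get("realm-management", {}).get("roles", [])
```

Comparing `exp`, `iss` and the role list of a token that works (right after clearing the caches) with one that gets 401 might show whether the token itself differs between the two cases.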

Our script basically calls /openid-connect/token first with the username and password to get an access token (this part is fine), then makes /{realm}/users API calls (which return 401). There are no obvious errors or exceptions that I can find.
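
For reference, the flow looks roughly like the Python sketch below. This is not our actual script. The full paths (`/realms/{realm}/protocol/openid-connect/token` and `/admin/realms/{realm}/users`) are the default ones as I understand them for Keycloak 18 without the legacy `/auth` prefix, and the `admin-cli` client, the page size and the function names are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url, realm, username, password, client_id="admin-cli"):
    """POST for the password grant; client_id is an assumed default."""
    url = f"{base_url}/realms/{realm}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode()
    return urllib.request.Request(url, data=form, method="POST")


def build_users_request(base_url, realm, access_token, first=0, max_results=100):
    """GET one page of users, authenticated with the bearer token."""
    query = urllib.parse.urlencode({"first": first, "max": max_results})
    url = f"{base_url}/admin/realms/{realm}/users?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def export_users(base_url, realm, username, password, page=100):
    """Fetch a token, then page through all users of the realm."""
    with urllib.request.urlopen(build_token_request(base_url, realm, username, password)) as resp:
        token = json.load(resp)["access_token"]
    users, first = [], 0
    while True:
        # A 401 on this call is raised as urllib.error.HTTPError (code 401).
        with urllib.request.urlopen(build_users_request(base_url, realm, token, first, page)) as resp:
            batch = json.load(resp)
        users.extend(batch)
        if len(batch) < page:
            return users
        first += page
```

The token request succeeds in our case; it is the second request, the one carrying the bearer token, that comes back with 401.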

Banging my head against the wall at this point, hopefully someone might have some ideas.

I found that if I clear all caches under the realm, it works “once”. After that it breaks again and goes back to 401. We are running two containers behind a load balancer with sticky connections. I am leaning towards restarting both containers again, but I’d prefer not to since this is a production instance.
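
One thing that could be checked without a restart is whether the 401 depends on which container serves each request. The sketch below is only an idea and I have not run it against production. It assumes both containers can be reached directly, bypassing the load balancer, at hypothetical addresses, and it uses the same assumed paths and `admin-cli` client as a typical setup. It is only meaningful if both nodes issue tokens with the same `iss` claim (for example with a fixed hostname configured); otherwise a cross-node 401 may just be an issuer mismatch.

```python
import itertools
import json
import urllib.error
import urllib.parse
import urllib.request


def node_pairs(nodes):
    """Every (token_node, api_node) combination, including same-node pairs."""
    return list(itertools.product(nodes, repeat=2))


def fetch_token(node, realm, username, password, client_id="admin-cli"):
    """Password grant against one specific node; returns the access token."""
    url = f"{node}/realms/{realm}/protocol/openid-connect/token"
    form = urllib.parse.urlencode({
        "grant_type": "password", "client_id": client_id,
        "username": username, "password": password,
    }).encode()
    with urllib.request.urlopen(urllib.request.Request(url, data=form, method="POST")) as resp:
        return json.load(resp)["access_token"]


def users_status(node, realm, token):
    """HTTP status of a one-user query against one specific node."""
    req = urllib.request.Request(
        f"{node}/admin/realms/{realm}/users?max=1",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401


def cross_node_report(nodes, realm, username, password):
    """Map (token_node, api_node) to the status of the users call."""
    report = {}
    for token_node, api_node in node_pairs(nodes):
        token = fetch_token(token_node, realm, username, password)
        report[(token_node, api_node)] = users_status(api_node, realm, token)
    return report
```

If same-node pairs return 200 and cross-node pairs return 401, that would suggest the two containers are not sharing session or cache state after the rolling restart, but that is a guess on my part until the report shows it.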

Hopefully someone has an idea on how I can deal with this caching issue.