Our setup is suffering from what appears to be suboptimal database queries. I believe I can supply a PR that is an optimization for Keycloak, that would fix our problem, without being specific to it. Before I do so, I’d like for you to evaluate if the following is the right approach.
Context: we’re using Keycloak 10.0.2, with a user base that’s roughly 75,000 entities in size, using MySQL. Our application integrates with Keycloak through the keycloak-admin-client RESTful client.
We specifically see problems when operating on a single user identified by username. To obtain the user representation from Keycloak, a search is performed, using usersResource.search(username, null, null, null, 0, Integer.MAX_VALUE)
This seems to have the effect that the following query is executed (I’ve manually formatted the JPA-query for readability):
SELECT * FROM USER_ENTITY
WHERE REALM_ID='ourRealm' AND (lower(USERNAME) LIKE '%myUsername%')
ORDER BY USERNAME asc LIMIT 2147483647;
This seems to delegate to JpaUserProvider#searchForUser. Lines 867 through 871 seems to be where the magic lives, at least for version 10.0.2.
MySQL’s EXPLAIN output provides the following information for this query:
id: 1
select_type: SIMPLE
table: userentity0_
partitions: NULL
type: ref
possible_keys: UK_DYKN684SL8UP1CRFEI6ECKHD7,UK_RU8TT6T700S9V50BU18WS5HA6
key: UK_RU8TT6T700S9V50BU18WS5HA6
key_len: 258
ref: const
rows: 76446
filtered: 100.00
Extra: Using index condition
The ‘rows’ column indicates that for every query, all rows in the table are evaluated. This seems to be a direct results from two factors:
- the username search has been made to look for partial matches, by wrapping itself in % characters
- value comparison is done only after it has been transformed to lower-case (making the search case insensitive).
As this query is executed very often, the inefficiency is very noticable. In a parallel effort, we’re working on tweaking cache configuration to alleviate the problem.
KEYCLOAK-2343, for which a fix is introduced in 11.0.0, seems to address the first issue, by providing an alternative endpoint that allows for an exact
boolean flag to be provided. However, this in itself will still not prevent our instance of MySQL to evaluate all rows. The case-sensitivity workaround transforming column value and input value to lowercase still is an issue.
To work around this issue, database-specific fixes could probably be used, such as using a case-insensitive COLLATE definition for the column. I appreciate that this might be a solution that’s both to specific to MySQL (as Keycloak supports multiple DBMSses) as well as having a high risk associated to it (it might affect unrelated functionality unpredictably).
I’ve noticed that in various other places in code (such as the @NamedQuery
getRealmUserByUsername
in org.keycloak.models.jpa.entities.UserEntity
) that the content of the USER_ENTITY.USERNAME
column is assumed to be lower-case. In various places, only the provided WHERE value is transformed to lower-case.
If indeed the value in the database can be assumed to be lower-case, then I’d suggest to improve the fix for KEYCLOAK-2343 further, by removing the lower()
references on the column (but retain them on the search value). This probably should happen in a couple of places, but to illustrate, I’ll use lines 807-816 of today’s version of JpaUserProvider.java on the main branch. This currently is implemented like this:
case USERNAME:
case FIRST_NAME:
case LAST_NAME:
case EMAIL:
if (Boolean.valueOf(attributes.getOrDefault(UserModel.EXACT, Boolean.FALSE.toString()))) {
predicates.add(builder.equal(builder.lower(root.get(key)), value.toLowerCase()));
} else {
predicates.add(builder.like(builder.lower(root.get(key)), "%" + value.toLowerCase() + "%"));
}
break;
I suggest to leave this as-is, but use a different implementation only for the USERNAME case:
case USERNAME:
if (Boolean.valueOf(attributes.getOrDefault(UserModel.EXACT, Boolean.FALSE.toString()))) {
predicates.add(builder.equal(root.get(key), value.toLowerCase()));
} else {
predicates.add(builder.like(root.get(key), "%" + value.toLowerCase() + "%"));
}
break;
case FIRST_NAME:
case LAST_NAME:
case EMAIL:
if (Boolean.valueOf(attributes.getOrDefault(UserModel.EXACT, Boolean.FALSE.toString()))) {
predicates.add(builder.equal(builder.lower(root.get(key)), value.toLowerCase()));
} else {
predicates.add(builder.like(builder.lower(root.get(key)), "%" + value.toLowerCase() + "%"));
}
break;
I’d be happy to put in the work if you believe that this is the right approach. What do you think? Am I on the right track here, or is a better approach available?