Keycloak email validation pattern

Hello everyone,

We are using Keycloak in our application, and I was wondering whether anybody knows what kind of email validation Keycloak applies, for example when creating a new user. I would be interested in the regex pattern (if there is one), as I would like to apply the same pattern in other parts of the application as well.

Can anyone help with this?

I know this is an old question, but I figured out how to get the email regular expression. In the current version (22.0.5) you can look at org.keycloak.utils.EmailValidationUtil in the server-spi-private project if you want more detail. Keycloak runs two regex validations, one for the local part and one for the domain. These are the regexes:
LOCAL_PART_PATTERN: (?:[a-z0-9!#$%&'*+/=?^_{|}~\u0080-?\uFFFF-]+|"(?:[a-z0-9!#$%&'.(),<>[]:; @+/=?^_{|}~\u0080-?\uFFFF-]|\\\\|\\\")+")(?:\.(?:[a-z0-9!#$%&'*+/=?^_{|}~\u0080-?\uFFFF-]+|"(?:[a-z0-9!#$%&'.(),<>:; @+/=?^_{|}~\u0080-?\uFFFF-]|\\\\|\\\")+"))*

DOMAIN_PART_PATTERN: (?:[a-z\u0080-?\uFFFF0-9!#$%&'*+/=?^{|}~]-)[a-z\u0080-?\uFFFF0-9!#$%&'+/=?^{|}~]++(?:\.(?:[a-z\u0080-?\uFFFF0-9!#$%&'*+/=?^{|}~]-)[a-z\u0080-?\uFFFF0-9!#$%&'+/=?^{|}~]++)*|\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\]|\[IPv6:(?:(?:[0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,7}:|(?:[0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,5}(?::[0-9a-fA-F]{1,4}){1,2}|(?:[0-9a-fA-F]{1,4}:){1,4}(?::[0-9a-fA-F]{1,4}){1,3}|(?:[0-9a-fA-F]{1,4}:){1,3}(?::[0-9a-fA-F]{1,4}){1,4}|(?:[0-9a-fA-F]{1,4}:){1,2}(?::[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:(?:(?::[0-9a-fA-F]{1,4}){1,6})|:(?:(?::[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(?::[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(?:ffff(:0{1,4}){0,1}:){0,1}(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])|(?:[0-9a-fA-F]{1,4}:){1,4}:(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9]))\]
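
For anyone who wants to reuse the same two-step idea elsewhere, here is a minimal Java sketch of "one regex for the local part, one for the domain". The class name, the split at the last "@", the case-insensitive flag and both patterns are my own assumptions: the patterns are deliberately simplified ASCII-only stand-ins, not the Keycloak ones pasted above, so swap in the real ones from the Keycloak source if you need identical behaviour.

```java
import java.util.regex.Pattern;

// Sketch only: both patterns are simplified, hypothetical stand-ins, NOT Keycloak's.
final class TwoPartEmailCheck {

    // Stand-in local part: dot-separated runs of unquoted ASCII atom characters.
    private static final Pattern LOCAL_PART = Pattern.compile(
            "[a-z0-9!#$%&'*+/=?^_{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_{|}~-]+)*",
            Pattern.CASE_INSENSITIVE);

    // Stand-in domain: dot-separated labels of letters and digits with inner hyphens.
    private static final Pattern DOMAIN = Pattern.compile(
            "[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(?:\\.[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)*",
            Pattern.CASE_INSENSITIVE);

    static boolean isValid(String email) {
        if (email == null) {
            return false;
        }
        // Assumption: split at the last '@', so both parts must be non-empty.
        int at = email.lastIndexOf('@');
        if (at <= 0 || at == email.length() - 1) {
            return false;
        }
        String local = email.substring(0, at);
        String domain = email.substring(at + 1);
        // Each part is checked against its own pattern; both must match completely.
        return LOCAL_PART.matcher(local).matches()
                && DOMAIN.matcher(domain).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("jane.doe@example.com"));  // true
        System.out.println(isValid("jane..doe@example.com")); // false (empty atom between dots)
        System.out.println(isValid("jane.doe@"));             // false (no domain)
    }
}
```

Validating the two halves separately keeps each pattern readable and makes it easy to report which half failed.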

A couple of things to note:

  • this is executed in Java, where "\" is the escape character, so where there are multiple "\" together, I assume that at some point they will all be replaced by just one "\"
  • when you print \u0080 and \uFFFF to the console in Java, both produce invalid characters; I don’t know why they are part of the regex
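
To make the two notes above concrete, here is a small Java sketch (class and constant names are made up for illustration). It shows how doubled backslashes in Java source collapse before the regex engine sees them, and it assumes that \u0080 and \uFFFF are meant as the two endpoints of a character range inside the class; the pasted text is ambiguous at that spot, so treat that reading as a guess.

```java
import java.util.regex.Pattern;

final class RegexEscapeDemo {

    // In source this is backslash-backslash-dot; at runtime the string holds
    // two characters (backslash, dot), which the regex engine reads as "a literal dot".
    static final String LITERAL_DOT = "\\.";

    // Four backslashes in source become two at runtime, which the regex
    // engine reads as "one literal backslash".
    static final String LITERAL_BACKSLASH = "\\\\";

    // Assumed reading of the pasted class: a range from U+0080 to U+FFFF,
    // i.e. any non-ASCII character in the Basic Multilingual Plane.
    static final Pattern NON_ASCII_BMP = Pattern.compile("[\\u0080-\\uFFFF]");

    public static void main(String[] args) {
        System.out.println(LITERAL_DOT.length());                      // 2
        System.out.println(Pattern.matches(LITERAL_DOT, "."));         // true
        System.out.println(Pattern.matches(LITERAL_DOT, "a"));         // false
        System.out.println(Pattern.matches(LITERAL_BACKSLASH, "\\"));  // true
        System.out.println(NON_ASCII_BMP.matcher("\u00e9").matches()); // true  (e with acute accent)
        System.out.println(NON_ASCII_BMP.matcher("e").matches());      // false (plain ASCII)
    }
}
```

Under that reading, the two escapes are never matched as standalone characters; they only mark where the allowed range starts and ends.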

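Regarding your questions: as a raw byte, 0x80 is one of the UTF-8 continuation bytes, which are only valid after a suitable lead byte. In a Java regex, though, \u0080 denotes the code point U+0080, a C1 control character, which is why nothing readable is printed for it.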

Character U+FFFF is in the ‘Specials’ code block, but this particular code point is a noncharacter, so it is not a valid character on its own.

I would also note that this regex appears to omit the many valid code points outside what is termed the Unicode Basic Multilingual Plane (the first 65,536 code points), which means many characters people do use will not be matched. I don’t know whether they are valid in domain names or local parts.
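
A quick way to check the point about characters beyond the first 65,536 code points is the Java sketch below. The class name is hypothetical, and the pattern is only the assumed non-ASCII range taken in isolation, not the full Keycloak expression. Java's regex engine matches by code point, so a character stored as a surrogate pair is compared as one value above U+FFFF.

```java
import java.util.regex.Pattern;

final class BmpRangeDemo {

    // Assumed range from the pasted patterns, taken on its own: U+0080 to U+FFFF.
    static final Pattern BMP_NON_ASCII = Pattern.compile("[\\u0080-\\uFFFF]+");

    static boolean matches(String s) {
        return BMP_NON_ASCII.matcher(s).matches();
    }

    public static void main(String[] args) {
        // U+00E9 lies inside the Basic Multilingual Plane.
        String eAcute = "\u00e9";
        // U+1F600 lies outside it and is stored as a surrogate pair in a Java String.
        String emoji = new String(Character.toChars(0x1F600));

        System.out.println(matches(eAcute));                         // true
        System.out.println(matches(emoji));                          // false
        System.out.println(emoji.length());                          // 2 (UTF-16 code units)
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 (single code point)
    }
}
```

Whether such characters should be accepted in a local part or a domain is a separate question; the sketch only shows that this particular range does not cover them.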