Security Tip: Be Careful Of Transliteration

[Tip#15] Because we don't already have enough to worry about without also needing to factor in other characters and emoji...

Transliteration is the process of converting text from one set of characters to another in a predictable way, usually with the goal of maintaining readability[1] in the resulting string while adding decorations[2], or of tricking readers into thinking the resulting string is the original string[3].

For example:

One Ring to rule them all

Can be represented as:

Ⓞⓝⓔ Ⓡⓘⓝⓖ ⓣⓞ ⓡⓤⓛⓔ ⓣⓗⓔⓜ ⓐⓛⓛ

While this can look cool, there is a problem with it…

It turns out that MySQL automatically treats these transliterated characters as their originals when it compares strings in a query (the comparison rules come from the collation in use). So if you performed a search for "Ⓕⓡⓞⓓⓞ", it would actually search for "Frodo".

I honestly had no idea this was a thing, so I tested this on our SQLi demo and it works!

Searching for "Ⓕⓡⓞⓓⓞ" matches "Frodo".
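To see why a PHP-side check misses what MySQL matches, here is a minimal sketch in plain PHP. The blocklist and input are hypothetical, and the file is assumed to be saved as UTF-8:

```php
<?php
// Hypothetical blocklist check, for illustration only.
$blocklist = ['frodo'];
$input = 'Ⓕⓡⓞⓓⓞ';

// PHP compares strings byte by byte, so the circled form never equals "Frodo".
var_dump($input === 'Frodo');                              // bool(false)
var_dump(in_array(strtolower($input), $blocklist, true));  // bool(false)

// Each circled letter is a 3-byte UTF-8 sequence: 15 bytes here versus 5 for "Frodo".
var_dump(strlen($input));                                  // int(15)
```

The check passes the string through untouched, and the database then treats it as "Frodo".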

The fix for this is fairly trivial: transliterate the string back to a basic character set before you check it. PHP will then see the same thing as MySQL, allowing checks such as a rate limiter to properly catch it.

A PR merged into Laravel 8 added the Str::transliterate() helper, which you can use wherever this is a problem to convert the characters back to their originals:

>>> Str::transliterate("Ⓕⓡⓞⓓⓞ");
=> "Frodo"

(Internally it uses the Portable ASCII package to perform the translations.)
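The same principle can be sketched in plain PHP: canonicalise the input once, then build every check from the canonical form. This is only an illustration under assumptions. The canonicalise() and throttleKey() helpers are hypothetical, and the sketch swaps in Unicode NFKC normalisation from the intl extension's Normalizer class as a stand-in; in a Laravel app you would call Str::transliterate() instead.

```php
<?php
// Sketch only: canonicalise() and throttleKey() are hypothetical helpers.
// Assumes the intl extension, which provides the Normalizer class.

function canonicalise(string $input): string
{
    // NFKC folds compatibility characters such as circled letters to their base letters.
    $normalised = Normalizer::normalize($input, Normalizer::FORM_KC);

    if ($normalised === false) {
        // normalize() returns false when the input is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return strtolower($normalised);
}

function throttleKey(string $username, string $ip): string
{
    // Build the key from the canonical form so variant spellings share one bucket.
    return canonicalise($username) . '|' . $ip;
}

var_dump(canonicalise('Ⓕⓡⓞⓓⓞ'));  // string(5) "frodo"
var_dump(throttleKey('Ⓕⓡⓞⓓⓞ', '203.0.113.7') === throttleKey('Frodo', '203.0.113.7'));  // bool(true)
```

Note that NFKC is narrower than a full transliteration: it folds compatibility forms like the circled letters here, but it does not strip accents or map other scripts to ASCII.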

So you’re probably thinking: “This is a cool exploit, but why do we need to know about it?”

This is a clever bypass method that allows you to evade things like rate limiting, blocklists, content restrictions, and existence checks, as well as trick victims into misreading a string by using similarly shaped characters.

Think about it: do you have anything like that in your apps, where PHP performs a check against a string and the value is then passed into a MySQL query? Or where user-supplied strings are displayed to other users?

Even ignoring the MySQL automatic behaviour that makes the exploit work, there are many cases where you’d want to swap out special characters in your apps. Content moderation comes to mind immediately!

So keep Str::transliterate() in mind when you’re handling user input, and check things like rate limiting to ensure you’re not vulnerable.

  1. It’s important to note that “readability” usually doesn’t extend to blind and vision-impaired people, in many cases because the characters are less clear or screen readers are unable to translate them.

  2. Very common on Twitter in display names.

  3. It’s common to generate phishing links and other scams with URLs that feature alternate characters that are easily confused.