Security Tip: Escape Output with e(), htmlspecialchars(), & htmlentities()!

[Tip#64] Do you know the difference between `e()`, `htmlspecialchars()`, & `htmlentities()`? Can we just use `e()` for everything?

Last week we looked at when to use `strip_tags()` (spoiler: only when outputting outside any attributes or complex structures), and now it’s time to look at the other two functions in that family: `htmlspecialchars()` and `htmlentities()`. Let’s also include Laravel’s `e()`, because it’s closely related; I’ll explain why shortly.

To explain what these functions do, I’ll encode the same string with both and you can compare the outputs.

> $string = 'Hello <img src="x" onerror="alert(\'Boom!\')"> World!';
= "Hello <img src="x" onerror="alert('Boom!')"> World!"

> htmlspecialchars($string);
= "Hello &lt;img src=&quot;x&quot; onerror=&quot;alert(&#039;Boom!&#039;)&quot;&gt; World!"

> htmlentities($string);
= "Hello &lt;img src=&quot;x&quot; onerror=&quot;alert(&#039;Boom!&#039;)&quot;&gt; World!"

Notice any differences? Nope, I don’t either.

In fact, it’s only when we use some extra characters that we notice the difference:

> $string = " ¡¢£¥§©«®°¶·»¼½¾¿™";
= " ¡¢£¥§©«®°¶·»¼½¾¿™"

> htmlspecialchars($string);
= " ¡¢£¥§©«®°¶·»¼½¾¿™"

> htmlentities($string);
= " &iexcl;&cent;&pound;&yen;&sect;&copy;&laquo;&reg;&deg;&para;&middot;&raquo;&frac14;&frac12;&frac34;&iquest;&trade;"

`htmlentities()` will encode every character that has a corresponding HTML entity, while `htmlspecialchars()` will only encode the small subset of characters most commonly abused for Cross-Site Scripting (XSS) attacks in HTML output: & " ' < >

Note that this list isn’t exhaustive or perfect: there are situations where it’s not suitable. I’ll cover this at the end.
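To see exactly which characters each function touches, here’s a small PHP sketch (not from the original tip) that runs a handful of characters through both. The flags are passed explicitly, matching the PHP 8.1+ defaults, so the result doesn’t depend on your PHP version:

```php
<?php
// Compare which characters each function converts.
// Flags match the PHP 8.1+ defaults but are passed explicitly so the
// result is the same on every PHP version.
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401;

foreach (['&', '"', "'", '<', '>', '©', '½', '™', 'a'] as $char) {
    printf(
        "%s | htmlspecialchars: %s | htmlentities: %s\n",
        $char,
        htmlspecialchars($char, $flags, 'UTF-8'),
        htmlentities($char, $flags, 'UTF-8')
    );
}
```

The five characters listed above come out identically from both functions; `©`, `½`, and `™` are only converted by `htmlentities()`, and plain letters are left alone by either.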

If you’re unfamiliar with HTML entities, they are special codes that web browsers understand and automatically translate back into the original character when the page is displayed.
For example, `htmlspecialchars()` encodes these characters:
& → &amp;
" → &quot;
' → &#039;
< → &lt;
> → &gt;
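A quick way to see that mapping in action is to encode a string and decode it again with `htmlspecialchars_decode()`, which performs the same translation a browser does when it renders the entities as text. A minimal sketch with a made-up string:

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401;

$original = "Tom & Jerry's <b>\"show\"</b>";
$encoded  = htmlspecialchars($original, $flags, 'UTF-8');

echo $encoded, "\n";
// Tom &amp; Jerry&#039;s &lt;b&gt;&quot;show&quot;&lt;/b&gt;

// Decoding reverses the mapping, so the reader sees the original text,
// but the browser never parses it as markup.
echo htmlspecialchars_decode($encoded, $flags), "\n";
// Tom & Jerry's <b>"show"</b>
```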

So where does `e()` fit into this?

One of Laravel’s many helpers, `e()` is a wrapper around `htmlspecialchars()` and is the escaping function Laravel uses everywhere. It sets some sane defaults and adds a little extra functionality on top. This makes `e()` suitable for most places (see below) where you need to escape output into HTML.
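To make “sane defaults and added functionality” concrete, here’s a simplified, hypothetical `e_sketch()` function. It assumes recent Laravel versions call `htmlspecialchars()` with `ENT_QUOTES | ENT_SUBSTITUTE`, `UTF-8`, and a `$doubleEncode` switch; the real helper also special-cases some objects (such as `Htmlable` instances), which this sketch leaves out. Check your framework version’s source before relying on the details.

```php
<?php
// Hypothetical, simplified stand-in for Laravel's e() helper (not the framework source).
// Assumption: flags and charset mirror what recent Laravel versions pass.
function e_sketch(?string $value, bool $doubleEncode = true): string
{
    // null is treated as an empty string instead of being passed to htmlspecialchars().
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8', $doubleEncode);
}

echo e_sketch("<a href='x'>"), "\n";  // &lt;a href=&#039;x&#039;&gt;
echo e_sketch(null), "\n";            // (empty line)
echo e_sketch('&amp;', false), "\n";  // &amp; (existing entity left as-is)
echo e_sketch('&amp;'), "\n";         // &amp;amp; (double-encoded by default)
```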

My recommendation is to use `e()` any time you need to escape output manually into HTML.

That said, you can use any of these three functions where it makes sense to do so. I like `e()` because of its extra functionality and brevity, but `htmlspecialchars()` does the job too, and if you do need to encode extra characters, then you should reach for `htmlentities()`.

So why don’t we see `e()` used throughout our Blade templates? Technically we do! Blade’s escaping tags, `{{ ... }}`, use `e()` in the background to escape the output, making it safe. So you only need to reach for `e()` or the others outside of Blade, or within complex structures that cannot be output through `{{ ... }}`.
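Inside Blade, `{{ $name }}` is compiled into PHP that echoes the value through `e()`. Outside Blade, for example when assembling an HTML string in a plain PHP class, that job is yours. Here’s a minimal sketch using a hypothetical `renderBadge()` function; it calls `htmlspecialchars()` directly so it runs without Laravel, but `e()` would do the same job:

```php
<?php
// Hypothetical helper that builds HTML in plain PHP, outside Blade,
// so every dynamic value must be escaped by hand.
function renderBadge(string $name, string $role): string
{
    $escape = fn (string $v): string => htmlspecialchars($v, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // $role lands in a double-quoted attribute, $name in element text.
    return '<span class="badge" title="' . $escape($role) . '">' . $escape($name) . '</span>';
}

echo renderBadge('<img src=x onerror=alert(1)>', 'admin" onmouseover="alert(1)'), "\n";
```

Both values end up in contexts where the five encoded characters are the ones that matter. The sketch deliberately avoids `href` and `src`: HTML-escaping does nothing to stop a `javascript:` URL, which is exactly the kind of context problem covered in the update below.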

IMPORTANT UPDATE (7th Dec)

The original version of this article implied that `e()` and `htmlspecialchars()` were safe to use for output escaping anywhere. This is simply not true, and I shouldn’t have worded it like that. (Huge thanks to Paul Moore for calling me out on this!)

Escaping output always requires an understanding of the context you’re outputting into. In HTML outside of tags, ` < > & ` are your biggest concern; inside attributes, you’ve also got to deal with ` " ' `. Inside JavaScript template strings, suddenly backticks ( ` ) and dollar signs ( $ ) need to be considered too. And what about a custom templating engine that looks for ` {{ ... }} `? HTML escaping leaves those braces untouched, so user output containing them passes through Blade and could then be interpreted by that engine…

My point is, escaping depends on context. If you blindly escape output using a common function without considering the context you’re outputting to, there is a chance you’ll leave an injection vector waiting for someone to find.
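Here’s a small sketch of the JavaScript case, with a made-up payload. `htmlspecialchars()` leaves backticks, `$`, and braces untouched, so an HTML-escaped value can still close a template literal. Emitting the value as a JSON string literal with `json_encode()` and the `JSON_HEX_*` flags is one common alternative for JavaScript contexts; it’s a swapped-in technique, not something from the original tip:

```php
<?php
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401;

// Made-up attacker-controlled value aimed at a JavaScript template literal.
$input = '`; alert(1); `${x}';

// HTML escaping changes nothing here: no & " ' < > in the payload.
$htmlEscaped = htmlspecialchars($input, $flags, 'UTF-8');
echo "const msg = `{$htmlEscaped}`;\n";
// const msg = ``; alert(1); `${x}`;   <- the literal is closed early

// A JSON string literal is double-quoted, so backticks and ${...} are inert.
// JSON_HEX_TAG etc. encode < > & ' " inside the value as \u00XX escapes,
// so the value itself can't close the surrounding <script> element.
$jsLiteral = json_encode($input, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR);
echo "const msg = {$jsLiteral};\n";
// const msg = "`; alert(1); `${x}";
```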

Before we finish up, I just need to make a quick note about PHP 8.0 and earlier versions!

In PHP 8.0 and earlier, both `htmlspecialchars()` and `htmlentities()` ignored single quotes and only escaped double quotes as per the default flag of `ENT_COMPAT`. This default was changed in PHP 8.1 to `ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401`, which encodes both quote types by default.

This opens up XSS opportunities wherever variables are injected into HTML that uses single quotes, such as single-quoted attributes. So if your environment is running PHP 8.0 or earlier, you cannot safely use `htmlspecialchars()` or `htmlentities()` unless you manually set the `ENT_QUOTES` flag.

Note that `e()` has included this flag since the helper function was first added to the framework, 11 years ago - so I’d recommend sticking with `e()`.
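Here’s a small sketch of the PHP 8.0 problem, using a made-up payload aimed at a single-quoted attribute and contrasting `ENT_COMPAT` with `ENT_QUOTES`. Passing the flags explicitly makes the behaviour identical on every PHP version, which is a good habit whenever you call these functions directly rather than through `e()`:

```php
<?php
// Made-up attacker-controlled value, destined for a single-quoted attribute.
$input = "x' onmouseover='alert(1)";

// ENT_COMPAT (the pre-8.1 default) only encodes double quotes.
$compat = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES encodes single and double quotes.
$quotes = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo "<div title='{$compat}'>\n"; // <div title='x' onmouseover='alert(1)'>
echo "<div title='{$quotes}'>\n"; // <div title='x&#039; onmouseover=&#039;alert(1)'>
```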

