

Security Tip: Validating HTML & Markdown Input
[Tip#22] Validating user input is easy to forget without adding HTML or Markdown into the mix!
Greetings my friends, I hope you’re all having an awesome week! The tip this week comes from a question a reader asked on the Validating User Input post from Nov last year. It was such a great question I wanted to share it - or more specifically the answer - with everyone.
Before we get into the security tip, I would love to welcome all of our new subscribers. It’s really awesome to have you here! 😁 I just checked the subscriber numbers and we’re almost at 900 subscribers[1]. I’d love to hit 900 by the end of May[2], so please share Laravel Security in Depth with all of your developer friends.
Validating HTML & Markdown Input
Let’s start with this question:
Curious as to what validation strategy can be applied when saving input from a text editor (ie ckeditor, quill) that is html markup?
My initial thinking is that it would have to be a regex pattern that includes all the tags that are enabled, or something along those lines.
I love the way they’re thinking about the output of the editor and how to protect against Cross-Site Scripting (XSS) attacks. It’s far too easy to assume that an editor like CKEditor won’t allow the user to submit an XSS payload (it will, since the user can modify the HTML in the browser before it’s submitted), or that because Markdown isn’t HTML you can’t inject HTML (you can, and it’s even allowed in the spec!). So you very much do need to be thinking like this, and planning how to defend against XSS in your user input.
However, you can’t simply reach for something like a regex to solve the problem (can you ever “simply” use regex?). You’ll have a lot of trouble writing a regex to match all possible XSS payloads without also squashing legitimate tags. Just take a quick browse through this XSS cheat sheet and you’ll realise the wide scope of the task…
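To make that concrete, here’s a rough sketch of how quickly a regex-based filter falls over. The naiveStripScripts() function is a hypothetical helper I’ve made up for illustration (it’s not from any library), and the payloads are just a few well-known patterns:

```php
<?php

// Hypothetical naive filter: remove <script>...</script> blocks with a regex.
function naiveStripScripts(string $html): string
{
    return preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html) ?? '';
}

// Illustrative payloads that get past the filter.
$payloads = [
    // No <script> tag at all: the event handler attribute is left untouched.
    '<img src="x" onerror="alert(1)">',
    // A javascript: URL in a link is left untouched too.
    '<a href="javascript:alert(1)">click me</a>',
    // Removing the inner <script></script> re-assembles a working outer tag.
    '<scr<script></script>ipt>alert(1)</script>',
];

foreach ($payloads as $payload) {
    echo naiveStripScripts($payload), PHP_EOL;
}
// The last line printed is: <script>alert(1)</script>
```

Every extra pattern you add to the regex to catch these just invites the next bypass, which is why an allow-list approach is the better option.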
That said, the solution doesn’t have to be hard. This is such a common problem that it’s been solved many times before. 😁
HTML Purifier
If you’re receiving raw HTML from the user, then you can pass it through an HTML purifier. A purifier parses the HTML and strips out everything you haven’t specifically allowed, which lets you be very specific about what your users can submit.
This is the one I’ve used before, and it seems to be by far the most popular one on Packagist: https://github.com/ezyang/htmlpurifier
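As a minimal sketch of what that looks like, assuming you’ve installed the package with Composer (composer require ezyang/htmlpurifier): the allowed tag list and the sample input below are my own assumptions for illustration, so adjust them to match what your editor actually produces.

```php
<?php

// Assumes the package was installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// Allow-list (an assumption for this example): only these elements
// and attributes are kept, everything else is removed.
$config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href]');

// Only permit http, https and mailto links, so javascript: URLs are dropped.
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

$purifier = new HTMLPurifier($config);

// Sample input with an event handler attribute and a script element.
$dirty = '<p onclick="alert(1)">Hi <strong>there</strong><script>alert("XSS")</script></p>';

// The purified HTML should keep the <p> and <strong> tags,
// without the onclick attribute or the script element.
$clean = $purifier->purify($dirty);

echo $clean, PHP_EOL;
```

You’d typically purify on the way in (before saving) or on the way out (before rendering), and then output the result unescaped, since it has already been cleaned.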
Stripping HTML in Markdown
If you’re receiving Markdown, you should use a converter that includes the option to strip out all HTML when converting. This will save you having to pass the rendered Markdown into a purifier and double-handling the data.
The one I recommend is CommonMark, which follows the CommonMark Spec.
It is important to note that part of the spec is that raw HTML is allowed, so you’ll want to read their security page and ensure you’re enabling the security features:
use League\CommonMark\CommonMarkConverter;

// Escape raw HTML and block unsafe links (e.g. javascript: URLs).
$converter = new CommonMarkConverter([
    'html_input' => 'escape',
    'allow_unsafe_links' => false,
]);

echo $converter->convert('<script>alert("Hello XSS!");</script>');
// &lt;script&gt;alert("Hello XSS!");&lt;/script&gt;
Laravel’s Markdown Helper
Some of you may know that Laravel includes a Markdown helper in the Str class (Str::markdown()), but you might not be aware that by default it does not strip out raw HTML.
Laravel uses CommonMark internally, so you can just pass the converter options when you’re converting your Markdown:
use Illuminate\Support\Str;

// Strip raw HTML tags and block unsafe links when rendering.
Str::markdown('Inject: <script>alert("Hello XSS!");</script>', [
    'html_input' => 'strip',
    'allow_unsafe_links' => false,
]);
// <p>Inject: alert(&quot;Hello XSS!&quot;);</p>
As you can see, it’s actually fairly easy to validate and clean raw HTML inputs with the right tools.
So you’ve got no excuse. 🤣
[1] Counting both free and paid subscribers.
[2] 31st May will be 9 months from the launch of LSID, so hitting 900 subscribers by then really appeals to my love of patterns.
What is the difference between "validating" and "sanitizing"? I always understood "validation" as the process of confirming that the input data meets a series of standards, while "sanitizing" is the process of removing unwanted and potentially dangerous data while preserving safe data.