Securing Laravel

Security Tip: Validating HTML & Markdown Input

[Tip #22] Validating user input is easy to forget, even without adding HTML or Markdown into the mix!

Stephen Rees-Carter
May 20, 2022

Greetings my friends, I hope you’re all having an awesome week! The tip this week comes from a question a reader asked on the Validating User Input post from Nov last year. It was such a great question I wanted to share it - or more specifically the answer - with everyone.

Before we get into the security tip, I would love to welcome all of our new subscribers. It’s really awesome to have you here! 😁 I just checked the subscriber numbers and we’re almost at 900 subscribers[1]. I’d love to hit 900 by the end of May[2], so please share Laravel Security in Depth with all of your developer friends.


Validating HTML & Markdown Input

Let’s start with this question:

Curious as to what validation strategy can be applied when saving input from a text editor (ie ckeditor, quill) that is html markup?

My initial thinking is that it would have to be a regex pattern that includes all the tags that are enabled, or something along those lines.

I love the way they’re thinking about the output of the editor and how to protect against Cross-Site Scripting (XSS) attacks. It’s far too easy to assume that an editor like CKEditor won’t allow the user to submit an XSS payload (you can modify it in the browser), or that because Markdown isn’t HTML you can’t inject HTML (you can, and it’s even allowed in the spec!). So you very much do need to be thinking like this, and planning how to defend against XSS in your user input.

However, you can’t simply reach for something like a regex to solve the problem (can you ever “simply” use regex?). You’ll have a lot of trouble writing a regex to match all possible XSS payloads without also squashing legitimate tags. Just take a quick browse through this XSS cheat sheet and you’ll realise the wide scope of the task…
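
To make that concrete, here is a minimal sketch of the kind of regex filter people tend to reach for first. The `naiveStripScripts()` helper is hypothetical (it isn't from any library), and the payloads are just a few common examples, not an exhaustive list:

```php
<?php

// Hypothetical naive filter: removes <script>...</script> blocks and nothing else.
function naiveStripScripts(string $html): string
{
    return preg_replace('/<script\b[^>]*>.*?<\/script>/is', '', $html) ?? '';
}

$payloads = [
    '<script>alert(1)</script>',
    '<img src=x onerror=alert(1)>',
    '<a href="javascript:alert(1)">click</a>',
    '<svg onload=alert(1)>',
];

foreach ($payloads as $payload) {
    $result = naiveStripScripts($payload);

    // Anything that comes back non-empty made it through the filter.
    echo ($result === '' ? 'removed:  ' : 'survived: ') . $payload . PHP_EOL;
}
```

Only the first payload is removed. The other three contain no script tag at all, so the pattern never matches them, and every extra pattern you bolt on makes it more likely you'll break legitimate markup.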

That said, the solution doesn’t have to be hard. This is such a common problem that it’s been solved many times before. 😁

HTML Purifier

If you’re receiving raw HTML from the user, then you can pass it through an HTML purifier. A purifier parses the HTML and strips out everything you haven’t specifically allowed, so you can be very specific about what you let your users use.

This is the one I’ve used before, and it seems to be by far the most popular one on Packagist: https://github.com/ezyang/htmlpurifier
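
As a rough sketch of what that looks like with HTML Purifier, assuming it was installed with `composer require ezyang/htmlpurifier`: the allowlist below is only an illustration, so swap in the tags and attributes your editor actually produces.

```php
<?php

require __DIR__ . '/vendor/autoload.php'; // assumption: a standard Composer install

$config = HTMLPurifier_Config::createDefault();

// Illustrative allowlist: anything not listed here is stripped.
$config->set('HTML.Allowed', 'p,strong,em,ul,ol,li,a[href]');

$purifier = new HTMLPurifier($config);

// Example input with an inline event handler and a script tag.
$dirty = '<p onclick="steal()">Hello <strong>world</strong><script>alert("XSS")</script></p>';

$clean = $purifier->purify($dirty);

echo $clean . PHP_EOL;
```

With this configuration the `onclick` attribute and the script element should be dropped, while the paragraph and the `<strong>` tag are kept. In a Laravel app the autoloader is already loaded, so you wouldn’t need the `require` line.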

Stripping HTML in Markdown

If you’re receiving Markdown, you should use a converter that includes the option to strip out all HTML when converting. This will save you having to pass the rendered Markdown into a purifier and double handling the data.

The one I recommend is league/commonmark (the PHP League’s CommonMark package), which follows the CommonMark Spec.

It is important to note that part of the spec is that raw HTML is allowed, so you’ll want to read their security page and ensure you’re enabling the security features:

use League\CommonMark\CommonMarkConverter;

$converter = new CommonMarkConverter([
    'html_input' => 'escape', 
    'allow_unsafe_links' => false,
]);

echo $converter->convert('<script>alert("Hello XSS!");</script>');

// &lt;script&gt;alert("Hello XSS!");&lt;/script&gt;
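
The `allow_unsafe_links` option deserves a quick illustration too. This is a small sketch, assuming league/commonmark 2.2 or later is installed via Composer, that runs the same Markdown through a converter with the library defaults and one with the security options set. The sample input is made up for this example:

```php
<?php

use League\CommonMark\CommonMarkConverter;

require __DIR__ . '/vendor/autoload.php'; // assumption: a standard Composer install

// Made-up input: a javascript: link plus inline HTML with an event handler.
$markdown = '[click me](javascript:alert(1)) and <b onmouseover="alert(2)">hover</b>';

// Library defaults: raw HTML and unsafe links are allowed.
$permissive = new CommonMarkConverter();

// Security options enabled.
$hardened = new CommonMarkConverter([
    'html_input' => 'strip',
    'allow_unsafe_links' => false,
]);

$permissiveHtml = (string) $permissive->convert($markdown);
$hardenedHtml = (string) $hardened->convert($markdown);

echo $permissiveHtml . PHP_EOL;
echo $hardenedHtml . PHP_EOL;
```

The first line of output should still contain the `javascript:` link and the `onmouseover` handler, while the second should contain neither.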

Laravel’s Markdown Helper

Some of you may know that Laravel includes a Markdown helper in the String class (Str::markdown()), but you might not be aware that by default it does not strip out raw HTML.

Laravel uses CommonMark internally, so you can just pass the converter options when you’re converting your Markdown:

use Illuminate\Support\Str;

echo Str::markdown('Inject: <script>alert("Hello XSS!");</script>', [
    'html_input' => 'strip',
    'allow_unsafe_links' => false,
]);

// <p>Inject: alert(&quot;Hello XSS!&quot;);</p>

As you can see, it’s actually fairly easy to validate and clean raw HTML inputs with the right tools.

So you’ve got no excuse. 🤣

[1] Counting both free and paid subscribers.

[2] 31st May will be 9 months from the launch of LSID, so hitting 900 subscribers by then really appeals to my love of patterns.
