

Are you also validating a JavaScript URL using RegEx?
source link: https://www.lirantal.com/blog/2022-10-28_are_you_validating_javascript_url_safely

Oct 28, 2022 ~ 4 min read

What do you think of the following JavaScript URL validation function code? Are you accidentally adding security issues while trying to build a feature?
The Snyk blog features an article, Secure JavaScript URL validation, about the importance of security traits and secure best practices when handling a JavaScript URL.
I shared the following code snippet on Twitter to see what folks would make of it and whether someone would call out security issues:
function checkUrlIsValid(string) {
  let givenURL;
  try {
    givenURL = new URL(string);
  } catch (error) {
    console.log("error is", error);
    return false;
  }
  return true;
}
The replies varied quite a bit and included some interesting perspectives and potential security vectors that folks might not be aware of, which we will cover in this article. What stood out, though, were the replies that suggested using regular expressions (RegEx) to validate a URL.
Using a RegEx to perform validation isn’t new. It is an approach developers often reach for when they need to perform string matching or string manipulation. Even the popular validator npm package uses RegEx to validate data formats in strings. But is it the right approach? What sort of security concerns does RegEx expose us to? Let’s find out.
Regular Expression Denial of Service
Due to how some RegEx engines work, they can be vulnerable to a type of attack called Regular Expression Denial of Service (ReDoS). This happens because of a behavior of backtracking RegEx engines known as catastrophic backtracking: when a match fails, the engine may retry a very large number of alternative ways to match the same input.
The fact that escapes most developers when dealing with regular expressions is that evaluating them is CPU-bound.
For JavaScript in the browser and in Node.js, where a single-threaded event loop runs your code, this can be disastrous. A ReDoS attack can cause a Node.js process to completely halt and stop responding to any HTTP requests.
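To make catastrophic backtracking concrete, here is a minimal sketch that times a deliberately bad pattern against inputs of growing length. The pattern `/^(a+)+$/` and the helper `timeRegexTest` are illustrative assumptions and do not come from any library; the input lengths are kept small on purpose so the script finishes quickly.

```javascript
// Hypothetical helper (not from any library): time a single regex test in milliseconds.
function timeRegexTest(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

// Nested quantifiers: a run of "a" characters can be divided between the
// inner and the outer "+" in exponentially many ways.
const nestedQuantifier = /^(a+)+$/;

for (const length of [10, 14, 18, 22]) {
  // The trailing "!" guarantees the match fails, so a backtracking engine
  // ends up trying every possible division before giving up.
  const input = 'a'.repeat(length) + '!';
  const { matched, elapsedMs } = timeRegexTest(nestedQuantifier, input);
  console.log(`length=${length} matched=${matched} elapsed=${elapsedMs.toFixed(2)}ms`);
}
```

On a backtracking engine the elapsed time should grow much faster than the input length, and while the match runs nothing else on the event loop gets a turn.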
Consider the following function that uses a RegEx to validate a URL:
function checkUrlIsValidFast(string) {
  var ip = '(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])(?:\\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])){3}';
  var protocol = '(?:http(s?)\:\/\/)?';
  var auth = '(?:\\S+(?::\\S*)?@)?';
  var host = '(?:(?:[a-z\\u00a1-\\uffff0-9_]-*)*[a-z\\u00a1-\\uffff0-9]+)';
  var domain = '(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*';
  var tld = '(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))\\.?';
  var port = '(?::\\d{2,5})?';
  var path = '(?:[/?#][^\\s"]*)?';
  var regex = '(?:' + protocol + '|www\\.)' + auth + '(?:localhost|' + ip + '|' + host + domain + tld + ')' + port + path;
  return new RegExp(regex, 'ig').test(string);
}
console.log(checkUrlIsValidFast('https://example.com'))
// returns true
The regular expression validation above looks great, right?
Well, it’s not. It’s vulnerable to a ReDoS attack. Let’s see how. What if the attacker sends the following string as input for a URL:
console.log(checkUrlIsValidFast('018137.113.215.4074.138.129.172220.179.206.94180.213.144.175250.45.147.1364868726sgdm6nohQ'))
// returns true
// but after like a million years.
// good luck ;-)
Your npm package validator is vulnerable to ReDoS
My personal advice when I’m asked how to handle RegEx in situations where you need to validate a string is to avoid it completely if you can and use lower-order string manipulation functions instead.
The reason is that RegEx is a very powerful tool, but it’s also very complicated and can be very hard to get right. If you want some supporting evidence, I can offer at least two:
- Cloudflare, an incredibly big Internet infrastructure provider, suffered one of its biggest outages in history in 2019 due to a RegEx in one of their rules that backtracked catastrophically.
- The validator npm package, which is used by millions of developers, has been found vulnerable to ReDoS time and time again.
If smart maintainers, many collaborators, and talented developers employed by Fortune 500 public companies can’t get RegEx right, how can we expect the average developer to do so?
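As a rough illustration of the "lower-order string manipulation" advice, the sketch below checks a candidate URL's scheme with plain string functions instead of a RegEx. The names `ALLOWED_PREFIXES`, `MAX_URL_LENGTH` and `hasAllowedPrefix`, as well as the 2048-character cap, are assumptions made for this example and should be adapted to your application.

```javascript
// Hypothetical allowlist and length cap, chosen only for this example.
const ALLOWED_PREFIXES = ['http://', 'https://'];
const MAX_URL_LENGTH = 2048;

// Cheap pre-check that never backtracks: each step is linear in the input length.
function hasAllowedPrefix(input) {
  if (typeof input !== 'string' || input.length > MAX_URL_LENGTH) {
    return false;
  }
  // Schemes are case-insensitive, so compare against a lowercased copy.
  const lowered = input.toLowerCase();
  return ALLOWED_PREFIXES.some((prefix) => lowered.startsWith(prefix));
}

console.log(hasAllowedPrefix('https://example.com')); // true
console.log(hasAllowedPrefix('javascript:alert(1)')); // false
```

This is only a pre-filter and not a full URL validation, but it rejects oversized or unexpected input before any heavier parsing runs.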
What else to worry about when validating URLs?
Other security aspects to consider when validating URLs:
- Mike Samuel offered advice about normalizing URLs in order to avoid different sorts of injection payloads.
- Gal Weizman showed the classic javascript:alert(1) payload, which passes new URL() parsing just fine but is still dangerous.
- Ori Livni demonstrated how different URL schemes produce a valid URL (per new URL) but are often not what the developer would have expected.
- Emily hinted that a fast-performing URL.isValid() would be a nice idea.
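One way to address the scheme concerns from the list above is to keep the new URL() parsing and add an explicit protocol allowlist on top of it. This is a minimal sketch, not a complete validator: the function name `isHttpUrl` and the choice of allowing only http: and https: are assumptions for illustration.

```javascript
// Hypothetical allowlist; extend it only with schemes you actually expect.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function isHttpUrl(input) {
  let parsed;
  try {
    // The URL constructor throws a TypeError when the input cannot be parsed.
    parsed = new URL(input);
  } catch (error) {
    return false;
  }
  // The parser lowercases the scheme and keeps the trailing colon in .protocol.
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}

console.log(isHttpUrl('https://example.com')); // true
console.log(isHttpUrl('javascript:alert(1)')); // false
console.log(isHttpUrl('not a url'));           // false
```

Parsing first and then checking the parsed protocol avoids both a hand-written RegEx and the javascript: payload that a bare new URL() check lets through.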
Getting better at RegEx
As I have mentioned in a follow-up tweet to the discussion about JavaScript URL validation:
for most of us, unless you are @TheDavisJam who practically wrote the book on regular expression denial of service and who is familiar with internal regex state machine engines.
I’ve put together a few resources that will be helpful to better understand ReDoS, its impact and how to avoid it:
- Snyk’s ReDoS learning path
- Cloudflare made available a dedicated page about ReDoS and how to avoid it.
- Tim Kadlec’s Regular Expression Denial of Service (ReDoS) in Node.js guide about regular expression Catastrophic Backtracking.