

Stop Validating Email Addresses with Regex (2012)
source link: https://davidcel.is/posts/stop-validating-email-addresses-with-regex/

Stop Validating Email Addresses With Regex
Just stop, y’all. It’s a waste of your time and your effort. Put down your Google search for an email regular expression, take a step back, and breathe. There’s a famous quote that goes:
Some people, when confronted with a problem, think, “I know, I’ll use regular expressions.” Now they have two problems.
Here’s a fairly common code sample from Rails applications with some sort of authentication system:
class User < ActiveRecord::Base
  # This regex is from https://github.com/plataformatec/devise, the most
  # popular Rails authentication library
  validates_format_of :email, :with => /\A[^@]+@([^@\.]+\.)+[^@\.]+\z/
end
If you’re experienced at Regex, this seems simple. If (like me when I first saw this) you AREN’T experienced at Regex, it takes a while to parse. But believe me, it can get way worse…
class User < ActiveRecord::Base
  validates_format_of :email, :with => /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6})$/i
end
Or even worse still…
class User < ActiveRecord::Base
  validates_with EmailAddressValidator
end

class EmailAddressValidator < ActiveModel::Validator
  EMAIL_ADDRESS_QTEXT          = Regexp.new '[^\\x0d\\x22\\x5c\\x80-\\xff]', nil, 'n'
  EMAIL_ADDRESS_DTEXT          = Regexp.new '[^\\x0d\\x5b-\\x5d\\x80-\\xff]', nil, 'n'
  EMAIL_ADDRESS_ATOM           = Regexp.new '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+', nil, 'n'
  EMAIL_ADDRESS_QUOTED_PAIR    = Regexp.new '\\x5c[\\x00-\\x7f]', nil, 'n'
  EMAIL_ADDRESS_DOMAIN_LITERAL = Regexp.new "\\x5b(?:#{EMAIL_ADDRESS_DTEXT}|#{EMAIL_ADDRESS_QUOTED_PAIR})*\\x5d", nil, 'n'
  EMAIL_ADDRESS_QUOTED_STRING  = Regexp.new "\\x22(?:#{EMAIL_ADDRESS_QTEXT}|#{EMAIL_ADDRESS_QUOTED_PAIR})*\\x22", nil, 'n'
  EMAIL_ADDRESS_DOMAIN_REF     = EMAIL_ADDRESS_ATOM
  EMAIL_ADDRESS_SUB_DOMAIN     = "(?:#{EMAIL_ADDRESS_DOMAIN_REF}|#{EMAIL_ADDRESS_DOMAIN_LITERAL})"
  EMAIL_ADDRESS_WORD           = "(?:#{EMAIL_ADDRESS_ATOM}|#{EMAIL_ADDRESS_QUOTED_STRING})"
  EMAIL_ADDRESS_DOMAIN         = "#{EMAIL_ADDRESS_SUB_DOMAIN}(?:\\x2e#{EMAIL_ADDRESS_SUB_DOMAIN})*"
  EMAIL_ADDRESS_LOCAL_PART     = "#{EMAIL_ADDRESS_WORD}(?:\\x2e#{EMAIL_ADDRESS_WORD})*"
  EMAIL_ADDRESS_SPEC           = "#{EMAIL_ADDRESS_LOCAL_PART}\\x40#{EMAIL_ADDRESS_DOMAIN}"
  EMAIL_ADDRESS_PATTERN        = Regexp.new "#{EMAIL_ADDRESS_SPEC}", nil, 'n'
  EMAIL_ADDRESS_EXACT_PATTERN  = Regexp.new "\\A#{EMAIL_ADDRESS_SPEC}\\z", nil, 'n'

  # Adds an error to the record when the email does not match the full pattern.
  def validate(record)
    unless record.email =~ EMAIL_ADDRESS_EXACT_PATTERN
      record.errors.add(:email, 'is invalid')
    end
  end
end
Yeesh. Is something that complex really necessary? If you actually check the Google query I linked above, people have been writing (or trying to write) RFC-compliant regular expressions to parse email addresses for years. They can get ridiculously convoluted as in the case above and, according to the specification, are often too strict anyway.
Sections 3.2.4 and 3.4.1 of RFC 5322 go into the requirements on how an email address needs to be formatted and, well, there’s not much you can’t do in your email address when quotes or backslashes are involved. Besides letters and digits, the local part (the part of the email address that comes before the @) can contain any of these characters: ! $ & * - = ^ ` | ~ # % ' + / ? _ { }
But guess what? You can use pretty much any character you want if you escape it by surrounding it in quotes. For example, "Look at all these spaces!"@example.com is a valid email address. Nice.
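To make the "too strict" point concrete, here is a small sketch that runs the two regexes quoted earlier against a few addresses. The constant names, the `verdicts` helper, and every sample address other than the quoted one are illustrative assumptions, not part of Rails or Devise.

```ruby
# Patterns copied from the two model examples earlier in the article.
SIMPLE = /\A[^@]+@([^@\.]+\.)+[^@\.]+\z/
STRICT = /^(|(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6})$/i

# Illustrative addresses; the second and third use characters the RFC allows.
SAMPLES = [
  'user@example.com',
  '"Look at all these spaces!"@example.com',
  "o'brien@example.com"
].freeze

# Hypothetical helper: reports which pattern accepts the address.
def verdicts(address)
  { simple: SIMPLE.match?(address), strict: STRICT.match?(address) }
end

SAMPLES.each { |address| puts "#{address.ljust(42)} #{verdicts(address)}" }
```

The longer pattern turns away both the quoted address and the one with an apostrophe, while the shorter one lets them through.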
For this reason, for a time I began running any email address against the following regular expression instead:
class User < ActiveRecord::Base
  validates_format_of :email, :with => /@/
end
Simple, right? Email addresses have to have an @ symbol. This is often the most I do and, when paired with a confirmation field for the email address on your registration form, can alleviate most problems with user error. But what if I told you there were a way to determine whether or not an email is valid without resorting to regular expressions at all? It’s surprisingly easy, and you’re probably already doing it anyway.
Just send them an email already
No, I’m not joking. Just send your users an email. The activation email is a practice that’s been in use for years, but it’s often paired with complex validations that the email is formatted correctly. If you’re going to send an activation email to users, why bother using a gigantic regular expression?
Think about it this way: I register for your website under the email address [email protected]. C’mon. That’s probably going to bounce off of the illustrious mail daemon, but the formatting is fine; it’s a valid email address. To fix this problem, you implement an activation system where, after registering, I am sent an email with a link I must click. This is to verify that I actually own that email address before my account is activated. At this point, why keep parsing email addresses for their format? The result of sending an email to a badly formatted email address would be the same: it’ll get bounced. If your user enters a bad email address, they won’t get the activation email and they’ll try to register again if they really care about using your site. It’s that simple.
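The activation flow described above can be sketched in plain Ruby. This is only an outline under stated assumptions: the `Signups` class, its in-memory storage, the recorded "outbox", and the `https://example.com/activate/` URL are all hypothetical stand-ins for a real users table and mailer.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a users table plus a mailer.
class Signups
  attr_reader :outbox

  def initialize
    @pending = {} # token => email still waiting for a click
    @active  = [] # emails whose owner followed the link
    @outbox  = [] # messages recorded here instead of being delivered
  end

  # Accepts whatever the user typed; delivery itself is the validation.
  def register(email)
    token = SecureRandom.urlsafe_base64(24)
    @pending[token] = email
    @outbox << { to: email, body: "Activate: https://example.com/activate/#{token}" }
    token
  end

  # Called when the activation link is visited; each token works once.
  def activate(token)
    email = @pending.delete(token)
    return false unless email

    @active << email
    true
  end

  def active?(email)
    @active.include?(email)
  end
end

signups = Signups.new
token = signups.register('someone@example.com')
puts signups.active?('someone@example.com') # not active until the link is clicked
signups.activate(token)
puts signups.active?('someone@example.com')
```

An address that cannot receive mail never produces a click, so the account simply stays inactive and no format check is needed to reach that outcome.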
So eschew your fancy regular expressions already. If you really want to do checking of email addresses right on the signup page, include a confirmation field so they have to type it twice. Enterprising individuals will just copy and paste, but what it comes down to is this: if your user enters a bad email address, you shouldn’t make it more of a problem for yourself than you have to. A complex regex validation on the email address doesn’t introduce an additional solution, it introduces an additional problem. If you really, really want to make sure people are typing in an actual email address, just use the /@/ regular expression and call it done. If that makes you nervous, then check for the dot too: /.+@.+\..+/i. Anything more is overkill.
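As a rough sketch of that minimal approach, the snippet below combines the dot-checking regex with a confirmation comparison in plain Ruby. The `signup_errors` helper, its messages, and the sample addresses are assumptions made for illustration.

```ruby
# The loose pattern from the paragraph above.
LOOSE_EMAIL = /.+@.+\..+/i

# Hypothetical helper: returns error messages; an empty list means the input passes.
def signup_errors(email, confirmation)
  errors = []
  errors << 'email does not look like an address' unless LOOSE_EMAIL.match?(email)
  errors << 'confirmation does not match' unless email == confirmation
  errors
end

p signup_errors('user+tag@example.com', 'user+tag@example.com')
p signup_errors('user@example', 'user@example')
p signup_errors('user@example.com', 'user@exmaple.com')
```

In a Rails model the second check is roughly what the built-in confirmation validation (`validates :email, confirmation: true`) does, so only the one-line format check would need to be written by hand.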
UPDATE: As several users in the comments have also pointed out, many email address regexes on the web will show tagged emails (e.g. [email protected]) as invalid. Lots of people use tags in their email addresses while registering, paired with their email service’s filtering systems. Keep that in mind if you don’t wish to heed the above advice.
Additionally, you could (and should) take a look at Kicksend’s mail checker to do some client-side validations in the form of typo fix suggestions.
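To give a feel for what typo suggestions involve, here is a server-side Ruby sketch of the general idea. It is not Kicksend's implementation or algorithm: the domain list, the `edit_distance` and `suggest` helpers, and the distance threshold are all assumptions chosen for illustration.

```ruby
# Hypothetical list of popular domains; a real list would be much longer.
KNOWN_DOMAINS = %w[gmail.com yahoo.com hotmail.com outlook.com].freeze

# Classic dynamic-programming (Levenshtein) edit distance between two strings.
def edit_distance(a, b)
  row = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    prev_diag = row[0]
    row[0] = i
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      current = [row[j] + 1, row[j - 1] + 1, prev_diag + cost].min
      prev_diag = row[j]
      row[j] = current
    end
  end
  row[b.length]
end

# Returns a corrected address when the domain is a near miss, otherwise nil.
def suggest(email, max_distance: 2)
  local, _, domain = email.rpartition('@')
  return nil if local.empty? || domain.empty?

  domain = domain.downcase
  return nil if KNOWN_DOMAINS.include?(domain)

  best = KNOWN_DOMAINS.min_by { |known| edit_distance(domain, known) }
  edit_distance(domain, best) <= max_distance ? "#{local}@#{best}" : nil
end

p suggest('user@gmial.com')
p suggest('user@gmail.com')
```

A suggestion like this is meant to be offered to the user ("did you mean ...?") rather than enforced, which keeps it in line with the advice above about not rejecting addresses.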