Stop Doing Email Validation the Wrong Way

How do you know you are doing it wrong?

  • Are you checking the length of the email? Is it less than 128 characters? Is it less than 319 characters?
  • Is the regular expression for email validation written by you?
  • Is said regular expression from php.net or this site?
  • Does the regular expression conform to the RFC for email?
  • Do you not know what RFC stands for?
  • Do you expect that your email validation will predict a real address you can send emails to?

If you answered yes to any of these, then please stop, you are doing it wrong.

Beyond a generic check that there is an '@' and a '.' with text around both, I'm really not concerned with the rest. What I care about is whether I can send email to the user without it bouncing back.
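A minimal sketch of that generic test in Python, assuming "text around the two" means a non-empty local part, plus a domain with at least one '.' that has text on both sides. The function name looks_like_email is made up for illustration, and it deliberately checks nothing else.

```python
def looks_like_email(address: str) -> bool:
    """Permissive shape check: text, '@', text, '.', text. Nothing more."""
    # Split on the LAST '@' so a local part that itself contains '@' still passes.
    local, at, domain = address.rpartition("@")
    if not at or not local:
        return False
    # The domain needs a '.' with text on both sides of it.
    name, dot, tld = domain.rpartition(".")
    return bool(dot and name and tld)


if __name__ == "__main__":
    for candidate in ("user+tag@example.com", "user@example", "@example.com"):
        print(candidate, looks_like_email(candidate))
```

Anything that passes only looks like an address. Whether mail actually arrives is a separate question that a check like this cannot answer.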

Testing Length

I think it comes down to deciding how many users you are going to support and how many you are going to give the finger to. "I don't support edge cases," or "I don't support the 1% that will fail." Given the disadvantages of development, I'll say the more I support, the less fudged the numbers will be. Given that the goal should be to support all users, refusing to support possible edge cases doesn't seem like a good idea.

How many email addresses will be more than 40 characters in length? More than you would think. How many will be more than 64 characters? Not a lot, but that doesn't take Unicode into account.

Quote from Wikipedia Email Address, "...local-part of an e-mail address has a maximum of 64 characters ... and the domain name a maximum of 255 characters."
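If you are going to test length at all, a rough sketch using only the two numbers from that quote might look like the following. The names MAX_LOCAL, MAX_DOMAIN and within_quoted_limits are invented for illustration. Counting UTF-8 bytes instead of characters is my assumption about how Unicode should be measured here.

```python
MAX_LOCAL = 64    # local-part limit from the Wikipedia quote
MAX_DOMAIN = 255  # domain limit from the Wikipedia quote


def within_quoted_limits(address: str) -> bool:
    """Check only the two length limits; says nothing about validity otherwise."""
    local, at, domain = address.rpartition("@")
    if not at:
        return False
    # Assumption: measure in UTF-8 bytes, so non-ASCII characters count for more.
    local_size = len(local.encode("utf-8"))
    domain_size = len(domain.encode("utf-8"))
    return local_size <= MAX_LOCAL and domain_size <= MAX_DOMAIN
```

Note how Unicode changes the answer: 33 copies of "é" are only 33 characters but 66 bytes, so that local part fails the 64 limit under this way of counting.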

So does 255 characters sound about right to you? That used to be the maximum for a MySQL VARCHAR (before MySQL 5.0.3), and I doubt any domain is going to be 255 characters, unless it has a lot of subdomains or long subdomains.

Addendum

ICANN limits domain name length to 63 characters. The wiki article probably isn't wrong, but since ICANN does impose the limit, it should be expected that all registrars are going to follow that rule. Therefore the max length would be around 140 to 150, to leave room for longer addresses in the future.

Update

The 63-character limit applies to each label of the domain name; the rest of the 255 is for the subdomain(s). Thanks, JT Wenting.
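To make the label-versus-whole-domain distinction concrete, here is a small sketch that checks each dot-separated label against the 63 limit. The name labels_within_limit is hypothetical, and the code assumes the domain is already in its ASCII form (no raw Unicode labels) with no trailing dot.

```python
MAX_LABEL = 63  # per-label limit; the whole domain may be longer


def labels_within_limit(domain: str) -> bool:
    """True if every dot-separated label is non-empty and at most 63 characters."""
    labels = domain.split(".")
    # An empty label (from "a..com" or a trailing dot) is rejected as well.
    return all(0 < len(label) <= MAX_LABEL for label in labels)
```

A domain such as a 63-character name followed by ".com" passes, and stacking subdomains in front of it still passes as long as each one stays within the limit.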

Regular Expressions

Sigh, there are still some sites that won't let me use '+' in my email address. It is a valid character for the local part. There are many other characters that I should be able to use, but cannot. It is an injustice and a travesty against all that is "Request For Comments!" That said, I have yet to develop a regular expression that correctly validates against the RFC.

Half-assed validation seems to work just fine for all emails, even ones with invalid characters. I think it might be easier to check whether the email has invalid characters than to test whether every character is valid.
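A sketch of that "reject the bad, accept the rest" idea. Which characters count as invalid is my assumption: here it is only whitespace and non-printable control characters, and the name has_invalid_characters is made up for the example.

```python
def has_invalid_characters(address: str) -> bool:
    """True if the address contains whitespace or non-printable characters."""
    # Assumption: everything else, including '+' and non-ASCII letters, is allowed.
    return any(ch.isspace() or not ch.isprintable() for ch in address)


if __name__ == "__main__":
    print(has_invalid_characters("user+tag@example.com"))   # False
    print(has_invalid_characters("user name@example.com"))  # True
```

The list of rejected characters stays short and easy to reason about, which is the whole appeal over trying to enumerate every valid one.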

Update

The purpose of this post is only to say that developers should be liberal with input wherever that doesn't compromise security. In the case of email, being too conservative blocks too many people (I count myself as too many people, and even one person would be too many), as is seen with too many DAMNED web sites. Too many of which I can't tell to kiss my ass.

As for security, okay, we are talking about email here! If someone can hack your site just by entering an email address, then there are other issues involved.

It also seems that everyone has their own email validation method. "You're probably not a developer if you haven't reinvented the wheel." It is frustrating to see pointless email validation that is incomplete and wrong most of the time. It is even more frustrating that highly talented, intelligent programmers are the ones making these small mistakes. Ironic that they can create a complete, complex system but fail to realize the simple things (whereas I realize the simple things but fail with complex systems).