How do you know you are doing it wrong?
- Are you checking the length of the email? Is it less than 128 characters? Is it less than 319 characters?
- Is the regular expression for email validation written by you?
- Is said regular expression from php.net or this site?
- Does the regular expression conform to the RFC for email?
- Do you not know what RFC stands for?
- Do you expect that your email validation will predict a real address you can send emails to?
If you answered yes to any of these, then please stop, you are doing it wrong.
Besides a generic test case that makes sure that there is a ‘@’ and a ‘.’ with text around the two, I’m really not concerned with the rest. What I’m concerned with is if I can send emails to the user without them bouncing back.
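For what it is worth, that generic test can be sketched in a few lines. This is only a rough illustration in Python, and the function name looks_like_email is made up for the example; it checks for an '@' and a '.' with text around them and deliberately nothing else.

```python
def looks_like_email(address: str) -> bool:
    """Loose sanity check: an '@' with text before it, and a '.' in the
    domain with text on both sides. Nothing else is checked on purpose."""
    # Split on the last '@' so quoted local parts containing '@' still pass.
    local, sep, domain = address.rpartition("@")
    if not sep or not local:
        return False
    # The last '.' in the domain must have text before and after it.
    head, dot, tail = domain.rpartition(".")
    return bool(dot and head and tail)
```

Anything that passes still has to be proven by actually sending mail to it; the check only catches obvious slips.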
Testing Length
I think it is more a matter of deciding how many users you are going to support and how many you are going to give the finger to. “I don’t support edge cases,” or “I don’t support the 1% that will fail.” Given the disadvantages of development, I’ll say the more I support, the less fudged the numbers will be. Given that the goal should be support for all users, not supporting possible edge cases doesn’t seem like a good idea.
How many email addresses will be more than 40 characters in length? More than you would think. How many would be more than 64 characters? Not a lot, but that doesn’t take Unicode into account.
Quote from Wikipedia Email Address, “…local-part of an e-mail address has a maximum of 64 characters … and the domain name a maximum of 255 characters.”
So does 255 characters sound about right to you? That is the maximum for a MySQL VARCHAR, and I doubt any domain is going to be 255 characters, unless it has a lot of subdomains or long subdomains.
ICANN limits domain name length to 63 characters. While the wiki article probably isn’t wrong, since ICANN does impose the limit, it should be expected that all registrars are going to follow that rule. Therefore the max length would be around 140 to 150 to allow for future character length.
Update
The rest is for the subdomain(s), thanks JT Wenting.
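To make the arithmetic concrete, here is a small Python sketch of a length check built from the figures quoted above. It assumes the 63-character limit applies to each dot-separated part of the domain, and it counts characters rather than bytes, so Unicode addresses would need more thought. The names are invented for the example.

```python
MAX_LOCAL = 64    # local-part limit, from the Wikipedia quote
MAX_DOMAIN = 255  # whole-domain limit, from the Wikipedia quote
MAX_LABEL = 63    # assumed to apply to each dot-separated label

# Worst case: local part + '@' + domain.
MAX_ADDRESS = MAX_LOCAL + 1 + MAX_DOMAIN  # 64 + 1 + 255 = 320


def length_ok(address: str) -> bool:
    """Check each part against its own limit instead of one total."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False
    if len(local) > MAX_LOCAL or len(domain) > MAX_DOMAIN:
        return False
    # Every label (subdomain, domain, TLD) must fit within the label limit.
    return all(len(label) <= MAX_LABEL for label in domain.split("."))
```

Under those assumptions the ceiling works out to 320 characters, well above the 140 to 150 guessed at earlier.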
Regular Expressions
Sigh, there are still some sites that won’t let me use ‘+’ in my email address. It is a valid character for the local part. There are many other characters that I should be able to use, but cannot. It is an injustice and a travesty against all that is “Request For Comments!” That said, I have yet to develop a regular expression that correctly validates against the RFC.
A half-assed check seems to work just fine for all emails, even ones with invalid characters. I think it might be easier just to check whether the email has invalid characters than to test whether it has only valid ones.
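As a rough sketch of the "look for invalid characters" idea, in Python: the FORBIDDEN set below is my own guess at a reasonable blacklist (whitespace, control characters, and a few characters that are special outside quoted strings), not something taken from the RFC, so treat it as a starting point.

```python
# Hypothetical blacklist, not an RFC-derived list: whitespace plus a few
# characters that are special outside quoted strings. Adjust to taste.
FORBIDDEN = set(' \t\r\n<>()[],;:\\"')


def invalid_characters(address: str) -> list:
    """Return the offending characters, in order of first appearance."""
    found = []
    for ch in address:
        is_control = ord(ch) < 32 or ord(ch) == 127
        if (ch in FORBIDDEN or is_control) and ch not in found:
            found.append(ch)
    return found
```

An empty list means nothing objectionable was found, so an address like user+tag@example.com goes straight through.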
Update
The purpose of this post is only to say that developers should be liberal with input where doing so won’t compromise security. In the case of email, being too conservative blocks too many people (I count myself as too many people, and even one person would be too many), as is seen with too many DAMNED web sites. Too many of which I can’t tell to kiss my ass.
In the case of security, okay, we are talking about email here! If they can hack your site because of inputting an email address, then there are other issues involved.
It also seems that everyone has their own email validation method. “You’re probably not a developer if you haven’t reinvented the wheel.” It is frustrating seeing pointless email validations, which are incomplete and wrong most of the time. It is even more frustrating that it is highly talented, intelligent programmers making these small mistakes. Ironic that they can create a complete complex system, but fail to realize the simple things (whereas I realize the simple things, but fail with complex systems).
Tags: PHP
“I have yet to develop a regular expression that correctly validates against the RFC.”
Yea, probably because validating to the RFC is _really hard_. Luckily, Cal Henderson (lead developer of Flickr) has done the hard stuff for us already… Check out my article on regular expressions for web developers or go straight to the source.
Feel free to have a look at our email validation function for Phorum. It includes a complete (afaik) regex and MX checking.
http://www.phorum.org/development/browser/phorum5/trunk/include/email_functions.php
It’s been tested for years now in Phorum and on dealnews.com.
Phorum has a BSD style license.
Testing the length of an email for a sane value has nothing to do with the RFC.
It is a defense-in-depth technique for secure websites. Such long emails are less than 1%. Actually, I very much doubt that any real person uses such a long email.
When a very long email is entered, it is far more likely to be an attack (robot) than a real person.
And btw: PHP on Windows had internal buffer overflows (not long ago) in mail() that were triggered by long emails.
Check this: http://alertgear.com. It lets you organize a kind of mailing list, but instead of email you can send news to the desktop.
Jacob: you really should cite an RFC number rather than leave it at the mysterious “RFC for email”. For example, the address format specified by RFC 822 is utterly absurd: if a user attempts to use an address with embedded newlines and comments (!), the correct response isn’t to let him in; it’s to punch him in the face. (Difficult over the Internet, true, but still.)
I don’t want the kind of user who’ll include an “@” in a quoted-string segment and then complain if my regex rejects it…
The PEAR::Mail_RFC822 class validates email addresses correctly. It also has a real-world method for checking the “common” email address format.
[...] Address Validation and Mission Creep +1 to what ndg said in response to Jacob’s post about e-mail address validation. I think Jacob sort of touches on this [...]
Another bummer on email validation practice will eventually be supporting Internationalized Domain Names. Yes, support for IDNs is still sketchy (non-existent really for most people) but it’s only a matter of time. I wonder how many folk will harvest valid email addresses from homographed domains though.
All these validations won’t keep “johndoe@hotmail.com” from typing “jojndoe@hitmail.com”, so what’s the use?
I just check that there’s a “@” and a “.”, and some characters before, in between and after, and I think that’s ok.
Just use patForms – it comes with a rule to validate email addresses.
BTW, if it does something wrong, send a patch
Testing for length against a bot doesn’t seem like a good idea. The evolution of bots would probably consider the name of the field and follow suit with a shorter and more accurate email address. The script in question does have many checks for protection against spam bots. However, I think using the nonce approach, while still flawed, could better protect against such inane checks.
The second problem is that the audience would find all of the (redundant) questions difficult. The easier the form, the more responses would be received.
Coincidentally, my nonce implementation also guards against refreshing on the submit form, since once the nonce is deleted, it kicks them out. However, it isn’t perfect and it could easily be bypassed. My focus isn’t keeping 100% of bots out, just enough that it wouldn’t put strains on the workforce involved.
I think nonce implementations are far easier than implementing an accurate email validation scheme. It is probably also easier to check for invalid characters than valid emails.
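For anyone wondering what a nonce check looks like, here is a bare-bones sketch in Python. It is not the implementation described above, just the general shape: the NonceStore class and its method names are invented, and the tokens live in memory, whereas a real site would tie them to a session or database and expire them.

```python
import secrets


class NonceStore:
    """In-memory one-time tokens (illustration only, no expiry)."""

    def __init__(self):
        self._issued = set()

    def issue(self) -> str:
        # Generate a random token to embed in a hidden form field.
        nonce = secrets.token_urlsafe(16)
        self._issued.add(nonce)
        return nonce

    def consume(self, nonce: str) -> bool:
        # Valid only once: the nonce is deleted on first use, so a
        # refresh of the submit page (or a replay) is rejected.
        if nonce in self._issued:
            self._issued.remove(nonce)
            return True
        return False
```

A form handler would call issue() when rendering the form and consume() on submit, rejecting the request when consume() returns False.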
“ICANN limits domain names length to 63 characters. While the wiki article probably isn’t wrong,”
Ever heard of subdomains?
ICANN may limit the length of the core domain name to 63 characters, but that leaves nearly 200 for the subdomain.
Add the 64 Wikipedia lists for the account name, and a single one for the @ sign, and that makes for a pretty long email address.
Of course even then you’re not home free. You’re assuming that the system will be used only for email exchange using the currently “normal” protocols of SMTP and POP3 in use on the internet, but this doesn’t have to be.
If the system is used on other networks or using other mail protocols all bets are off.
How odd. While discussing the matter, I did bring up subdomains, but forgot about it. Interesting how the memory works. I was explaining why I don’t place the closing PHP ‘?>’ with one example. Then two weeks later, I remembered another reason for not having it.
Thanks JT.
You just further highlight the complexity of emails, however that is outside the scope of the discussion. It is fascinating and I will have to remember to do further research in a couple of years or if I ever have a research project on such a matter.
[...] # email validation is HARD [...]
Validating email address existence is not the same thing as validating email address format, and both have legitimate but different purposes.
Sometimes it is necessary to ensure that an email address that does not yet exist is valid. But since the address does not exist, sending a test email will obviously not work.
http://SimonSlick.com/VEAF/ValidateEmailAddressFormat.html
BTW, jon(some comment)doe@domain.tld is not an email address. It is an email address with an embedded comment. The email address is jondoe@domain.tld. So forgive me if I do not want and do not accept your comments in your email address.
And besides, RFC 822 has largely been replaced by RFC 2822. I wouldn’t be so sure that embedded comments are even RFC compliant anymore (just because some email systems still permit them does not necessarily mean they are RFC compliant).
[...] Jacob Santos’s “Stop Doing Email Validation the Wrong Way” rant. [...]
E-mail validation is really a problem, and "old" systems such as sending a link to new users have very bad results… We are studying solutions and will keep you posted if we find some.