
Yet another article that can't tell the difference between a "falsehood" and a heuristic.

The end goal of most software is to weed out bogus email addresses, not to weed out email addresses that don't match the standard. "a@a.com" is a valid email address according to RFC 5322. But when a user provides such an address, you can be 99.9999% certain that it is neither the user's address, nor anyone else's.

Many programmers are very much aware that "mymail@[123.123.123.123]" is technically a valid email, but allowing such addresses invariably leads to spam and service abuse, for virtually no benefit. Restricting accepted addresses to "normal" ones is common sense, not a falsehood. The same is true for many of the other supposed mistakes pointed out in the article.



> Restricting accepted addresses to "normal" ones is common sense, not a falsehood.

In the case of a@a.com, that's just someone not wanting to give an email address. You can block it with validation rules, but they'll just use some random but not real address like noemailforyou@gmail.com instead. The validation achieves nothing except annoying the user, really annoying anyone whose legitimate address is blocked as a false positive, and making your email address database harder to clean up if you ever want to send an email to everyone.

Blocking emails because they're "not normal" is validation theatre. It doesn't stop anyone nefarious or anyone who wants privacy, it doesn't stop spammers, and it does stop rare cases of people with weird email accounts.


The assumption that you can get away with a heuristic would be a falsehood.

Attempting to block any email will always be erroneous and pointless. Heck, in this day and age, use of gmail.com or outlook.com is a bigger cause for suspicion than a "weird" address, as the big providers are usually the ones used for malicious activities in order to blend in. By trying to be "smart" with a heuristic, all you're doing is excluding a lot of real users.

If you want to guard against "a@a.com", the only sensible and bullet-proof solution is a simple email validation flow as indeed the only thing you can assume about a valid address is that the user should be able to read email sent to that address.
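For what it's worth, here is a minimal sketch of what such a validation flow could look like on the server side, assuming a signed, expiring token is emailed to the address and checked when the user clicks the link. `SECRET_KEY`, `TOKEN_TTL_SECONDS`, `make_token` and `verify_token` are hypothetical names for illustration, and actually sending the mail is left out.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"          # hypothetical server-side secret
TOKEN_TTL_SECONDS = 24 * 60 * 60   # assumed lifetime: one day


def _sign(address: str, expiry: int) -> str:
    # Bind the signature to both the address and the expiry time.
    message = f"{address}|{expiry}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_token(address: str, now: Optional[float] = None) -> str:
    """Return an 'expiry.signature' token to embed in the confirmation link."""
    current = time.time() if now is None else now
    expiry = int(current + TOKEN_TTL_SECONDS)
    return f"{expiry}.{_sign(address, expiry)}"


def verify_token(address: str, token: str, now: Optional[float] = None) -> bool:
    """True only if the token was issued for this address and has not expired."""
    try:
        expiry_text, signature = token.split(".", 1)
        expiry = int(expiry_text)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    if current > expiry:
        return False
    expected = _sign(address, expiry)
    # Constant-time comparison on bytes to avoid leaking signature prefixes.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Note that nothing here inspects the shape of the address: the only property tested is that whoever holds the token could read mail sent to it.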

If you're worried about malicious users generating spam through such a flow, implement rate limits per target address and source IP.
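As a rough illustration of the rate limiting mentioned above, here is an in-memory sliding-window limiter keyed separately by target address and by source IP. The class name, the limits (3 mails per address and 10 per IP per hour) and `may_send_confirmation` are assumptions for the sketch; a real deployment would presumably keep the counters in shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_events per key within window_seconds."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of allowed events

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and current - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_events:
            return False
        events.append(current)
        return True


# Hypothetical limits: tune to taste.
per_address = SlidingWindowLimiter(max_events=3, window_seconds=3600)
per_ip = SlidingWindowLimiter(max_events=10, window_seconds=3600)


def may_send_confirmation(address: str, source_ip: str,
                          now: Optional[float] = None) -> bool:
    # casefold() is used only to build the limiter key; the address itself
    # is stored and mailed exactly as the user typed it.
    # If the address check passes but the IP check fails, the address
    # quota is still consumed, which is acceptable for a sketch.
    return (per_address.allow(address.casefold(), now)
            and per_ip.allow(source_ip, now))
```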

Users also expect email validation at this point and will most likely provide a valid, routable email if they didn't make a typo. a@a.com rarely flies nowadays.


> The assumption that you can get away with a heuristic would be a falsehood.

That depends on what you mean by "get away". Plenty of large-scale software systems implement some of the "falsehoods" mentioned in the article. Those systems still operate, so they seem to "get away" with it just fine.

The worldview underlying articles like this one, which I believe is itself a mistake, is that software must be able to accommodate 100% of cases encountered in the real world. But no system, past or present, does that. In the end, it's always users that end up adapting to the system instead. There are countless examples of this, such as people who have no last name filling in their name as both first and last names so they can apply for a passport which requires those fields to be filled.

When someone doesn't fill in the "last name" field on a passport application form, it's much more likely that they overlooked the field than that they actually don't have a last name. When someone provides "mymail@[123.123.123.123]" in an online signup form, it's much more likely that they are trying to do something fishy than this actually being their email address. And that's reason enough to reject such emails outright, without even bothering with the usual validation flow.


> The worldview underlying articles like this one, which I believe is itself a mistake, is that software must be able to accommodate 100% of cases encountered in the real world.

The mistake I believe you are making is removing users from the set you accommodate with absolutely no valid reason or gain.

Sure, there can be valid reasons to discriminate against users, but you had better have a valid reason with no non-discriminating option available.

When it comes to email, not only is the discriminating approach completely broken, the non-discriminating approach is free of any problems, and will likely be implemented anyway!

> When someone provides "mymail@[123.123.123.123]" in an online signup form, it's much more likely that they are trying to do something fishy than this actually being their email address.

So people with addresses you do not consider normal must be doing fishy things?

In 2022, fishy things are done with gmail.com addresses as they go unnoticed, are easy to issue and are universally accepted and deliverable. People don't use addresses that raise eyebrows when they're trying to go unnoticed.

> When someone doesn't fill in the "last name" field on a passport application form, it's much more likely that they overlooked the field than that they actually don't have a last name.

Not a valid argument, as we can test emails trivially but cannot test names without very specific registry access.

... But this is also a perfect example of an incorrect heuristic. Hopefully you do not consider it fair to exclude all members of societies that do not use last names? That's far more grotesque than your email example after all, and fairly illegal under most anti-discrimination laws out there.

Instead, add a checkbox for "I do not have a last name" if you think that failing to enter your last name is a common enough error to bother those without.
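To make the checkbox idea concrete, here is a small sketch of the form check, assuming a hypothetical `NameForm` with a `no_last_name` flag. The field names and messages are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NameForm:
    first_name: str
    last_name: str = ""
    no_last_name: bool = False  # hypothetical "I do not have a last name" checkbox


def validate_name_form(form: NameForm) -> List[str]:
    """Return a list of error messages; an empty list means the form is accepted."""
    errors: List[str] = []
    if not form.first_name.strip():
        errors.append("Enter your name")
    # An empty last name is only an error if the user has not said it is intentional.
    if not form.last_name.strip() and not form.no_last_name:
        errors.append("Enter your last name, or tick 'I do not have a last name'")
    return errors
```

With this, `NameForm("Tj", no_last_name=True)` passes, while `NameForm("Tj")` still gets the prompt that catches an overlooked field.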


> The mistake I believe you are making is removing users from the set you accommodate with absolutely no valid reason or gain.

That's a very strange objection. Virtually every online service excludes all customers that don't have a credit card. Many exclude all users that don't have a mobile phone or are unwilling to provide their phone number, even where the service has nothing to do with phones. Simplifying/excluding assumptions about users are ubiquitous on the web (and in society in general) today.

> Hopefully you do not consider it fair to exclude all members of societies that do not use last names?

They're not being excluded, they just have to go through some extra steps, which is already true for many, many people for a near-infinite number of reasons. Those extra steps might involve workarounds like the one I mentioned.

> Instead, add a checkbox for "I do not have a last name"

This isn't feasible because there are hundreds of special cases like that. Forms (and software systems) would balloon in complexity and become utterly unmanageable in the "99% case" if every single possibility was catered to.

Just some examples: There are people who don't have a name at all (yes!). There are people who don't know their date of birth (fairly common actually). Should standard forms have checkboxes for all of these cases?

The map is not the territory. Expecting databases to perfectly model reality is an exercise in futility. It's far better in most cases to make the data fit the model (say, by filling in default or approximate values where the true value isn't known or available) than to relax constraints to the point where they become meaningless just because there is the odd entry that doesn't fit in, while the overwhelming majority of entries do.


> That's a very strange objection. Virtually every online service excludes all customers that don't have a credit card.

But they do not discriminate against who issued it. Valid? All good. Just like with email.

> They're not being excluded, they just have to go through some extra steps

Disallowing an empty last name field is exclusion, not extra steps. Asking if it's correct is extra steps, which might be fair.

Disallowing what in your opinion is a "weird" email is exclusion, not extra steps. Allowing me to use it after sending me an email would be extra steps.

However, the name situation is just a validity or database issue; the email restriction is unwarranted discrimination, as all emails have the same format.

And that is exactly why the argument is not applicable: it is one thing to have a system not fit due to an ill-defined legal form or too many possible options (which is unacceptable on its own, but is hard to resolve); it is another entirely to decide to discriminate actively without cause, especially as it is extra work.

Your first argument was that an invalid email was a typo, which the validation flow sorts out, and that flow is needed anyway, as a gmail address is no less likely to have a typo. Your second was that weird emails are likely fraudulent, which is just flat-out false.

So with those out of the way, what reasons remain for going out of your way and writing additional code for deciding what email address is right and what is wrong?


> But they do not discriminate against who issued it. Valid? All good. Just like with email.

Ahaha of course they do! Try paying online with a credit card issued by a bank from Uganda, or by a bank that allows customers to prepay their credit card bills, or with a "privacy.com" type virtual card, and you will very quickly see that payment processors most certainly do not treat all cards the same, even if they are valid.

And yes, that's "just like with email". Because the majority of online services will outright reject addresses from anonymous/"temp mail" providers, even though they are perfectly valid and can receive mail. Many will additionally disallow +suffix addresses, because they can be used to create multiple accounts from the same mailbox, and also make it easier to automatically get rid of spam originating from the service provider. Service providers want the user's primary, personal email address. Exclusion rules exist to increase the likelihood that this address is in fact what the user has entered. The standard validation flow is clearly not sufficient, else service providers wouldn't bother with all the additional complexity.


> There are people who don't have a name at all

Wow, can you elaborate on this, or point me to a reference to learn more? People may not have a formal name recorded officially, but I find it *really* hard to believe they wouldn't have a name at all.


See e.g. [1]:

"Most Machiguenga do not have personal names. Members of the same band are identified by kin terminology, while members of a different band or tribe are referred to by their Spanish names."

There are also plenty of cultures where a person has multiple distinct names whose use depends on the context, cultures where one's "true name" is considered a secret that must be kept from others, and cultures where people routinely change their own name multiple times during their life just because they have decided they would now like to be called something else.

[1] https://en.wikipedia.org/wiki/Machiguenga


Facebook used to reject my email address, because the local part was "email", i.e., "email@my-domain.tld".

I was not amused.


A counterargument on this is also to hold a standard across your software suite. The number of times I have found websites that allow `email+tag@example.com` on sign up but then promptly break at the backend has become more than I can count on one hand.

While supporting every valid email address may not be the best idea, holding a uniform standard across your own systems is.
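One way to hold that uniform standard, sketched under the assumption that signup, backend and mailer can all import the same module: put the acceptance rule in a single function and call it everywhere. `is_acceptable_address` is a hypothetical name, and the rule shown (non-empty local part and domain around the last `@`) is deliberately permissive rather than a full RFC 5322 parser.

```python
def is_acceptable_address(address: str) -> bool:
    """Single source of truth for 'do we accept this address?'.

    Splits on the LAST '@' and requires something on both sides.
    """
    local, separator, domain = address.rpartition("@")
    return bool(separator) and bool(local) and bool(domain)


def signup(address: str) -> str:
    # Front door: uses the shared rule.
    if not is_acceptable_address(address):
        raise ValueError("address rejected at signup")
    return address


def enqueue_welcome_mail(address: str) -> str:
    # Backend: uses the SAME rule, so it cannot disagree with signup().
    if not is_acceptable_address(address):
        raise ValueError("address rejected by mailer")
    return f"queued mail for {address}"
```

Whatever rule you pick, `email+tag@example.com` then either fails at the front door or works all the way through, instead of being accepted and breaking later.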


Agreed. And I don't really care if your email client doesn't understand MIME, as I don't care if you get it via UUCP, because you are not that special* and not worth the extra time. You can go yell at me or whatever, I can't justify the expense.

At this point, I also don't really care if your client can't read HTML MIME, again upgrade it or don't get the message.

* assuming I am not sending it to you specifically, in which case you are a friend and you are that important.


You're right.

Though there might be a use case where those 'weirder' emails get accepted. Correct, for your average "sign up here" website, no.

But for example, where the receiving side is an automated mailbox, you might want to be more careful accepting 'weird' emails.


In respect of handling email addresses provided by customers (note I do not use what I consider has become an almost derogatory term, "users"), the U.K. Government Digital Service (GDS) guidelines on interface/experience design and implementation patterns are widely seen as the gold standard[3]. Here's what they have to say about accepting email addresses[0], including code examples:

    When asking users for their email address, you must:

    make it clear why you’re asking
    make sure the field works for all of your users
    help users to enter a valid email address

    You may also need to check that users have access to the email account they give you.

and here [1] is the github issue tracker for discussing email address patterns accepted by government services.

In respect of people without both first and last names, I resemble those remarks! I've dealt with broken computer systems since the 1990s that assume first+last (or even first+middle+last !), or set an arbitrary minimum length, and break when meeting me ("Tj") !

Worst cases for this are web services that set arbitrary rules for the name on a credit card (which can be almost anything by the rules and guidelines and should be free-form) - I had this only yesterday with onlyfans.com not accepting my name as it appears on the card because their rules impose first " " last format.

The double abuse of the customer then comes when first-line support, told precisely what the problem is and asked to "please let your web-devs know", responds with "Use a different card". It turned out that onlyfans.com (or their provider) doesn't even ensure the name matches when actually doing the authorisation, since I put something random in for first name and it was authorised.

I've discussed this in detail with my various banks' technical teams over the years and they've confirmed it usually isn't their side doing a DECLINE; it's the requesting service applying overly strict arbitrary rules before deciding to make the authorisation request.

Again, GDS has recommendations (and consider this is for government services) for accepting names in free-form, not split into fields[2], and says this:

    Use single or multiple fields depending on your user’s needs. Not everyone’s name fits the first-name, last-name format. Using multiple name fields mean there’s more risk that a person’s name will not fit the format you’ve chosen and that it is entered incorrectly.

In my case in the U.K. the passport office telephoned me once, for my first digital passport in the 1990s, asking rather apologetically if I'd mind them putting X's in the first name since their computer system couldn't cope with it being empty. Whereas the U.K. Driver and Vehicle Licensing Agency (DVLA) has no problem at all and shows my legal name correctly on my driving licence.

Two annoying exceptions (not strictly government created/operated) are the internal (local) NHS registration system and some local authority (local council) electoral roll (voter registration) systems that do it badly, usually due to having bought in an external 'enterprise' application to handle it, or trying to interface many disparate systems.

Generally, over the last 25+ years, I've found government organisations are great at handling these corner cases but random private sector / out-sourced development is the worst.

Getting traction to get things fixed is the hardest part. Being treated as dumb (usually by first-line support and their 'managers') when I set out a clear case and rationale for the bug and how to fix it has to be amongst my least favourite voluntary community-spirited endeavours. The short-cut I apply there, now, is an email CC-ed to the organisation head (chair, CEO) and senior legal person.

There is an up-side to it though - I rarely if ever suffer any kind of spam or phishing and anyone trying identity theft will have to have much more determination than me to overcome all the obstacles :P

[0] https://design-system.service.gov.uk/patterns/email-addresse...

[1] https://github.com/alphagov/govuk-design-system-backlog/issu...

[2] https://design-system.service.gov.uk/patterns/names/

[3] https://design-system.service.gov.uk/patterns/



