
Which makes the problem even worse. Are you supposed to have comparison rules per provider then? Only for those that are large enough to bother in your particular use case?


Yes, if you want to know if two addresses refer to the same account, you need to keep up with the actual rules for each provider that you decide to care about.

But, if you're using the email as a username just for convenience, you get to decide what rules you have for username comparison. You can absolutely decide that A@gmail.com and a@gmail.com are different accounts for your service, even though emails you send to these accounts will likely reach the same inbox. You can also decide that they are the same account, but that haşim@gmail.com, HAŞIM@gmail.com and HAŞİM@gmail.com are all different accounts. It's entirely up to you to know what is important for your users.
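To make the point about choosing your own comparison rule concrete, here is a minimal Python sketch. The helper name `ascii_lower` is made up for illustration, and neither rule is claimed to be what any real service uses. ASCII-only lowercasing gives the policy described above (A and a match, the three haşim spellings stay distinct), whereas Python's full Unicode `str.casefold()` merges the first two spellings but still keeps the dotted-İ one apart.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other character alone."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


# Escapes are used so the exact code points are unambiguous:
# \u015f = ş, \u015e = Ş, \u0130 = İ (capital I with dot above)
names = [
    "ha\u015fim@gmail.com",
    "HA\u015eIM@gmail.com",
    "HA\u015e\u0130M@gmail.com",
]

# Rule 1: ASCII-only case-insensitivity.
same_ascii = ascii_lower("A@gmail.com") == ascii_lower("a@gmail.com")
ascii_keys = {ascii_lower(n) for n in names}

# Rule 2: full Unicode case folding.
casefold_keys = {n.casefold() for n in names}

print(same_ascii, len(ascii_keys), len(casefold_keys))
```

Whichever rule you pick, the important part is applying the same function both when storing a username and when looking one up.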


In this case, a@googlemail.com is also the same as a@gmail.com
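A per-provider rule set along these lines might look like the sketch below. It assumes Gmail's documented behaviour of ignoring dots and anything after a `+` in the local part, plus the googlemail.com alias mentioned above. The function name `canonical_mailbox` and the domain set are made up for illustration, and every other provider is deliberately left untouched because its rules are unknown here.

```python
# Domains treated as Gmail in this sketch (assumption: only these two).
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonical_mailbox(address: str) -> str:
    """Best-effort guess at which mailbox an address delivers to."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    if domain in GMAIL_DOMAINS:
        # Drop the "+tag" suffix, then dots, then case differences.
        local = local.split("+", 1)[0].replace(".", "").lower()
        domain = "gmail.com"
    # For any other provider we do not know the rules, so keep the local part.
    return f"{local}@{domain}"


print(canonical_mailbox("J.Doe+news@googlemail.com"))
print(canonical_mailbox("J.Doe+news@example.org"))
```

This only answers "same inbox?" for the providers you have rules for; it says nothing about whether two inboxes belong to the same person.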


It just means you cannot use email validation as a way to limit one account per person. A person could just have another email on another provider anyway.

It's part of the reason why you see many places ask for your phone number, not that I agree with that, and even this method has many flaws.


My favorite issue with phone-number-based identity verification is that it doesn't handle differences between countries well at all. In some regions, getting more than one 'legit' phone number (i.e. not on any of the VoIP blocklists) per person is nearly impossible, while in other regions, even in the Western world, it's very much possible to buy a lot of pay-as-you-go plans, keep the numbers from expiring, and use them to retain perhaps dozens of accounts at little to no cost. That makes it yet another measure that hurts legitimate users far more than illegitimate ones.


I really wish there was something that was easy to get as a company but hard to get as an individual. As a B2B SaaS, I want to do the organization-level equivalent of KYC — "Know Your Corporate Buyer"? — but there's no such thing.

Instead, all anyone offers in this vein are identifiers that are really annoying to get and that nobody already has as a matter-of-course of registering a company; such that many real companies can't (or won't bother to) pass them. DUNS numbers are a common choice. Our company is five years old and doesn't have a DUNS number. It takes two weeks to get one. So how could I expect our customers to bother getting one just to try our product?


"Can you set this value in a TXT record on your DNS?" or another domain-level challenge-response mechanism would be one. Sure, anyone can buy a domain, but in theory that is what Extended Validation (EV) is supposed to represent, so if you get a valid handshake from a domain with an EV certificate, then in theory you're talking with someone representing the organization that the certificate was issued for.

Of course, in principle you can get an EV cert for any company as well, but now we're talking about determining whether a company is sufficiently well-known to accept. That can't be fully solved deterministically, since it's a human judgement; there will always be some grey areas. But there are certainly a lot of companies that could probably be "automatically verified" in some fashion (Ford Motor Company? reasonably well-known, and this is their domain...) given some sort of authoritative domain -> stock mapping, and that's not impossible to do if you trust the EV scheme.

Anyway, not perfect, but a challenge-based check raises the bar a lot, from "register a domain with GoDaddy and sign up for Let's Encrypt" to "first you have to get an EV certificate..."

Of course, since such a mechanism is not widely used... not exactly going to find tons of official support for using it like that. But the mechanisms are there!
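The DNS half of this idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the `example-verify=` prefix and the function names are invented, and the actual DNS query is passed in as a `lookup_txt` callable (in practice it would wrap whatever resolver library you use) so the sketch stays self-contained. The EV-certificate half is not covered here.

```python
import secrets
from typing import Callable, Iterable


def make_challenge() -> str:
    """Create a random value the customer is asked to publish as a TXT record."""
    return "example-verify=" + secrets.token_urlsafe(24)


def domain_passes(
    domain: str,
    expected: str,
    lookup_txt: Callable[[str], Iterable[str]],
) -> bool:
    """True if any TXT record returned for `domain` equals the challenge."""
    return expected in {record.strip() for record in lookup_txt(domain)}


# Usage with a fake resolver standing in for real DNS.
challenge = make_challenge()
fake_zone = {"customer.example": ["v=spf1 -all", challenge]}


def fake_lookup(domain: str) -> Iterable[str]:
    return fake_zone.get(domain, [])


print(domain_passes("customer.example", challenge, fake_lookup))
print(domain_passes("someone-else.example", challenge, fake_lookup))
```

Passing the check proves control of the domain's DNS, nothing more; whether that domain belongs to a company you want to accept is still the judgement call described above.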


Isn't a company itself a thing that is easy to get as an individual, in certain jurisdictions?


Yeah, but not usually easy enough that you can make hundreds/thousands of them quickly for a single purpose and then throw them away.

Also, as a slight tweak, I'd hope that this authentication scheme would only admit legal companies that are at least a week old. People who do these sorts of bulk-registration attacks tend not to be patient people, willing to wait around for their credentials to gain reputation before using them. They create them and then try to use them right away. Whereas no real company would be signing up for most B2B services in its first week of legal existence. (Google Workspace? Sure. Accounting software? Probably not.)
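The "at least a week old" tweak is simple date arithmetic once you have a trustworthy incorporation date, which is the hard part and is simply assumed here. The names `MIN_COMPANY_AGE` and `old_enough` are hypothetical.

```python
from datetime import date, timedelta

# Assumed policy from the comment above: one week of legal existence.
MIN_COMPANY_AGE = timedelta(days=7)


def old_enough(
    incorporated_on: date,
    today: date,
    min_age: timedelta = MIN_COMPANY_AGE,
) -> bool:
    """True if the company has existed for at least `min_age`."""
    return today - incorporated_on >= min_age


print(old_enough(date(2024, 1, 1), today=date(2024, 1, 5)))
print(old_enough(date(2024, 1, 1), today=date(2024, 1, 8)))
```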


Ultimately this depends on what the attack is.

If it's targeted to X company for a specific reason, they'll wait however long they need. If it's generic and X company is just caught in the crossfire, then maybe.

>Whereas no real company would be signing up for most B2B services in its first week of legal existence. (Google Workspace? Sure. Accounting software? Probably not.)

When I've registered companies in the past, it's when I have sales. Before then, it's just an idea to see if we get users. But the company with the service can determine that.


Corporate Number/Tax ID?


You can, however, use email parsing + heuristics as a way to detect people who are trying to register accounts that look like they ever could be part of a set of "related" addresses — and just reject even the first member in that set. (I.e. just reject registrations with a + in the username part in the first place.)

(You don't have to be petty about it; no need to send them to a "you're an attacker" page or anything. Just redline the form-field, explain the problem clearly, and let them modify the address until it's less duplicable.)
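In form-validation terms, that could look something like the sketch below. The function name and the message wording are assumptions for illustration; the point is that it returns an explanation the form can show next to the field rather than treating the user as an attacker.

```python
from typing import Optional


def signup_email_problem(address: str) -> Optional[str]:
    """Return a user-facing message if the address is rejected, else None."""
    local, sep, _domain = address.rpartition("@")
    if not sep or not local:
        return "Please enter a full email address."
    if "+" in local:
        return (
            "Addresses containing '+' are not accepted here. "
            "Please enter the address without the '+' part."
        )
    return None


print(signup_email_problem("pat+trial7@example.com"))
print(signup_email_problem("pat@example.com"))
```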

Yes, this doesn't stop people from manually registering multiple times, since, as you say, people can have multiple addresses; or even email service from multiple providers. But it does stop low-effort automated bulk-registration attacks. And some services — those with any sort of free-tier especially — get a lot of those!


That + sign as a label to the same inbox is a Gmail thing.

It's a 100% valid character to use, and at another hostname it doesn't have to mean a label. This goes back to what others have said: you either end up with a ton of rules, or accept that email isn't the best way to identify unique users.

https://en.wikipedia.org/wiki/Email_address#Local-part

> The local-part of the email address may be unquoted or may be enclosed in quotation marks.

>If unquoted, it may use any of these ASCII characters:

> - uppercase and lowercase Latin letters A to Z and a to z

> - digits 0 to 9

> - printable characters !#$%&'*+-/=?^_`{|}~

> - dot ., provided that it is not the first or last character and provided also that it does not appear consecutively (e.g., John..Doe@example.com is not allowed).[5]

> If quoted, it may contain Space, Horizontal Tab (HT), any ASCII graphic except Backslash and Quote and a quoted-pair consisting of a Backslash followed by HT, Space or any ASCII graphic; it may also be split between lines anywhere that HT or Space appears. In contrast to unquoted local-parts, the addresses ".John.Doe"@example.com, "John.Doe."@example.com and "John..Doe"@example.com are allowed.

(it keeps going on the wikipedia page)
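For the unquoted case only, the rules quoted above translate fairly directly into a regular expression. This is a sketch of just that excerpt, not a full address validator: quoted local-parts, length limits, and anything further down the Wikipedia page are out of scope, and the names `ATEXT` and `is_valid_unquoted_local` are my own.

```python
import re

# One allowed character: letters, digits, or the listed printable characters.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# Runs of allowed characters separated by single dots, so a dot can never be
# first, last, or doubled.
UNQUOTED_LOCAL = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_valid_unquoted_local(local: str) -> bool:
    """Check a local-part against the unquoted rules quoted above."""
    return UNQUOTED_LOCAL.fullmatch(local) is not None


for candidate in ["John.Doe", "user+label", "John..Doe", ".John", "John."]:
    print(candidate, is_valid_unquoted_local(candidate))
```

Note that `user+label` passes: the `+` is just another permitted character at this level, whatever a given provider chooses to do with it.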


I know that it's allowed; but — given that we're a B2B company serving highly-technically-literate customers — I don't think I want the business of anyone who thinks it'd be a good idea to use it even when it is allowed, given how it'd affect their own deliverability to imperfectly-implemented MTAs.

(It's a similar "you're really relying on perfect competence from the whole rest of the ecosystem to get you out of this" feeling as e.g. putting a space in your user name — and thus your home directory path — in Linux. Sure, the core GNU/BSD/etc tooling has been tested for that use-case — but are you really going to trust random tools and shell-scripts to handle argument tokenization perfectly? Or are you just going to ditch the space to be safe?)


>serving highly-technically-literate customers

It would probably be the other way.

The more technically literate a user is, the more they'll understand and figure out how things work, the nuances, and use those things.

You know that core tooling that works? It's because technically literate users tried it, found the issue, and fixed it.

Same with me using é in my name. If something breaks on a website, in email, etc., I create an issue and start emailing. My name doesn't have an e, it has an é.


you are supposed to not use email as a token of user uniqueness, because it's not.

the fundamental problem here is assuming that "emailA == emailB" could ever conceivably be used as a proxy for the equality "userA == userB". Unicode tarpits aside (that's a legitimate problem where, even if you did everything exactly right, libraries or API calls etc. might not), the question "do these mailboxes belong to the same user?" (in the sense of stripping '+' suffixes from gmail boxes etc) cannot be answered, period. You can neither confirm nor disprove it without some additional information about the behavior of the receiving mail transfer agent.

even if you understand the behavior of the receiving MTA, you cannot prove that they don't have an email account somewhere else, so that is not sufficient either.

email is not a key for user uniqueness, period. It's a channel for contacting a user. You don't mess with it, and you simply send an email and see if the session user can authenticate it. If you want identity verification, that happens at another layer, not email.
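The "send an email and see if the session user can authenticate it" step might be sketched like this. Everything here is an assumption for illustration: the in-memory `pending` dict stands in for real storage, actual mail sending is omitted, and the function names are invented. Note that the address itself is never parsed or rewritten.

```python
import hmac
import secrets
from typing import Dict

# session id -> token we emailed; an in-memory stand-in for real storage.
pending: Dict[str, str] = {}


def start_verification(session_id: str) -> str:
    """Create a one-time token; the caller would email it to the address as typed."""
    token = secrets.token_urlsafe(32)
    pending[session_id] = token
    return token


def confirm(session_id: str, presented: str) -> bool:
    """True only if this session presents the token that was mailed to it."""
    expected = pending.get(session_id)
    if expected is None:
        return False
    # Constant-time comparison of the two tokens.
    if not hmac.compare_digest(expected.encode(), presented.encode()):
        return False
    del pending[session_id]  # tokens are single-use
    return True


token = start_verification("session-123")
print(confirm("session-123", "guess"))
print(confirm("session-123", token))
```

A successful `confirm` shows that whoever holds the session can read that mailbox, which is reachability, not identity.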

"transforms" like lowercasing or mailbox-suffix-stripping are only window dressing patching around this fundamental issue. There is no way to affirmatively go from "email address" to "user identity" without more information from another layer. It's a communication channel, not an identity provider. You can (reasonably) use it as a medium to talk to the user while you verify them against something else (although like SMS it can of course be hijacked unless your IDP protocol accounts for this possibility) but "email verification" is not and can never be an IDP in itself. Gmail addresses are free.


> (in the sense of stripping '+' suffixes from gmail boxes etc)

I have a few accounts. Those + suffixes drive a bunch of rules on them to get me to take action. They forward to my primary email address if it's important.

Some of them get scraped and mailed in a weekly summary. Some of them send text messages alerting me.

When they mess with them, I'll probably just not see it unless I have to login for some other reason to that account.
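As a rough picture of what such suffix-driven rules amount to, and of what is lost when a site strips the suffix, here is a small sketch. The tag names and actions in `RULES` are invented examples, not the actual setup described above.

```python
from typing import Optional

# Hypothetical tag -> action table.
RULES = {
    "bank": "forward to primary address",
    "news": "include in weekly summary",
    "alerts": "send text message",
}


def plus_tag(address: str) -> Optional[str]:
    """Return the part after '+' in the local part, or None if there is none."""
    local = address.rpartition("@")[0]
    _, sep, tag = local.partition("+")
    return tag if sep else None


def action_for(address: str) -> str:
    """Pick the action for the address a message was sent to."""
    tag = plus_tag(address)
    return RULES.get(tag, "leave in inbox") if tag else "leave in inbox"


print(action_for("me+alerts@example.com"))
print(action_for("me@example.com"))  # what arrives if a sender stripped the suffix
```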


I don't think I know a single person, tech-savvy or not, that only has one email address.



