Case insensitivity was a huge mistake in computing really. Most languages don't ...

Beltalowda · on Sept 1, 2022

So if my email is HeLLo@example.com because I want to be cute people will have to try 6 times before they finally get the right email address? Imagine telling that to someone in person. This kind of "weird" casing isn't that rare and doesn't require cute usernames: "DonaldDuck@example.com", "FreeBSD@example.com", "DrMcCoy@example.com", etc.

Languages that don't have case are not an issue; the situations where a lowercase <-> uppercase mapping is not simple are actually not that many. It's not trivial, but not all that complex either. The most annoying part is Turkish, Azeri, and Lithuanian, where the rules differ a bit but the language in use is often unknown. For the purpose of matching things ("is this email address known in our system?") it's actually not that hard, since you can just treat several characters as identical (displaying text correctly to users is harder, but that's not important here).
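
A minimal Python sketch of the "treat several characters as identical" idea, for matching only and never for display. The helper name `match_key` and the decision to fold the Turkish/Azeri dotted and dotless i onto plain "i" are assumptions made for illustration, not a description of what any real mail system does:

```python
# Hypothetical lookup key: fold case and treat the Turkic i variants as "i".
# U+0130 is capital I with dot above, U+0131 is small dotless i.
_TURKIC_I = str.maketrans({"\u0130": "i", "\u0131": "i"})


def match_key(address: str) -> str:
    """Return a key for "is this address known?" checks (not for display)."""
    # translate() first, so casefold() never sees the special i characters.
    return address.translate(_TURKIC_I).casefold()


print(match_key("HeLLo@example.com") == match_key("hello@example.com"))
```

The stored address keeps whatever casing the user typed; only the key used for comparison is folded.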

I see this attitude in various situations, often under "falsehoods programmers believe" articles, which goes something like "it's hard in a few rare cases, therefore we should not do it at all for the >99% cases where it's simple and unproblematic".

leni536 · on Sept 1, 2022

It is fine to have multiple email addresses connected to a single inbox. Email providers already do normalization like this that is not baked into the spec. Gmail for example treats johndoe@gmail.com and john.doe@gmail.com the same.

867-5309 · on Sept 1, 2022

that cannot be true. I think you meant john+doe

edit: wow. goodbye gmail

"if your email is johnsmith@gmail.com, you own all dotted versions of your address:

john.smith@gmail.com jo.hn.sm.ith@gmail.com j.o.h.n.s.m.i.t.h@gmail.com"

>johnsmith@gmail.com and j.o.h.n.s.m.i.t.h@gmail.com are the same address and go to one inbox

https://support.google.com/mail/answer/7436150
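
A rough Python sketch of the dot rule quoted above. The function name `gmail_key` is hypothetical, the rule is applied only to the literal domain gmail.com, and lowercasing the Gmail local part is an extra assumption that the quoted help text does not state:

```python
def gmail_key(address: str) -> str:
    """Collapse dotted variants of a gmail.com address onto one key."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    if domain == "gmail.com":
        # Dots in the local part are ignored (per the quoted help page);
        # lowercasing the local part is an assumption of this sketch.
        local = local.replace(".", "").lower()
    # For every other host the local part is left exactly as given.
    return f"{local}@{domain}"


print(gmail_key("j.o.h.n.s.m.i.t.h@gmail.com"))
```

Addresses at other hosts pass through with their dots and casing intact, since nothing in the thread suggests the rule is universal.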

matsemann · on Sept 1, 2022

No, it's true. Can place a dot anywhere and it arrives in the same inbox.

l181 · on Sept 1, 2022

Oh but that is true.. :)

https://support.google.com/mail/answer/7436150

vetinari · on Sept 1, 2022

Well, maybe Google should check their own implementation, because interesting things happen when accounts for both versions exist.

I have an account with the dot that was made in the age when Gmail was invite-only. A few years ago, someone created an account without the dot. Yes, I'm receiving their mail, and have no way to contact them, because everything I send out comes back to me.

matsemann · on Sept 1, 2022

Most likely no account without the dot actually exists, it's just someone having written their email wrong at some places.

vetinari · on Sept 1, 2022

Then that guy puts the wrong address consistently in places, where you want to receive the mail, from applications to education courses to paypal.

astura · on Sept 1, 2022

Not only do people do this, it's actually extremely common, ask anyone with commonname@gmail.com. Someone here said their original email address has become unusable because it gets thousands of messages a day that are intended for other people. "That guy" might actually be multiple people all making the same mistake with your email address.

My coworker told me there's a complete stranger who, every time she emails her son, accidentally sends the email to him first. This has been going on for years, she makes the same mistake every time, she doesn't learn her son's actual email address, and she doesn't learn to press "reply."

I'm 100% sure the "non-dot-version" doesn't exist as a separate account.

InitialLastName · on Sept 1, 2022

AOL used to allow users to email other users without using the @aol.com extension. Back in those days I (prior to any capacity to negotiate a sensible ISP, for the record) had an email address that matched a common subject line, and it was inundated by people who typed their subject in the To line and then wrote an email.

teddyh · on Sept 1, 2022

Yes, people do that.

Aethylia · on Sept 1, 2022

It is true and it's not just gmail, dots before the @ are ignored.

moonchild · on Sept 1, 2022

It may be that some other specific email servers implement similar behaviour to gmail, but that is not true as a general rule.

znpy · on Sept 1, 2022

And that’s why I regularly receive other people’s mail in my gmail inbox, and why I have stopped using gmail for anything important (it’s right to assume that gmail is also sending my emails to other people).

Google’s gmail people aren’t really as smart as they think they are.

scambier · on Sept 1, 2022

That's something I don't understand. I've always given my email as john.doe@gmail.com, and I sometimes receive emails - addressed to Another John Doe - sent to johndoe@gmail.com.

That Another John Doe never, ever had access to johndoe@gmail.com, they just gave a wrong address. That's not gmail's fault.

alistairSH · on Sept 1, 2022

> That Another John Doe never, ever had access to johndoe@gmail.com, they just gave a wrong address. That's not gmail's fault.

This. My wife and I have two flavors of this.

Her address is firstmlast@gmail.com. People frequently forget the m initial, and somebody else owns firstlast@gmail.com. She's since started using dots (first.m.last) to mitigate the error.

My address is firstlast@gmail, where first and last are not globally common, but are fairly common in Scotland. Once a year or so, I receive email for somebody else that shares my name. I don't know his real email, but I've been "invited" on his family vacations 3-4 times now. Infrequent enough that I just respond "thanks for the invite, but I think you'll be disappointed when I arrive and not the Alistair you were expecting."

rovr138 · on Sept 1, 2022

I have a friend that has firstmlast@ and she's friends with firstlast@ because of how common the issue is.

It's actually not a super common first and last, so firstlast@ knows when and to whom to forward.

ghaff · on Sept 1, 2022

Or someone assumed (or just tried) that other email address.

I signed up for my university's email forwarding for alumni early on and got my first name as my email. For quite a while, I would get emails, including fairly sensitive ones, sent to me by not yet very email savvy people just assuming you could send an email to someone's first name and it would get to them.

derefr · on Sept 1, 2022

Nah, it happens with mangled names that no bot would ever try to stuff, too. E.g. I own derefr@gmail; but I sometimes receive email from people trying to reach a man named "Derek" — who almost certainly owns the address derek.fr@gmail, but probably typoed it once as dere.fr@, and now his browser autocompletes that into registration forms for him.

scambier · on Sept 2, 2022

> Or someone assumed (or just tried) that other email address.

I received 2-3 personal emails, but most of them are automated invoices.

Aethylia · on Sept 1, 2022

This doesn't really make any sense. It's not just gmail that does this, dots are almost always ignored before the @.

Nobody else can register an email that is the same as yours but without a dot. So the only way you receive someone else's email is if they give the wrong address.

denton-scratch · on Sept 1, 2022

> It's not just gmail that does this, dots are almost always ignored before the @.

That's not my experience. Which non-gmail email software ignores dots before the @?

Thinking about this, I guess the sending MTA doesn't care about dots; it goes RCPT TO: <address.with.dots@example.com>. The receiving MTA then has to validate that address; it does that using some account database that isn't typically part of the MTA - it could be a unix account (no dots!), a database table, or an LDAP user. Finally it passes the mail off to a delivery agent, which hopefully relies on the same account database.

So the elision of dots appears to be a feature of certain account databases. So which account databases elide dots?

account42 · on Sept 1, 2022

MTAs can be configured to apply additional transforms before looking up the account. For example, postfix's virtual table [0] can be used for this and on my server it does elide dots in the local part (along with everything else).

[0] https://www.postfix.org/ADDRESS_REWRITING_README.html#virtua...
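
To make the "transform before lookup" idea in this subthread concrete, here is a toy Python sketch. It is not Postfix configuration and not how any particular MTA is implemented; `resolve_recipient`, `strip_dots`, and the dictionary standing in for the account database are all hypothetical names:

```python
from typing import Callable, Dict, Optional, Tuple

# Stand-in for the account database: (local part, domain) -> mailbox id.
Accounts = Dict[Tuple[str, str], str]


def strip_dots(local: str) -> str:
    """Example transform: elide dots from the local part."""
    return local.replace(".", "")


def resolve_recipient(
    address: str,
    accounts: Accounts,
    transform: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Validate a RCPT TO address against the account table."""
    local, _, domain = address.rpartition("@")
    if transform is not None:
        local = transform(local)  # optional rewrite before the lookup
    return accounts.get((local, domain.lower()))


accounts = {("johndoe", "example.com"): "mailbox-1"}
print(resolve_recipient("john.doe@example.com", accounts))
print(resolve_recipient("john.doe@example.com", accounts, strip_dots))
```

Whether dots are elided is then purely a property of the configured transform, which matches the point that it is a per-server choice rather than a general rule.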

znpy · on Sept 1, 2022

Dots are never ignored before the @, and also aren’t ignored after it, for that matter.

I guess this is another falsehood people believe about emails.

> Nobody else can register an email that is the same as yours but without a dot.

It used to be possible, then google decided to stop allowing that (guess why?)

And by the way, that’s an arbitrary decision.

I have run mail servers and can tell you… it’s an arbitrary decision.

jstanley · on Sept 1, 2022

> So if my email is HeLLo@example.com because I want to be cute people will have to try 6 times before they finally get the right email address? Imagine telling that to someone in person.

Yeah. That's fine.

If my email is LLLLLLLLLLLL@example.com because I want to be cute I have to tell people to type exactly the right number of L's. Do you think they should just be able to type a lot of L's and as long as it's somewhere near it counts as the same email address?

In a world where email addresses are always case sensitive, everyone will use lowercase (like they pretty much always already do anyway), and it'll be fine.

ctxc · on Sept 1, 2022

"everyone will use lowercase (like they pretty much always already do anyway)"

This in itself sounds like a falsehood!

Beltalowda · on Sept 1, 2022

"LLL@" doesn't map to "LLLLLLLL@" in any logical way. "lll@" does; that's just a silly argument.

jstanley · on Sept 1, 2022

The only reason "LLL" and "lll" mean the same thing is because currently email addresses are (sometimes) case insensitive.

In a world where email addresses were "obviously" case sensitive, "LLL" mapping to "lll" would be just as crazy as "LLLLLLLLLL" mapping to "LLLLLLLLLLL".

They just seem similar, to humans. But they're different strings.

TylerE · on Sept 1, 2022

It also maps to 111@ and III@, depending on the font

Aethylia · on Sept 1, 2022

By maps to, they mean it counts as the same email address. 111@ and lll@ do not do that. The font has no impact on the email spec. However it can add extra confusion.

tsimionescu · on Sept 1, 2022

I doubt anyone is crazy enough to implement the email spec for comparing emails, to be fair. I would honestly be surprised if any publicly available mail agent or server supports that craziness.

gumby · on Sept 1, 2022

> Case insensitivity was a huge mistake in computing really

Ah, youth. There was little choice! Sometimes you had only six bits for a character; sometimes your bytes could be from 1-36 bits wide, depending on what you wanted for your program, so you might have systems that mixed six-bit (only upper case) and early ascii (two cases) and so for matching you had to be case insensitive.

It’s easy to look back and say “those people were so stupid” but they weren’t.

OccamsMirror · on Sept 1, 2022

That's a surefire way to increase customer support load, as users mix case in their emails all the time. They might sign up on a phone, or login on a phone. They might just be hamfisted. Sure, it might be their problem, but they'll make it yours.

kalleboo · on Sept 1, 2022

And people forgetting caps lock down for their passwords is so common that many UIs actually show a little caps lock icon in there to warn you

vel0city · on Sept 1, 2022

Some places even support the same password with inverted caps. So say, if your password was "passWORD", then "passWORD" and "PASSword" would work. If the first hash fails, they'll invert the case, re-hash, then check again.
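
The "invert the case, re-hash, check again" flow can be sketched in Python as below. This is only an illustration of the described logic: the function names are hypothetical, and PBKDF2 with these parameters is an assumption, not a claim about what any of those sites actually use:

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash for storage (parameters chosen for illustration only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Accept the attempt as typed, or with every letter's case inverted."""
    for candidate in (attempt, attempt.swapcase()):
        # compare_digest avoids leaking timing information about the match.
        if hmac.compare_digest(hash_password(candidate, salt), stored):
            return True
    return False


salt = b"\x00" * 16  # fixed salt only so the example is reproducible
stored = hash_password("passWORD", salt)
print(verify("PASSword", salt, stored))
```

Only the fully inverted variant is tried, so "password" in all lowercase would still be rejected.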

fyvhbhn · on Sept 1, 2022

*accidentally enable caps lock

croes · on Sept 1, 2022

Computers have to adapt to people and not the other way around.

E-Mail wouldn't have been widely accepted with case sensitive addresses.

People expect that MyName@mail.com reaches the same person as myname@mail.com or Myname@Mail.COM just like letters reach their recipient irrespective of the upper and lower case of the recipient's name.

Computers are there to make the lives of the users easier, not the programmers.

Aethylia · on Sept 1, 2022

I'm not sure this is true. In my experience a lot of people actually do think that it's case sensitive. Many times I've heard someone describe the capitals while verbally telling someone their address.

However it being insensitive has probably helped a lot of times where people make mistakes in explaining or copying those capitals.

croes · on Sept 1, 2022

Regarding email it doesn't hurt to think they are case sensitive but the opposite would be a massive problem

Gigachad · on Sept 1, 2022

As a sender, you should always treat email as case sensitive. As an email host/receiver, you can and probably should choose to be insensitive. But never assume any other host works like that.

Similar to how gmail ignores . in emails but other hosts do not.

tsimionescu · on Sept 1, 2022

> In my experience a lot of people actually do think that it's case sensitive.

I don't think they really do. They may think case matters somehow, and so may be careful to reproduce the exact case that they used before, but I don't think many people would expect JohnDoe@gmail.com and johnDoe@gmail.com to be two different email accounts.

ghaff · on Sept 1, 2022

In the general population, how many people do you think understand that username (including an email address) probably isn't case sensitive but that password almost certainly is?

zajio1am · on Sept 1, 2022

People accepted telephone that used just numbers, they would accept case-sensitive e-mail as well. Everyone would just use lowercase.

fsckboy · on Sept 1, 2022

> Case insensitivity was a huge mistake in computing really

original ASCII only had uppercase. When lowercase got added, gradually with newer systems and software, without case insensitivity you would have had massive incompatibilities which probably would have hampered or even arrested the introduction of lowercase in new systems

and all over again, when microcomputers first came out, they came out with uppercase only to cut complexity and cost on simple systems.

jeffbee · on Sept 1, 2022

I don’t see how adding lowercase to ASCII could have resulted in massive backwards compatibility issues considering that uppercase-only ASCII existed for only a few weeks in 1963. Surely there was not widespread adoption of ASCII in the spring of 1963.

fsckboy · on Sept 1, 2022

I'm not old enough to be an expert, just old enough to have used the leftovers (new computers were too expensive!): many many devices were uppercase only, card punches, ASR-33 teletype, the Telex system, lineprinters, "glass" teletypes, FORTRAN, COBOL, and now that I think of it, Morse code/telegraph had always been. There was a ton of infrastructure that was uppercase only. You may be right that it wasn't ASCII's fault. Perhaps the first version of ASCII made sure to encompass what had been, and then saner heads said "let's allow for future progress".

i'm not going to explore the entire history, but just looked this up. TL;DR example, the addition of lowercase characters represented a jump from 6 bits to 7 bits at the hardware level:

"A six-bit character code is a character encoding designed for use on computers with word lengths a multiple of 6. Six bits can only encode 64 distinct characters, so these codes generally include only the upper-case letters, the numerals, some punctuation characters, and sometimes control characters. The 7-track magnetic tape format was developed to store data in such codes, along with an additional parity bit."

https://en.wikipedia.org/wiki/Six-bit_character_code
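
The arithmetic behind the quoted passage, as a small Python check. The character counts assume the 26-letter Latin alphabet and 10 digits; the variable names are just for illustration:

```python
codes_6bit = 2 ** 6   # distinct values in six bits
codes_7bit = 2 ** 7   # distinct values in seven bits

upper_and_digits = 26 + 10        # A-Z plus 0-9
both_cases_and_digits = 52 + 10   # A-Z, a-z plus 0-9

# Room left for punctuation and control characters in each scheme.
spare_6bit_upper_only = codes_6bit - upper_and_digits
spare_6bit_both_cases = codes_6bit - both_cases_and_digits
spare_7bit_both_cases = codes_7bit - both_cases_and_digits

print(spare_6bit_upper_only, spare_6bit_both_cases, spare_7bit_both_cases)
```

With both cases, six bits leave almost nothing for punctuation, which is why lowercase went along with the extra bit.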

spullara · on Sept 1, 2022

This was the original mistake. Encoding the font into the letter representation.

samatman · on Sept 1, 2022

Capital letters aren't a matter of font. There's a difference between the river phoenix, a magical bird which lives by the river River, and River Phoenix, the actor. It isn't a presentation-layer difference, it's an encoding-layer difference.

Dagonfly · on Sept 1, 2022

From wikipedia: "A capitonym is a word that changes its meaning (and sometimes pronunciation) when it is capitalized."

spullara · on Sept 2, 2022

All bets are off at the beginning of a sentence.

pjc50 · on Sept 1, 2022

Possibly, but this dates back to .. maybe the 8th century and Carolingian Minuscule?

koheripbal · on Sept 1, 2022

Then again, if we could start from scratch we'd probably just have a single global phonetic language without case and with a limited number of total chars.

saalweachter · on Sept 1, 2022

Honestly non-phonetic glyphs are probably an easier lift.

Fun fact: the reason we pronounce "ph" like "f" is because the Greek letter was originally pronounced like p-h, at the time Romans began stealing words, but then the Greeks started pronouncing it like "f" and the Romans followed suit, but kept the old Latinization of "ph", because they'd already carved it into stone.

English spelling is largely phonetic... but it captures the phonetic spelling across dozens or hundreds of shifts in the spoken language. Unless you can stop people from changing how they speak, any phonetic spelling reboot is either going to suffer from the same problem, or words will constantly change how they are spelled to keep up with the spoken word.

koheripbal · on Sept 3, 2022

Kids learn phonetic writing much quicker than glyph writing. It's much more intuitive for a child.

It also helps to limit the fragmentation of pronunciation of words.

BeFlatXIII · on Sept 1, 2022

China has had the proper idea for a writing system this entire time.

paulmd · on Sept 1, 2022

if you view meme culture as a trend towards increased symbol density in linguistic communication due to their ability to convey emotions, overtones, implications, and other nuance ("shaka, when the walls fell") then the increased symbol space of chinese/japanese/korean characters looks interesting.

conversely it's certainly been an obvious disadvantage (posed a lot of problems and imposed a lot of awkward workarounds) for mechanical/electronic communication - now you have to enter the characters too, and you have to express that larger number of characters efficiently. In practice, a lot of electronic communication is just simplified to ASCII because that's the set that works universally. Someone used the example of eszett being transliterated as "ss" in german, dunno if chinese uses anything similar, but it wouldn't surprise me, obviously Japanese has romaji too.

but at a human-interface level, fundamentally there is a limit to how many symbols people can absorb. Even with latin characters, people at best will sight-read whole words to increase symbol rate, but, the natural evolution is to use 1 character to represent 1 symbol/word, that's the highest possible rate at which humans can absorb symbols for a written system. And in turn you could in principle absorb a "word of symbols", which is a sentence, similar to how western readers can sight-read a word of our 1-character glyphs.

by "increasing the dimensionality" of the symbol, you increase the effective symbol rate, similar to how memes use subtext/etc to convey more nuance than a pure text can by itself.

cm2187 · on Sept 1, 2022

I think anyone who deals with end users would disagree. It seems impossible to get users to abide by a specific casing. Things would break all the time.

umanwizard · on Sept 1, 2022

> as a sender you should always treat them as unique emails

This is already how it works. Senders are not supposed to assume that local-parts are case-insensitive. (Some buggy implementations ignore this requirement and upper-case everything, but the serious implementations don’t).

msh · on Sept 1, 2022

Do you have any data to back that most languages don't have cases?

samatman · on Sept 1, 2022

The correct answer is "who cares" though. Languages which use cased alphabets.. use cased alphabets, you don't get to argue with it.

You also don't get to argue with the fusional position changes in Arabic, or the ligatures in Devanagari, or the places within a square the featural particles of Hangul must be printed in.

These things aren't negotiable.

msh · on Sept 1, 2022

You are correct that it's not negotiable when supporting that language, but it is negotiable what languages and writing sets a given application supports.

bloak · on Sept 1, 2022

I think you have correctly identified an implausible claim!

Of course, most languages aren't written at all ... or at least don't have a traditional written form that is sufficiently well established for someone to say that the "language" has case rather than a particular (proposed) way of writing it.

However, I rather suspect that the majority of languages in which books are published use some variant of the Latin alphabet and do, therefore, have case. (The only language I've heard of that uses the Latin alphabet without case is Lojban!)

On the other hand, if you weight languages by the number of (native) speakers, since nearly half of the world's population lives in China, India, Pakistan, Bangladesh, Japan or Korea (before counting other caseless scripts such as Arabic or Thai), probably it's true that most people don't use case in their main language.

gumby · on Sept 1, 2022

Most alphabets don’t have case (Devanagari, Arabic, etc) and many don’t have letters at all (Chinese, Japanese, et al).

Case is only used in a few alphabets, mainly Latin-, Cyrillic-, and Greek-derived ones.

robertlagrant · on Sept 1, 2022

It's just blown my mind that case might be a thing non-English speakers would need to learn to be able to read and write English. (Same for non-English, but that doesn't blow my mind in the same way.)

tsimionescu · on Sept 1, 2022

Yeah, we always say that the English alphabet has 26 letters, but there are actually 52 unique symbols you have to learn to read, or 104 if you also have to read/write cursive. Some of these symbols are very similar (if you learn 'o' you will definitely recognize 'O', and likely the cursive variants as well), while others are quite different ('g', 'G', and the cursive upper case G might as well be different letters altogether; the lower-case cursive does resemble 'g').

gumby · on Sept 1, 2022

With joined-up letters (“cursive” in the USA I guess) different languages have different letterforms, and sometimes multiple systems.

For that matter, typesetting rules vary by language as well: not just the obvious hyphenation rules but spacing as well. Just pick up a book in, say, French or Russian and you can tell at a glance (without even looking at the letters) that it’s not in English.

tsimionescu · on Sept 1, 2022

Right, 104 symbols would be the minimum if you do need to read/write cursive.

However, I don't agree with your point about typeset text. You're right that the styles differ, but if you have learned one style, and know the language of the text, you will not need any significant amount of time to read a different style of typesetting.

Russian of course normally uses the Cyrillic alphabet, not the Latin one, so obviously you do have to learn a whole new set of symbols to understand it even if you can read Latin symbols. And of course French uses slightly more letters/letter forms than English, with the cedilla and four accents (aigu, grave, circonflexe, and the very rare tréma).

gumby · on Sept 1, 2022

Lots of accents when using Cyrillic to write non-Russian text.

I didn’t mean the typesetting differences made reading a different language in any way hard, merely pointing out that there are lots of different aspects to text in different languages even when the alphabets are basically the same.

ghaff · on Sept 1, 2022

And there are a fair number of (inconsistent) rules for casing. Proper nouns vs. common nouns. Camel case (or other non-standard capitalizations). Title case. "Standard" body copy.

msh · on Sept 1, 2022

Are they a majority of languages if counted? I guess it also matters if you count the number of languages or if you count the number of people writing them.

gumby · on Sept 1, 2022

Let’s just use Greek and its descendants (Latin, Greek and Cyrillic alphabets) and Brahmic-derived writing (we said “alphabet” when I was a kid but now ppl say “abugida”). There are about 200 languages spoken in Europe, all of which use these alphabets. India has over a hundred “major” languages and about 1600 others, most of which use Brahmic writing alphabets (the major exception, Urdu, uses a form of Arabic writing). So a big imbalance!

Oh, you want speakers? Merely counting people who read Hanzi + Arabic-alphabet readers + the Indian subcontinent gets you more than half the world’s population. And there are hundreds, maybe over a thousand writing systems.

bmn__ · on Sept 1, 2022

> in Europe, all of which use these alphabets

Latin is much more wide-spread than that! You neglect whole of America and Australasia and half of Africa.

https://upload.wikimedia.org/wikipedia/commons/9/9d/Writing_...

> So a big imbalance!

When we sum up realistically, then world-wide the number of users of writing systems with case ("bicameral") is about equally balanced with those without.

coffeeling · on Sept 1, 2022

I would probably count stuff like abugidas and kana as letters in that context.

umeshunni · on Sept 1, 2022

Only latin and Cyrillic scripts (or more broadly, alphabetic systems) have cases. Abjads, abugidas and logo/syllabic systems don't have cases.

wereHamster · on Sept 1, 2022

tolower() works mostly well in ASCII and languages which don't have weird rules (=works mostly well in english).

stouset · on Sept 1, 2022

I think that's GP's point: tolower() looks like it works well to English speakers but it's subtly wrong and will fail unexpectedly for people with other locales.
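
A quick Python illustration of the point. Python's `str.lower()` and `str.upper()` use Unicode's default, locale-independent case mappings, so they behave like an English-centric `tolower()` for the Turkish i; the variable names are only for this example:

```python
DOTTED_CAPITAL_I = "\u0130"  # Turkish capital I with dot above
DOTLESS_SMALL_I = "\u0131"   # Turkish small dotless i

# Default mappings pair I with i, which is wrong for Turkish text,
# where I pairs with the dotless i and the dotted capital pairs with i.
capital_i_lowered = "I".lower()
small_i_uppered = "i".upper()

# The dotted capital lowercases to "i" plus U+0307 COMBINING DOT ABOVE,
# so the result is two code points long rather than one.
dotted_capital_lowered = DOTTED_CAPITAL_I.lower()

print(capital_i_lowered, small_i_uppered, len(dotted_capital_lowered))
```

Getting the Turkish behaviour requires locale-aware case conversion (for example via ICU), which the built-in string methods do not provide.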

3np · on Sept 1, 2022

The context is e-mail addresses, not English.

gumby · on Sept 1, 2022

Those rules are only weird to you, but perfectly reasonable to others, such as Turkish or German speakers…well, readers :-)

msh · on Sept 1, 2022

Internationalization is hard. I think it's too much to expect software written for a specific market to handle all languages in the world.

E.g. in Danish we have 3 letters (æøå) that are not common in the Latin alphabet. I can't go to Germany or Turkey and expect people to be able to write out those letters when doing input in a local system.

MandieD · on Sept 1, 2022

Fun thing I've run into in a Germany-based but increasingly international company: German always spells out umlauts and eszetts when going to lower ASCII for email addresses ("Schäßler" -> "Schaessler"), but Hungarian does not. Not sure how Turkish ö and ü get fully lower-ASCII-ized there, but in Germany, they get spelled out "oe" and "ue" as if they were German ö and ü. This isn't as much of a corner case as one might think - there are a lot of people with Turkish names in Germany.
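
The German spell-out convention described here is easy to sketch in Python. The function name `german_ascii` and the exact mapping table are assumptions for illustration; as the comment notes, applying it to Hungarian or Turkish names is a policy choice, not a linguistic rule:

```python
import unicodedata

# Hypothetical mapping following the German convention described above.
_GERMAN_ASCII = str.maketrans({
    "\u00e4": "ae",  # a-umlaut
    "\u00f6": "oe",  # o-umlaut
    "\u00fc": "ue",  # u-umlaut
    "\u00df": "ss",  # eszett
    "\u00c4": "Ae",  # capital A-umlaut
    "\u00d6": "Oe",  # capital O-umlaut
    "\u00dc": "Ue",  # capital U-umlaut
    "\u1e9e": "SS",  # capital eszett
})


def german_ascii(name: str) -> str:
    """Spell out umlauts and eszett the way German email addresses do."""
    # Normalize first so "a" + combining diaeresis is also caught.
    return unicodedata.normalize("NFC", name).translate(_GERMAN_ASCII)


print(german_ascii("Sch\u00e4\u00dfler"))
```

Any other non-ASCII letter passes through unchanged, so a real system would still need a rule for names outside this table.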

gumby · on Sept 1, 2022

I’m always amused by these kinds of nonsensical usage for Turkish in Germany (ü->ue) but the thing that really trips people up is that Turkish has two letters that look like i, one with the tittle and one without — in both cases.

Germany recently adopted an uppercase ß (and got it into Unicode) to try to help with case roundtripping but I’ve never seen it in the wild. And let’s not get into obsolete German Fraktur ligatures like tz or ch which also had no upper case equivalents.

bmn__ · on Sept 1, 2022

> Germany recently adopted an uppercase ß (and got it into Unicode)

The parenthetical part is true. It was an uphill battle, but not because of the consortium, but because of what the tropes wiki would describe as executive meddling.

The adoption is not recent, but about 110 years old. You have the wrong idea because of sloppy journalism.

> I’ve never seen it in the wild

I see it all the time. Maybe you are undercounting. Pay attention to non-standard letterforms on hand-written signs, and you also have to include print media where someone substituted lower-case ß in absence of a glyph in a font. This is a typographic mistake, but the intent is clear.

Dylan16807 · on Sept 1, 2022

> to try to help with case roundtripping

I don't think that's right. It's used in certain contexts like all caps.
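
For reference, this is how the round trip behaves with Python's built-in string methods, which follow Unicode's default case mappings. It says nothing about why the capital letter was introduced; the variable names are only for this example:

```python
SHARP_S = "\u00df"          # lowercase eszett
CAPITAL_SHARP_S = "\u1e9e"  # capital eszett

upper = SHARP_S.upper()      # default uppercase is "SS", not the capital eszett
round_trip = upper.lower()   # "ss": the original letter is not recovered

from_capital = CAPITAL_SHARP_S.lower()  # the capital form does map back
folded = SHARP_S.casefold()             # caseless matching treats it as "ss"

print(upper, round_trip, from_capital == SHARP_S, folded)
```

So the capital form only helps the round trip when text is uppercased to it explicitly; the default mapping still goes through "SS".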