Stop throwing poorly implemented and untested ML/algorithms at every problem, dammit. It just leads to problem after problem.
I dunno why, but the people working at tech companies really seem to believe that computers can do no wrong. Which is obviously not true.
And then they don't even have an appeals process where a human with basic common sense can see what went wrong. Nope. You just lose access to all your stuff.
Relying on a checksum alone seems like a recipe for disaster, especially when hosting an enormous number of files. Comparing a checksum (or multiple checksums) plus the file size would drastically reduce the number of false positives.
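To illustrate the idea (this is a toy sketch, not anything Google actually does; the function names and the chunk size are made up), matching on a (size, hash) pair instead of the hash alone lets you reject most candidates with a cheap size lookup before ever hashing:

```python
import hashlib
import os

def fingerprint(path, chunk_size=65536):
    """Return a (size, sha256-hex) pair for a file. Requiring both to
    match makes an accidental collision far less likely than matching
    on the hash alone."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
            size += len(chunk)
    return (size, h.hexdigest())

def matches(path, known_fingerprints):
    """Check a file against a set of known (size, hash) fingerprints.
    The size check is nearly free, so only plausible files get hashed."""
    size = os.path.getsize(path)
    if size not in {s for s, _ in known_fingerprints}:
        return False
    return fingerprint(path) in known_fingerprints
```

With this scheme, a one-byte file could only ever collide with a flagged one-byte file, which makes the reported false positives on near-empty documents even stranger.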
Googling DS_Store: it contains folder customization metadata, which backs up my first instinct. These are almost certainly byte-for-byte identical recreations of a DS_Store file, probably with default settings.
Neither hashes nor file sizes are how copyright works—I don't make a work free by adding a space to it. So it would be a bit weird if Google implemented it that way. Rolling hashes—maybe.
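For contrast, here is a toy Rabin-Karp rolling hash showing what content-aware matching looks like: it flags a known run of bytes inside a file even when other bytes are added around it. (Purely an illustration of the rolling-hash idea; the constants and helpers below are my own, and nothing here claims to describe what Drive actually runs.)

```python
# Toy Rabin-Karp rolling hash: slide a fixed-size window over the data,
# updating the hash in O(1) per step, and compare against the hash of a
# known chunk. A hash hit is then verified byte-for-byte.
BASE = 257
MOD = (1 << 61) - 1

def window_hashes(data, w):
    """Yield (offset, hash) for every w-byte window of data."""
    if len(data) < w:
        return
    h = 0
    for b in data[:w]:
        h = (h * BASE + b) % MOD
    yield 0, h
    top = pow(BASE, w - 1, MOD)  # weight of the byte leaving the window
    for i in range(w, len(data)):
        h = ((h - data[i - w] * top) * BASE + data[i]) % MOD
        yield i - w + 1, h

def contains_chunk(data, chunk):
    """True if `chunk` appears anywhere inside `data`."""
    w = len(chunk)
    target = next(window_hashes(chunk, w))[1]
    return any(data[off:off + w] == chunk
               for off, h in window_hashes(data, w) if h == target)
```

This is why prepending a space doesn't dodge a rolling-hash matcher: the original windows still appear, just at a shifted offset, whereas a whole-file checksum changes completely.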
It's a pity that we don't get to see the supposed ‘copyright holder’ like on YouTube—it would add nicely to the bureaucratic surrealism.
I wonder if the Docs team saw how great and flawlessly YouTube's ‘Content ID’ works, and implemented about the same. And now they match files against some clerk's entire disk that was uploaded into the system.
I'd be kind of shocked if that were the case. Either Google isn't also comparing file sizes, or whoever created these tiny DS_Store/one-byte files was incredibly unlucky and matched tiny copyrighted material.
You seem to be under the false belief that computers are actually infallible. They're not. Cosmic rays routinely flip bits in computer memory, causing all kinds of one-off bugs and mistakes.
Last month, Google Drive users were left baffled to see their nearly empty files erroneously flagged for violating the company's copyright infringement policy.
These text files contained nothing other than numbers like 0, 1, 173, 174, 186, and a few others.
Would a "tax on bad decisions made by a company and its tools" be appropriate in the 21st century? I don't think you can sue an algorithm these days, but a legal principle saying "whatever a company's tool decides, the company has also (legally) decided" would help solve many of these problems. You'd have to run these tools by legal before shipping.
The problem isn’t necessarily this. ML-based moderation is basically required at this scale. The problem is that the appeals processes don’t work, and the companies do nothing to fix them; they always treat the computer as infallible.
Imagine this being the last strike on your account: a computer somewhere deletes your files, emails, videos, etc., then bans you for life with no explanation beyond "you violated one of the thousands of ToS clauses" and no chance of human review, ever.