Hacker News
Google Drive flags macOS '.DS_Store' files for copyright violation (bleepingcomputer.com)
95 points by ai_ja_nai on Feb 19, 2022 | 18 comments


Stop throwing poorly implemented and untested ML/algorithms at every problem, dammit. It just leads to problem after problem.

I dunno why, but the people working at tech companies really seem to believe that computers can do no wrong. Which is obviously not true.

And then they don't even have an appeals process where a human with basic common sense can see what went wrong. Nope. You just lose access to all your stuff.


The author speculates that this was just a hash collision. Seems like the most plausible scenario in this case.


Seems like comparing only a checksum is a recipe for disaster, especially when hosting an enormous number of files. Comparing the checksum (or multiple checksums) and the file size would drastically reduce the number of false positives.
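A minimal sketch of that idea, assuming a blocklist keyed on both size and digest. The names `fingerprint`, `is_flagged`, and `blocklist` are hypothetical, and nothing here reflects how Google Drive actually matches files:

```python
import hashlib

def fingerprint(data: bytes) -> tuple[int, str]:
    """Composite match key: (file size in bytes, SHA-256 hex digest)."""
    return len(data), hashlib.sha256(data).hexdigest()

def is_flagged(data: bytes, blocklist: set[tuple[int, str]]) -> bool:
    """Flag only when both the size and the digest match a known entry."""
    return fingerprint(data) in blocklist

# Hypothetical blocklist containing one known work.
blocklist = {fingerprint(b"some copyrighted bytes")}
```

A byte-for-byte identical file still matches under this scheme, so it only helps against accidental collisions between different contents.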


Googling DS_Store, it contains folder customization metadata, which backs up my first instinct. These are almost certainly byte-for-byte identical copies of DS_Store, probably with default settings.


Neither hashes nor file sizes are how copyright works—I don't make a work free by adding a space to it. So it would be a bit weird if Google implemented it that way. Rolling hashes—maybe.

It's a pity that we don't get to see the supposed ‘copyright holder’ like on YouTube—it would add nicely to the bureaucratic surrealism.

I wonder if the Docs team saw how well and flawlessly YouTube's ‘Content ID’ works, and implemented much the same thing. And now they match files against some clerk's entire disk that was uploaded into the system.
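For the rolling-hash idea mentioned above, here is a rough sketch of a polynomial (Rabin-Karp style) hash over fixed-size windows, used to score overlap between two byte strings. The window size, base, modulus, and the function names `window_hashes` and `similarity` are all assumptions for illustration, not anything Google is known to use:

```python
def window_hashes(data: bytes, k: int = 8,
                  base: int = 257, mod: int = (1 << 61) - 1) -> set[int]:
    """Return the set of rolling hashes of every k-byte window in data."""
    if len(data) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the byte leaving the window
    h = 0
    for b in data[:k]:
        h = (h * base + b) % mod
    hashes = {h}
    for i in range(k, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - data[i - k] * high) * base + data[i]) % mod
        hashes.add(h)
    return hashes

def similarity(a: bytes, b: bytes, k: int = 8) -> float:
    """Jaccard overlap of the two window-hash sets (0.0 to 1.0)."""
    ha, hb = window_hashes(a, k), window_hashes(b, k)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

With this kind of matching, appending a space to a file changes its whole-file hash but barely moves the similarity score. Files shorter than the window produce no hashes at all, so tiny files would never match.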


Hey, maybe don't do hash checks on files less than 1KB in size? It's not difficult to solve.
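As a sketch, the guard could be a single size check before any matching runs. The 1 KB threshold comes from the comment above; the names `MIN_MATCH_SIZE` and `should_hash_check` are hypothetical:

```python
MIN_MATCH_SIZE = 1024  # bytes; hypothetical threshold taken from the 1KB suggestion

def should_hash_check(size_bytes: int, min_size: int = MIN_MATCH_SIZE) -> bool:
    """Skip copyright hash matching for files below the size threshold."""
    return size_bytes >= min_size
```

A file containing only "1" is one byte long, so it would be skipped under this rule.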


What hash are they using? CRC32/CRC64? Even md5 shouldn't produce collisions unless you're actively hunting for them.
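For a sense of scale on accidental collisions, the usual birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)) can be evaluated directly. The file count in the usage note below is a made-up figure, not a number from Google, and `collision_probability` is a hypothetical helper:

```python
import math

def collision_probability(n_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate that at least two of n_files distinct
    inputs share a hash value, assuming a uniformly distributed hash."""
    exponent = n_files * (n_files - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)
```

For an assumed one billion files, `collision_probability(10**9, 32)` is effectively 1 (a CRC32-sized hash), while `collision_probability(10**9, 128)` is far below one in a quintillion (an MD5-sized hash, ignoring deliberate attacks).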


I'd be kind of shocked if that was the case. Either it means that Google isn't also comparing file sizes or that whoever is creating these tiny DS files/one-byte files is incredibly lucky matching tiny copyrighted material.


Then use a bigger hash!


> but the people working at tech companies really seem to believe that computers can do no wrong.

They confuse the computer being wrong with making a mistake.

The computer doesn't make a mistake, it does as it's told, but that doesn't mean what it's told to do isn't wrong.


You seem to be under the false belief that computers are actually infallible. They're not. Cosmic rays routinely flip bits in computers' memories, causing all kinds of one-off bugs and mistakes to happen.


About a month ago, it was found that files containing just a "1" were flagged too.

https://news.ycombinator.com/item?id=30060405


From the article:

Last month, Google Drive users were left baffled on seeing their nearly empty files being erroneously flagged for violating the company's copyright infringement policy.

These text files contained nothing other than numbers like 0, 1, 173, 174, 186, and a few others.


Did Google ever issue a public explanation of what caused this, let alone an apology?


Would a "tax on bad decisions made by company & company's tools" be appropriate in the 21st century? I don't think you can sue an algorithm these days, but having the legal step to say that "whatever a company's tool decides, the company has also (legally) decided" would help solve many of these problems. You'd have to run these by legal before shipping.


The problem isn’t necessarily this. ML based moderation is basically required at this scale. The problem is that the appeals processes don’t work, and the companies do nothing to fix it; they always treat the computer as infallible.


Imagine this being the last strike on your account: a computer somewhere deletes your files, emails, videos, etc., then bans you for life with no explanation except that you violated one of the thousands of ToS clauses, and with no chance of human review, ever...


Honestly wish Google stuck to being reactive about copyright violation, rather than proactive.



