Stop throwing poorly implemented and untested ML/algorithms at every problem, dammit. It just leads to problem after problem.
I dunno why, but the people working at tech companies really seem to believe that computers can do no wrong. Which is obviously not true.
And then they don't even have an appeals process where a human with basic common sense can see what went wrong. Nope. You just lose access to all your stuff.
Relying on a checksum alone seems like a recipe for disaster, especially when hosting an enormous number of files. Comparing a checksum (or multiple checksums) plus the file size would drastically reduce the number of false positives.
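To illustrate the idea (this is a toy sketch, not anything Google actually does; the function names and the chunk size are made up), matching on a (size, hash) pair instead of the hash alone lets you reject most candidates with a cheap size lookup before ever hashing:

```python
import hashlib
import os

def fingerprint(path, chunk_size=65536):
    """Return a (size, sha256-hex) pair for a file. Requiring both to
    match makes an accidental collision far less likely than matching
    on the hash alone."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
            size += len(chunk)
    return (size, h.hexdigest())

def matches(path, known_fingerprints):
    """Check a file against a set of known (size, hash) fingerprints.
    The size check is nearly free, so only plausible files get hashed."""
    size = os.path.getsize(path)
    if size not in {s for s, _ in known_fingerprints}:
        return False
    return fingerprint(path) in known_fingerprints
```

With this scheme, a one-byte file could only ever collide with a flagged one-byte file, which makes the reported false positives on near-empty documents even stranger.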
Googling DS_Store: it contains folder customization metadata, which backs up my first instinct. These are almost certainly byte-for-byte identical recreations of a DS_Store file, probably with default settings.
Neither hashes nor file sizes are how copyright works—I don't make a work free by adding a space to it. So it would be a bit weird if Google implemented it that way. Rolling hashes—maybe.
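For contrast, here is a toy Rabin-Karp rolling hash showing what content-aware matching looks like: it flags a known run of bytes inside a file even when other bytes are added around it. (Purely an illustration of the rolling-hash idea; the constants and helpers below are my own, and nothing here claims to describe what Drive actually runs.)

```python
# Toy Rabin-Karp rolling hash: slide a fixed-size window over the data,
# updating the hash in O(1) per step, and compare against the hash of a
# known chunk. A hash hit is then verified byte-for-byte.
BASE = 257
MOD = (1 << 61) - 1

def window_hashes(data, w):
    """Yield (offset, hash) for every w-byte window of data."""
    if len(data) < w:
        return
    h = 0
    for b in data[:w]:
        h = (h * BASE + b) % MOD
    yield 0, h
    top = pow(BASE, w - 1, MOD)  # weight of the byte leaving the window
    for i in range(w, len(data)):
        h = ((h - data[i - w] * top) * BASE + data[i]) % MOD
        yield i - w + 1, h

def contains_chunk(data, chunk):
    """True if `chunk` appears anywhere inside `data`."""
    w = len(chunk)
    target = next(window_hashes(chunk, w))[1]
    return any(data[off:off + w] == chunk
               for off, h in window_hashes(data, w) if h == target)
```

This is why prepending a space doesn't dodge a rolling-hash matcher: the original windows still appear, just at a shifted offset, whereas a whole-file checksum changes completely.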
It's a pity that we don't get to see the supposed ‘copyright holder’ like on YouTube—it would add nicely to the bureaucratic surrealism.
I wonder if the Docs team saw how great and flawlessly YouTube's ‘Content ID’ works, and implemented about the same. And now they match files against some clerk's entire disk that was uploaded into the system.
I'd be kind of shocked if that were the case. Either Google isn't also comparing file sizes, or whoever created these tiny DS_Store/one-byte files was incredibly unlucky and matched tiny copyrighted material.
You seem to be under the false belief that computers are actually infallible. They're not. Cosmic rays routinely flip bits in computer memory, causing all kinds of one-off bugs and mistakes.
Last month, Google Drive users were left baffled to see their nearly empty files erroneously flagged for violating the company's copyright infringement policy.
These text files contained nothing other than numbers like 0, 1, 173, 174, 186, and a few others.
Would a "tax on bad decisions made by a company and its tools" be appropriate in the 21st century? I don't think you can sue an algorithm these days, but a legal principle saying "whatever a company's tool decides, the company has also (legally) decided" would help solve many of these problems. You'd have to run these tools by legal before shipping.
The problem isn’t necessarily this. ML-based moderation is basically required at this scale. The problem is that the appeals processes don’t work, and the companies do nothing to fix them; they always treat the computer as infallible.
Imagine this being the last strike on your account: a computer somewhere deletes your files, emails, videos, etc., then bans you for life with no explanation beyond "you violated one of the thousands of ToS clauses" and no chance of human review, ever.