If you meant that transcribing dialogue from a TV show violates copyright, I'm not so sure. It's relatively common to quote dialogue for various purposes, e.g. TV critics do it.
Definitely understand if you're saying the whole dialogue for a TV show is copyrighted, but I'm curious about the opensubtitles part; I used to work in that area.
Quoting excerpts is different from transcribing an entire work, which is unambiguously copyright infringement. (Otherwise you would find the “book” version of any and all TV shows on Amazon.) The subtitles in question are generally translations, which likewise fall under copyright, being derivative works.
Yeah, I was just curious about the opensubtitles site because I used to work in that field (subtitles) and wasn't sure if there were some new pirate sites that were monetizing subs.
N.B. I'm not being argumentative, please don't read it that way, and I apologize if it comes off that way:
Not every derivative work is a copyright violation. That's why subs and dubs don't get kicked around, why you can quote dialogue in an article, etc.[^1]
Whether it applies to AI is currently playing out in court, e.g. in NYT v. OpenAI[^2] and Sarah Silverman et al. v. OpenAI[^3] and v. Meta.[^4]
I wish it still included books3, but it doesn't anymore. I wish it were possible to download that 36GB books3.tar in the wild these days. I hereby promise to use this dataset according to "fair use" only...
I know. But where I am, using a torrent means participating in distribution of the content, and that is where I'd get a huge bill for illegally sharing this file.
Not the domain per se, but the high-powered law firms at your fingertips. Copyright law is much easier to enforce against working-class parents of 12-year-olds than SV elites.
Throw a dart at a wall filled with every notable author/publisher ever and whoever you hit probably owns some of this data.
Apparently you can just do whatever as long as you say it's for AI research. Go post Blu-ray rips online, it's fine provided you have a .ai domain :^)