"others argued that Bluesky data is publicly available anyway and so the dataset is fair use" OTA TV and radio are publicly available. Try recording and distributing it, see how far "fair use" gets you.
And it is fair use. Anyone can watch/listen to TV/radio/online video/podcast and use the information learned to write/create more data, as long as they don't copy the data verbatim. If you find an AI spewing out your data word for word, you have every right to sue the company in control of that AI, just as you would have every right to sue an individual or corporation who directly copied your data and passed it on without crediting you.
I really don't understand how people can't grasp this. Our entire civilization is founded upon the ideas and written knowledge of our predecessors. This is the whole basis of our schooling system -- to teach the knowledge of others by sharing their "data" with students.
Public data on a privately-owned website? Yeah, that's not public data.
Sorry, if your data is viewable by the public, whether because you posted it on the internet, allowed a public library to loan it out digitally, or made it available by any other means, your data is available to the public to access and learn from. If you don't agree with that, don't publish your data or allow it to be viewed by the public.
The Russian economy is on the verge of collapsing, and there are rumors that it may be too late to stop it from doing so, even if Putin were to end the war today. Just because you don't see the results of sanctions in a month or two, or however little time you think it should take, doesn't mean sanctions don't work. It takes time, sometimes a year or more, for the results to fully manifest.
Think of someone with a few credit cards and a bit of savings in the bank who just lost their job. They may be able to go on for a few months, or even a year or more, before they run out of available funds and end up on the streets.
It's public data, available for anyone, including AI bots, to peruse and learn from at will. All this hubbub about AI stealing my shit is just that -- shit. AI, just like anyone, should have the right to view/read/scan any publicly available data, including copyrighted data if available publicly, to learn and grow. What it should not be able to do, just as real people cannot, is plagiarize that data by using word-for-word quotes without proper citations. Authors/creators of data have the right to go after plagiarizing AI, just as they do with plagiarizing humans, if they find their work used without proper credit.
Again, if your work is out there for others to freely access and learn from, then those who can learn from it include AI. If you don't like it, don't publicly publish your work.
creative professionals need to strike!
No, "creative professionals" need to get over the fact that they can easily be replaced by technology. You don't hear the pen and paper people complaining about digital drawing and painting technology.
AI depends on human input, and it tends to fall down hard if it tries to train on AI output.
A five-year-old would fail at generating a simple sentence if it were trained on the work of other five-year-olds, so what's your point here?
Suure. That's why GenAI has to have guardrails on the prompting system to say things like "don't reproduce a trademark" then? Fuck your ridiculous dishonesty. The "patterns and structures" are just high-end lossy compression.
You think actual people don't need laws stating things like "don't reproduce a trademark" as well? The laws are on the books because people will plagiarize and copy trademarks without them.
A quarrel is quickly settled when deserted by one party; there is no battle unless there be two. -- Seneca