Let's just run with the "AI training" misconception.
There's a document. It was created by an author. The author has the exclusive right to copy the document in its entirety onto his own website (copy=1, violation=0). Your browser knocks on the door and asks for the document. The author's website copies the document over the network into your browser's process memory (copy=2, violation=0). That's fine, because the author's HTTP server initiated the transfer and the author intended to authorize the copy.
After that, things get murky: several more copies are made, but to what purpose?
The web browser copies it into its cache on disk (used, e.g., when you refresh the page, to avoid downloading it again over the Internet). Is this a legal copy? It is standard browser behaviour. Other similar copies might be made, e.g. by Squid (a caching and forwarding HTTP web proxy). I will ignore these copies, as no one seems to be upset about them. (Actually, that is not entirely true.)
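For what it is worth, the site owner already has a standard way to limit these mechanical cache copies: HTTP caching headers. The header names below are real HTTP/1.1 caching directives; the particular values are just illustrative.

    Cache-Control: no-store
        (asks browsers and intermediate proxies such as Squid not to keep any copy at all)
    Cache-Control: private, max-age=300
        (allows only the end user's own browser to keep a copy, and only for five minutes)

So the publisher can already say something about who may hold these incidental copies - just not about what they may be used for.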
You read the document, and another copy is made, residing in your brain's memory. Is this legal? It is not mentioned in copyright legislation, but it is implied as part of the intended use of the document, so that you may learn whatever the document talks about. If a friend is sitting next to you, I can see no legislation that prevents him or her from also reading it - thus several copies might be made.
What if, instead, the web browser presents it not to a human but to an artificial human (an AI)? Then a copy will be made in the AI's memory. It is this copy that is being objected to. What is the difference between a copy held in grey matter and one held in silicon?
Or is the disagreement not about how the copy is held but about the use to which it will be put? An AI will, presumably, be used for commercial gain; is that the problem? But if I read a book about Python and then get a job as a Python programmer, is that not also commercial gain?
Or is it that the AI might further disseminate the knowledge? We might be getting somewhere here: Warner/Chappell (not Disney, as is often misremembered) spent decades demanding royalties from anyone who performed their copyrighted "Happy Birthday" song in public, a claim the courts only threw out in 2015.
Whatever. The point is not the copies but the purpose of the copies. This is what needs to be discussed.
I think that we need an enhancement to robots.txt whereby the web site (copyright owner) can state for what purposes copies may be made. All mechanical readers (i.e. all but humans) would be obliged to obey it. This would add little overhead; indeed, well-behaved search-engine spiders already honour robots.txt. What uses? Web indexing; AI learning; quoting of small sections; quoting of the entire document; ... The list of different uses needs discussion. I have zero faith that the AI cowboys would take any heed - they are entitled and seem to think that the world owes them a living.
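To make the idea concrete, here is a rough sketch of what such a purpose-aware robots.txt might look like. The Allow-purpose / Disallow-purpose directives are entirely hypothetical, invented here for illustration - today's robots.txt essentially only knows Allow and Disallow by path, grouped per User-agent.

    User-agent: *
    Allow-purpose: web-indexing
    Allow-purpose: quote-excerpt
    Disallow-purpose: ai-training
    Disallow-purpose: quote-full-document

The nearest thing that exists today is blocking individual AI crawlers by name, e.g. a "User-agent: GPTBot" group with "Disallow: /", which says who may fetch copies but nothing about what the copies may be used for.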