
Comment Re:Two things (Score 1) 25

(require bots to honor robots.txt.

'robots.txt' was invented in 1994 - 31 years ago. It has done well, but requirements are different these days: it needs extending for AI, but not just for AI.

* 'User-agent' means that a site needs to know all the names that spiders use to identify themselves. This is hard and cumbersome. A 'Crawler' field that names a class of spider should be possible, values eg:

** 'web-index' - eg google to allow someone to search

** 'AI' - eg ChatGPT

There are prolly several others.

* 'purpose' - what can the spider do with the information ? Values eg:

** 'full-index' something like google could keep and serve up the content

** 'show-amount n' like full-index but only show n bytes - this would mean that the user would have to visit the web site to read all of the text. This is something that news sites would like.

** 'train' Use for training AI

** 'fee $n' if this is downloaded a fee of $n must be paid. If the spider operator considers this too much then it should not download it. This will make spider owners wail and gnash their teeth

** 'rate-limit n' do not access the web site more than once every n seconds

** 'view region' only view the content if the spider is in the named geographic region

OK: the above is a first draft and can be improved a lot but is a start. Enforcing this is another matter.
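
To make the draft concrete, here is a minimal sketch of how a spider might read such a file. Everything in it is an assumption: the 'Crawler:' and 'Purpose:' line syntax, the sample values and the function names are invented for illustration and are not part of any robots.txt standard.

```python
from collections import defaultdict

# Hypothetical file contents; the field names follow the draft above.
SAMPLE = """\
# Hypothetical syntax: not part of any robots.txt standard
Crawler: web-index
Purpose: show-amount 500
Purpose: rate-limit 10

Crawler: AI
Purpose: train
Purpose: fee $0.05
"""

def parse_policy(text):
    """Return {crawler_class: {purpose: argument_or_None}}."""
    policy = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "crawler":
            current = value.lower()
            policy[current]  # create the entry even if no purposes follow
        elif field == "purpose" and current is not None:
            name, _, arg = value.partition(" ")
            policy[current][name.lower()] = arg.strip() or None
    return dict(policy)

def allowed(policy, crawler, purpose):
    """True if the site lists this purpose for this class of crawler."""
    return purpose in policy.get(crawler.lower(), {})

policy = parse_policy(SAMPLE)
print(allowed(policy, "AI", "train"))
print(allowed(policy, "web-index", "train"))
```

A spider that identifies as class 'AI' would find 'train' permitted along with the fee; one of class 'web-index' would not. As said, enforcing this is another matter.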

Comment How about AI vs Disney ? (Score 2) 100

Disney is ferocious in its protection of its copyright. What happens if you ask an AI about Mickey Mouse, what does it say ? How did that AI learn about MM but by reading copyrighted material or viewing copyrighted movies ?

Has Disney said anything about AI companies using its copyrighted material ?

Comment Re:"Respecting copyright" != "Ethically" (Score 1) 100

Let's just run with the "AI training" misconception.

There's a document. It was created by an author. The author has the exclusive right to copy the document in its entirety onto his own website (copy=1,violation=0). Your browser knocks on the door. It asks for the document. The author's website copies the document over the network onto your browser's process memory (copy=2,violation=0). That's fine, because the author's HTTP server initiated and the author intended to authorize the copy.

After that things get murky: several copies are made, but to what purpose ?

The web browser copies it into its cache on disk (used, eg, on a page refresh to avoid downloading it again over the Internet). Is this a legal copy ? This is a standard browser thing. Other similar copies might be made, eg by squid (a caching and forwarding HTTP web proxy). I will ignore these copies as no one seems to be upset about them. (Actually that is not entirely true.)

You read the document, and another copy is made that resides in your brain's memory. Is this legal ? It is not mentioned in copyright legislation but it is implied as part of the intended use of the document, so that you may learn whatever the document talks about. If a friend is sitting next to you I can see no legislation that prevents him or her from also reading it - thus several copies might be made.

What if, instead, the web browser presents it not to a human but to an artificial human (an AI) ? Then a copy will be made in the AI's memory. It is this copy that is being objected to. What is the difference between a copy being held in grey matter and one in silicon ?

Or is the disagreement not how it is held but the use to which it will be put ? An AI will, presumably, be used for commercial gain, is that the problem ? But if I read a book about Python is that not commercial gain if I get a job as a Python programmer ?

Or is it that the AI might further disseminate the knowledge ? We might be getting somewhere here, as Warner/Chappell for years charged licence fees for public performances of "Happy Birthday to You", a song on which it claimed copyright.

Whatever. The point is not the copies but the purpose of the copies. This is what needs to be discussed.

I think that we need an enhancement to robots.txt where the web site (copyright owner) can say for what purposes copies may be made. All mechanical readers (ie all but humans) would be obliged to obey it. This would add little overhead, indeed search engine spiders already read robots.txt. Possible uses: web indexing; AI learning; quoting of small sections; quoting of entire document; ... The list of different uses needs discussion. I have zero faith that the AI cowboys would take any heed - they are entitled and seem to think that the world owes them a living.
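
As a rough sketch of how little overhead this would add: Python's standard urllib.robotparser already answers "may this agent fetch this URL ?", and a purpose check is one more lookup on top. The bot names, the example.com URL, the PERMITTED_PURPOSES set and the may_copy function are all assumptions made up for illustration; robots.txt has no purpose field today.

```python
from urllib.robotparser import RobotFileParser

# Standard robots.txt, as spiders already read it today.
ROBOTS_TXT = """\
User-agent: ExampleIndexBot
Disallow: /private/

User-agent: *
Disallow: /
"""

# Hypothetical purpose declarations; no such field exists in robots.txt.
PERMITTED_PURPOSES = {"web-indexing", "quote-small-sections"}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_copy(agent, url, purpose):
    """Allow a copy only if both the path rule and the purpose permit it."""
    return parser.can_fetch(agent, url) and purpose in PERMITTED_PURPOSES

url = "https://example.com/articles/1"
print(may_copy("ExampleIndexBot", url, "web-indexing"))
print(may_copy("ExampleIndexBot", url, "ai-learning"))
print(may_copy("SomeOtherBot", url, "web-indexing"))
```

The first half is what well-behaved spiders do now; only the purpose set is new.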

Comment Re: Learning your IDE is more effective ... (Score 1) 189

He means that they have learned a few basic, simple Vim commands and then stopped learning. The result is that they can get the job done but it takes a lot more time. If you use a tool a lot, continue learning how to use it better; the investment will repay you many times over.

Comment "Valued" or "Worth" ? (Score 2) 33

These unicorns have been estimated to have a value of over $1B at an early stage in their life cycle when their income has been small - this valuation is based on speculation as to what might happen in the future. These companies will only be worth over $1B once they have a solid product that is selling well to many customers.

Comment What are the security implications ? (Score 2) 44

OK: we know that google reads every email that goes through gmail; I kind of assumed that that was to work out what advertising it could plague the gmail user with. Does it go beyond that, building profiles that are used elsewhere, such as selling them to life insurers to build risk profiles ? Will AI lead to further leakage ?

Comment So if entry level workers ... (Score 4, Insightful) 36

cannot gain employment to become more experienced, where do companies recruit their mid-level workers from ?

I suspect that they will try to poach them from other companies or other countries that have not replaced new intake workers with AI. I can see this causing a big headache in a few years' time.

Comment Re:Unicode is a bug (Score 1) 69

If it ain't ascii it isn't worth expressing in bytes.

If you exclusively speak American then you can say everything is US ASCII ... but for many who, reasonably, want to express themselves in their own language they will want other characters. But the "everything" is not entirely true even for Americans, eg 1/100 of a dollar is a cent which is U+00A2 - which slashdot will not display correctly.
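
A quick check of the cent example, as a small Python sketch (the variable names are only for illustration): the character cannot be encoded as ASCII at all, takes one byte in Latin-1 and two in UTF-8.

```python
cent = "\u00a2"  # CENT SIGN, U+00A2

# ASCII only covers code points 0-127, so this encode must fail.
try:
    cent.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

latin1_bytes = cent.encode("latin-1")  # single byte 0xA2
utf8_bytes = cent.encode("utf-8")      # two bytes 0xC2 0xA2

print(ascii_ok, latin1_bytes.hex(), utf8_bytes.hex())
```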
