Comment Internet Archive / Google Cache (Score 1) 7
Google's cached webpages are no longer available to individual users, and it's dawning on me that it may be for this very reason. By keeping their cache private, they are keeping its value (i.e., content from the pre-AI era) for themselves. Perhaps for use in training, since it hasn't been corrupted with AI-generated content.
The Internet Archive is a great thing, but it's by no means browsable in any useful way. It's mainly for viewing a specific page's archive, with gaps and disassociations between the specific times things were crawled.
I'm wondering what it would take to identify and label "pure" internet content that at the very least already existed prior to AI and is unchanged. You know how a browser tells you a website is "secure" and certified, etc.? How about one that indicates that the page's content predates AI? It could grab the page from the Internet Archive, generate a checksum, and compare it to the checksum of the live version. Lots of caveats of course, like filtering out dynamic content (ads and the like) that does change.
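Here's a rough sketch of that check in Python, just to show the idea is doable with the standard library. Everything in it is my own assumption rather than an existing tool: the function names are made up, the cutoff date is arbitrary, the list of "dynamic" tags is nowhere near complete, and I'm assuming the Wayback Machine availability endpoint and the `id_` raw-capture URL form behave the way I remember. It hashes only the visible text, so changed scripts or ad markup don't count as a change.

```python
import hashlib
import json
import re
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Tags whose contents are treated as dynamic noise (an assumed, incomplete list).
SKIPPED_TAGS = {"script", "style", "iframe", "noscript"}


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def normalized_text(html):
    """Return the page text with skipped tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def content_checksum(html):
    """SHA-256 of the normalized text, as a hex string."""
    return hashlib.sha256(normalized_text(html).encode("utf-8")).hexdigest()


def fetch(url):
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def archived_snapshot_url(page_url, cutoff="20221130"):
    """Ask the Wayback Machine for the capture closest to `cutoff` (YYYYMMDD).

    Returns None if there is no capture, or if the closest one is newer than
    the cutoff. The cutoff default is an arbitrary stand-in for "pre-AI".
    """
    query = urllib.parse.urlencode({"url": page_url, "timestamp": cutoff})
    data = json.loads(fetch("https://archive.org/wayback/available?" + query))
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest:
        return None
    timestamp = closest.get("timestamp")
    if not timestamp or timestamp[:8] > cutoff:
        return None
    # Assumption: the "id_" modifier returns the capture without the
    # Wayback toolbar and link rewriting.
    return "https://web.archive.org/web/" + timestamp + "id_/" + page_url


def is_unchanged_since_cutoff(page_url, cutoff="20221130"):
    """True if the live page's text matches the archived capture's text."""
    snapshot_url = archived_snapshot_url(page_url, cutoff)
    if snapshot_url is None:
        return False
    return content_checksum(fetch(snapshot_url)) == content_checksum(fetch(page_url))
```

An exact checksum is about the strictest test you could pick: one corrected typo or an updated footer date fails it. A real version would probably need some kind of similarity score, plus smarter filtering than a tag blocklist.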