source link: https://finance.yahoo.com/news/ai-killing-grand-bargain-heart-090001830.html

AI is killing the grand bargain at the heart of the web. 'We're in a different world.'

Kali Hays, Alistair Barr
Wed, August 30, 2023, 6:00 PM GMT+9
Image: AI screenwriter. Moor Studio/Getty Images Plus
  • Content owners are wising up to their work being freely used by Big Tech to build new AI tools.

  • Bots like Common Crawl are scraping and storing billions of pages of content for AI training.

  • With less incentive to share online freely, the web could become a series of paywalled gardens.

AI is undermining the web's grand bargain, and a decades-old handshake agreement is the only thing standing in the way.

A single bit of code, robots.txt, was proposed in the mid-1990s as a way for websites to tell bot crawlers they don't want their data scraped and collected. It was widely accepted as one of the unofficial rules supporting the web.

At the time, the main purpose of these crawlers was to index information so results in search engines would improve. Google, Microsoft's Bing and other search engines have crawlers. They index content so it can be later served up as links to billions of potential consumers. This is the essential deal that created the flourishing web we know today: Creators share abundant information and exchange ideas online freely because they know consumers will visit and either see an ad, subscribe, or buy something.

Now, though, generative AI and large language models are changing the mission of web crawlers radically and rapidly. Instead of working to support content creators, these tools have been turned against them.

The bots feeding Big Tech

Web crawlers now collect online information to feed into giant datasets that are used for free by wealthy tech companies to develop AI models. CCBot feeds Common Crawl, one of the biggest AI datasets. GPTBot feeds data to OpenAI, the company behind ChatGPT and GPT-4, currently the most powerful AI model. Google just calls its LLM training data "Infiniset," without mentioning where the vast majority of the data comes from, although 12.5% comes from C4, a cleaned-up version of Common Crawl.
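To make the robots.txt handshake concrete, here is a minimal sketch in Python of how a site might ask the two crawlers named above to stay away, and how a well-behaved crawler would check before fetching a page. The robots.txt contents, the example.com URLs, and the "ExampleSearchBot" name are illustrative assumptions, not taken from any real site. The check uses the standard library's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (an assumption, not a real file).
# It asks CCBot and GPTBot to stay out entirely and lets every other crawler
# in, except for one private directory.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical article URL on the illustrative site.
ARTICLE_URL = "https://example.com/news/article.html"

# "ExampleSearchBot" is a made-up crawler name standing in for a search indexer.
AGENTS = ("CCBot", "GPTBot", "ExampleSearchBot")

# Ask, on behalf of each crawler, whether the rules permit fetching the article.
results = {agent: parser.can_fetch(agent, ARTICLE_URL) for agent in AGENTS}

for agent, is_allowed in results.items():
    print(f"{agent}: {'allowed' if is_allowed else 'disallowed'}")
```

Note that the check runs on the crawler's side. Nothing in the file stops a bot that simply skips this step, which is why the article calls robots.txt a handshake agreement rather than an enforcement mechanism.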

The models use all this free information to learn how to answer user questions immediately. That's a long way from indexing a website so users can be sent through to the original work.

