Data Scraping by AI Startup Anthropic Causes Unrest Among Web Publishers

The emerging AI startup Anthropic has come under criticism for aggressively collecting data from websites to train its systems, potentially violating publishers' terms of service. The allegations come from several affected companies. To develop large language models, the technology behind chatbots such as OpenAI's ChatGPT and Anthropic's counterpart Claude, AI developers rely on vast amounts of data from a variety of sources. Anthropic, founded by former OpenAI researchers, aims to develop 'responsible' AI systems.

Among the critics is Matt Barrie, CEO of Freelancer.com, who described the San Francisco-based company as the 'most aggressive scraper' of his platform, which records millions of daily visits. According to Barrie, a web 'crawler' linked to Anthropic generated 3.5 million visits to his website within four hours, five times as many as the next most active AI crawler. Attempts to deny access using standardized protocols were unsuccessful, prompting Barrie to block all IP addresses associated with Anthropic.

Besides Freelancer.com, other website operators also reported increased traffic from Anthropic crawlers. Kyle Wiens, CEO of iFixit.com, reported one million requests within 24 hours, which triggered all of the site's overload alarms. iFixit's terms of service explicitly prohibit the use of its data for machine learning.

One way to control web robots is the 'robots.txt' protocol, which, however, relies on voluntary compliance. Anthropic said that its crawlers respect these signals once they are in place and that it strives to cause minimal disruption. The company also said that it takes into account technologies such as CAPTCHAs that protect against abuse.

Data scraping is not new, but the race for advanced AI models has intensified it considerably, creating additional costs for website operators. Eric Holscher, co-founder of the documentation platform Read the Docs, described the resulting bandwidth costs and the time spent combating abuse as significant.

Although Anthropic has positioned itself as an ethical player, it apparently lacks partnerships comparable to those of OpenAI, which recently reached agreements with Reddit, The Atlantic, and the Financial Times to use their data legally. Web publishers are calling for closer scrutiny of data scraping practices so that their content is used with consent and the long-term benefits of AI development are secured.
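To make the 'robots.txt' mechanism mentioned in the article concrete, here is a minimal sketch in Python of the check that a well-behaved crawler would run before fetching a page. It uses the standard library's urllib.robotparser. The crawler name "ExampleAIBot", the rules, and the example.com URLs are invented for illustration and do not describe any real site or any real company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented crawler
# name used only for this illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        for path in ("/guides/repair", "/private/data"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

The check runs entirely on the crawler's side, which is why the protocol depends on voluntary compliance: a crawler that skips it is not stopped by the file itself. This is consistent with Barrie's account of falling back on blocking IP addresses.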

Eulerpool News · 7/27/2024
