Reddit is taking a firm stand against unauthorized scraping of its content by AI companies, particularly targeting tech giants like Microsoft. The popular social platform, known for its vast amount of user-generated content, refuses to allow companies to freely use its data for training AI models without proper permission. Steve Huffman, Reddit’s CEO, recently spoke out about the platform’s position, expressing frustration over the unapproved use of Reddit’s data.

What’s Happening & Why This Matters

In recent months, Reddit has intensified its efforts to control how AI firms utilize its data. After discovering that Microsoft had been scraping Reddit’s content without notifying the platform, Reddit moved quickly to block this activity. Microsoft not only used the data for AI features in Bing but also resold it to other search engines via Bing’s APIs, a practice that Reddit found particularly problematic.

Huffman described the process of blocking Microsoft’s scraping efforts as challenging and time-consuming. Reddit has also taken similar actions against other AI firms like Perplexity AI and Anthropic, which attempted to access Reddit data without securing proper agreements.

These actions reflect Reddit’s strategy to maintain control over how its content is used, especially in the context of AI training. Earlier this year, Reddit secured a $60 million licensing deal with Google and another agreement with OpenAI. However, individual Reddit users whose posts are used do not receive any direct benefit; Reddit retains control over the data and its usage.

Image: 6 common training model challenges. Credit: Oracle

Reddit’s stance has sparked a wider discussion about what qualifies as “publicly available” data and whether tech companies have the right to scrape and use this data without explicit permission. Companies like Apple and Salesforce have previously claimed that their use of publicly available data for AI training is legitimate. Reddit’s approach challenges this notion and calls for clear regulations and agreements.

TF Summary: What’s Next

Reddit’s ongoing efforts to block unauthorized scraping of its content reflect the growing tension between tech platforms and AI companies over data usage. As AI training accelerates, clear guidelines and regulations will become increasingly important. Reddit’s actions could influence how other platforms handle similar issues, particularly as debates over data ownership and AI training practices continue.

