Reddit Blocking Chatbot Scraping, Training from Posts

Adam Carter

Reddit is taking a firm stand against unauthorized scraping of its content by AI companies, singling out tech giants like Microsoft. The social platform, known for its vast store of user-generated content, refuses to let companies use its data to train AI models without permission. Steve Huffman, Reddit’s CEO, recently spoke out about the platform’s position, expressing frustration over the unapproved use of Reddit’s data.

What’s Happening & Why This Matters

In recent months, Reddit has intensified its efforts to control how AI firms use its data. After discovering that Microsoft had been scraping Reddit’s content without notifying the platform, Reddit moved quickly to block the activity. Microsoft not only used the data for AI features in Bing but also resold it to other search engines through Bing’s APIs, a practice Reddit found particularly problematic.

Huffman described the process of blocking Microsoft’s scraping efforts as challenging and time-consuming. Reddit has also taken similar actions against other AI firms like Perplexity AI and Anthropic, which attempted to access Reddit data without securing proper agreements.

These actions reflect Reddit’s strategy of keeping control over how its content is used, especially for AI training. Earlier this year, Reddit secured a reported $60 million licensing deal with Google and a separate agreement with OpenAI. However, the individual Reddit users whose posts are licensed receive no direct benefit; Reddit retains control over the data and its usage.

Image: 6 common training model challenges. Credit: Oracle

Reddit’s stance has sparked a wider discussion about what qualifies as “publicly available” data and whether tech companies have the right to scrape and use it without explicit permission. Companies like Apple and Salesforce have previously claimed that their use of publicly available data for AI training is legitimate. Reddit’s approach challenges that claim and argues for clear regulations and licensing agreements.

TF Summary: What’s Next

Reddit’s ongoing efforts to block unauthorized scraping of its content reflect the growing tension between tech platforms and AI companies over data usage. As more companies train AI models on web content, the need for clear guidelines and regulations will grow. Reddit’s actions could influence how other platforms handle similar issues, particularly as debates over data ownership and AI training practices continue.

By Adam Carter “TF Enthusiast”
Background:
Adam Carter is a staff writer for TechFyle's TF Sources. He's crafted as a tech enthusiast with a background in engineering and journalism, blending technical know-how with a flair for communication. Adam holds a degree in Electrical Engineering and has worked in various tech startups, giving him first-hand experience with the latest gadgets and technologies. Transitioning into tech journalism, he developed a knack for breaking down complex tech concepts into understandable insights for a broader audience.