Authors Sue Apple for ‘Pirated’ AI Training Materials

Apple Sued for Using Pirated Books to Train AI Models

Nigel Dixon-Fyle

Apple sits on the edge of a legal hurricane as authors accuse the iPhone maker of using pirated books to train its artificial intelligence models. In a lawsuit filed in the Northern District of California, well-known authors Grady Hendrix and Jennifer Roberson claim Apple exploited their copyrighted works without consent or payment.

What’s Happening & Why This Matters

The case centers on Books3, a notorious dataset containing over 196,000 pirated books. According to the lawsuit, Apple tapped into this collection to develop its OpenELM language models, part of the company’s growing Apple Intelligence initiative. The complaint further alleges that Apple used the dataset to train its Foundation Language Models, broadening the scope of the alleged misuse of pirated materials in AI training.

Books3 was removed in 2023 after Denmark’s Rights Alliance filed a DMCA takedown request, but not before several tech giants had reportedly leveraged it for AI development. Companies like Meta have previously been linked to the dataset, adding weight to the claims against Apple.

The authors have petitioned the court to certify their complaint as a class action, a move that could bring hundreds or even thousands of other writers into the case. They are asking the court to bar Apple from further unauthorized use of copyrighted works and to award financial damages. The practice of training AI on pirated books, they argue, must stop.

This case mirrors a similar lawsuit against Anthropic, creator of the Claude AI model. That lawsuit concluded with Anthropic agreeing to a record $1.5 billion settlement, which translates to roughly $3,000 per infringed book across more than 500,000 titles. If Apple is found liable, the authors’ case may establish a significant legal precedent for the AI industry.

Meanwhile, OpenAI and Perplexity are battling similar lawsuits. Together, these cases signal a reckoning for AI innovators over how training data is sourced and managed, especially when that sourcing skirts copyright and ethics.

TF Summary: What’s Next

The lawsuit against Apple could reshape how major tech companies approach AI model training. If the court rules in favor of the authors, Apple may face billions in damages and be forced to overhaul its data collection practices. The case also intensifies scrutiny of other AI leaders, including OpenAI and Meta, as outrage and litigation push the entire sector toward stricter compliance with copyright law.

FORECAST: The outcome may dictate whether AI innovation continues under current practices or must shift to more transparent, ethical methods. As the matter unfolds, the case spotlights the use of pirated materials in AI training and its ethical implications.


By Nigel Dixon-Fyle "Automotive Enthusiast"
Background:
Nigel Dixon-Fyle is an Editor-at-Large for TechFyle. His background in engineering, telecommunications, consulting, and product development inspired him to launch TechFyle (TF). Nigel has implemented technologies that support business practices across a variety of industries and verticals. He enjoys the convergence of technology with everything from autos and phones to computers and day-to-day services. However, Nigel also recognizes that technology is not good in absolutes; it has its pros and cons, and TF supports exploring that nuance.