The high cost of training data is making advanced AI systems inaccessible to all but the wealthiest tech companies. When it comes to training AI models, the quantity and quality of data matter more than the model’s design, architecture, or other characteristics. Models trained on more data tend to perform better: Meta’s Llama 3, for example, outperforms AI2’s OLMo model, a gap attributed to training on a significantly larger dataset.

AI models rely on human-annotated data to learn associations between labels and other observed characteristics of the data. However, the emphasis on large, high-quality datasets favors tech giants with big budgets, which then lock up this data and keep others from catching up.

Acquiring these large datasets often involves unethical or illegal behavior, raising questions about copyright infringement and the risk of legal reprisals. Even the more transparent deals are not fostering an equal and open AI ecosystem, since users do not share in the revenue from data licensing. With the cost of AI training data expected to rise from $2.5 billion to almost $30 billion in a decade, many data brokers are exploiting this growing demand to the detriment of the wider AI research community.

[TF Opinion] High Cost of AI Training Data Limits Access to Big Tech Only
