OpenAI Unveils GPT-4o mini, a Smaller, Much Cheaper Multimodal AI Model

venturebeat.com

A little more than two months ago, OpenAI released GPT-4o, its newest and most powerful AI model, and the first the company trained natively to handle multimodal inputs and outputs (text, image, audio, and eventually video) without calling on other models for help.

Upon release, it was the most powerful publicly available AI model in the world on third-party benchmarks, but rival Anthropic’s Claude 3.5 Sonnet outclassed it a few weeks later, and the two have been neck-and-neck ever since.

But OpenAI isn’t stopping there: today it is announcing a smaller version of that model, GPT-4o mini, which it calls “the most cost-efficient small model in the market.” For third-party apps and services built atop OpenAI’s application programming interfaces (APIs), the model costs developers just $0.15 USD per 1 million input tokens and $0.60 per 1 million output tokens.

It’s also far cheaper than GPT-4o, which costs $5.00 for 1 million input tokens and $15 per 1 million output tokens.
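Using only the per-token rates quoted above, the gap is easy to quantify. Here is a minimal sketch; the `api_cost` helper is illustrative, not part of any SDK, and the rates are the article’s published figures:

```python
# Published API rates (USD per 1 million tokens), as quoted above.
PRICING = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o":      {"input": 5.00, "output": 15.00},
}

def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a workload at the published rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A workload of 1M tokens in and 1M tokens out:
mini_cost = api_cost("gpt-4o-mini", 1_000_000, 1_000_000)  # $0.75
full_cost = api_cost("gpt-4o", 1_000_000, 1_000_000)       # $20.00
print(f"GPT-4o mini: ${mini_cost:.2f}, GPT-4o: ${full_cost:.2f}")
```

At these rates, the same workload costs roughly 26 times more on the full GPT-4o than on the mini model.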

Tokens, as you’ll recall, are the numerical codes that represent semantic units (words, word fragments, numbers, and other data) inside a given large language model (LLM) or small language model (SLM). GPT-4o mini appears to be the latter: OpenAI did not release the model’s number of parameters (the connections between its artificial neurons), making it difficult to say exactly how large or small it is, but the “mini” name is a clear indication.
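As a rough illustration of the text-to-token relationship, OpenAI’s documentation offers a rule of thumb of about four characters of English per token. The estimator below leans on that heuristic and is only an approximation; real tokenizers, such as the BPE encodings in OpenAI’s `tiktoken` library, will give different counts, especially for code or non-English text:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough token count using the ~4-characters-per-token
    rule of thumb for English text. An approximation only; actual
    BPE tokenizers produce different counts."""
    return max(1, round(len(text) / 4))

print(rough_token_estimate("Tokens are the units language models read and write."))
```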

Olivier Godement, OpenAI’s Head of Product, API, told VentureBeat in a teleconference interview yesterday that GPT-4o mini is particularly helpful for enterprises, startups and developers “building any agent” from “a customer support agent” to “a financial agent,” as those typically perform “many calls back to the API,” resulting in a high volume of tokens inputted and outputted by the underlying source model, which can quickly drive up costs.
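To see why agent workloads rack up tokens so quickly, consider that each call back to the API typically re-sends the growing conversation history. The sketch below uses entirely hypothetical per-call token counts, priced at GPT-4o mini’s published rates:

```python
# GPT-4o mini published rates, USD per 1 million tokens.
RATE_IN, RATE_OUT = 0.15, 0.60

# Hypothetical (input_tokens, output_tokens) for each API call a
# support agent makes while handling one ticket. Input grows per
# call because the conversation so far is re-sent every time.
calls = [
    (2_000, 300),
    (2_300, 250),
    (2_550, 400),
    (2_950, 350),
]

total_in = sum(i for i, _ in calls)
total_out = sum(o for _, o in calls)
cost = (total_in * RATE_IN + total_out * RATE_OUT) / 1_000_000
print(f"{len(calls)} calls, {total_in + total_out} tokens, ${cost:.4f}")
```

Even at fractions of a cent per ticket, these token volumes multiply across thousands of tickets a day, which is why per-token pricing matters so much for agent builders.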

“The cost per intelligence is so good, I expect it’s going to be used for all sorts of customer support, software engineering, creative writing, all kinds of tasks,” said Godement. “Every time we adopt a new model, there are new cases that pop up, and I think that will be even more the case for GPT-4o mini.”

The move to launch GPT-4o mini also comes ahead of Meta’s reported release of its massive 400-billion-parameter Llama 3 model expected next week, and seems clearly designed to pre-empt that news and cement in developers’ minds that OpenAI remains the leader in enterprise-grade AI.

60% cheaper than GPT-3.5 Turbo for developers

To put GPT-4o mini’s cost into perspective: it is 60% cheaper than GPT-3.5 Turbo, which had remained the most affordable model in OpenAI’s lineup even after the release of GPT-4o.

At the same time, the model is designed to match GPT-3.5 Turbo’s speed, generating around 67 tokens per second.

OpenAI is pitching GPT-4o mini as a direct successor to GPT-3.5 Turbo, but a much more capable one: it can handle both text and vision inputs, unlike GPT-3.5 Turbo, which handled only text.

OpenAI says GPT-4o mini will eventually be able to generate imagery and other multimodal outputs, including audio and video, as well as accept them as inputs. For now, only text and still image/document inputs are available.

At present, GPT-4o mini outperforms GPT-3.5 Turbo and other comparably classed models, such as Google’s Gemini 1.5 Flash and Anthropic’s Claude 3 Haiku, on a range of third-party benchmarks, and even beats GPT-4 itself on some tasks.

Specifically, OpenAI released benchmarks showing that GPT-4o mini scores 82.0% on the Massive Multitask Language Understanding (MMLU) benchmark, a set of multiple-choice questions spanning math, science, history, and other subjects, versus 77.9% for Gemini Flash and 73.8% for Claude Haiku.

Coming to Apple devices this fall as well

In addition, Godement told VentureBeat that GPT-4o mini would be available this fall through Apple Intelligence, Apple’s new AI service for its mobile devices and Mac desktops, timed to coincide with the release of iOS 18, as part of the partnership between OpenAI and Apple announced at the latter’s WWDC event last month.

However, the model will still run on OpenAI’s cloud servers, not on-device, which would seem to negate one of the advantages of running a small model in the first place: local inference, which is by nature faster, more secure, and doesn’t require an internet connection.

Yet Godement pointed out that even when connecting to OpenAI cloud servers, the GPT-4o mini model is faster than others available from the company. Moreover, he told VentureBeat that most third-party developers OpenAI worked with were not yet interested in running the company’s models locally, as it would require much more intensive setup and computing hardware on their end.

However, the introduction of GPT-4o mini raises the possibility that OpenAI’s developer customers could eventually run the model locally, more cost-effectively and on less hardware, and Godement said such an option was not out of the question one day.

Replacing GPT-3.5 Turbo in ChatGPT, but not killing it entirely for developers

Beginning later today, GPT-4o mini will replace GPT-3.5 Turbo among the options for paying ChatGPT subscribers on the Plus and Team plans, with support for ChatGPT Enterprise coming next week. The model will appear in the drop-down menu in the upper-left corner of the web and Mac desktop apps.

However, ChatGPT users won’t get a price reduction on their paid subscriptions for selecting GPT-4o mini — only developers building atop the API will benefit from the savings.

Still, ChatGPT users will automatically get a newer, faster, and more capable model than GPT-3.5 Turbo, which is certainly a benefit.

OpenAI isn’t yet deprecating or phasing out support for GPT-3.5 Turbo in its APIs, as the company doesn’t want to force developers to upgrade or to break the apps that are currently built atop this older model.

Instead, the company believes developers will naturally migrate to the new model en masse, given its significant cost reduction and its gains in intelligence and other capabilities.

Some developers have already been alpha-testing GPT-4o mini, according to Godement, including the enterprise expense management and accounting software startup Ramp and the cloud email AI startup Superhuman, and both are said to have reported excellent results.

Godement said GPT-4o mini powers Ramp’s automatic receipt categorization and merchant detection features, as well as Superhuman’s custom-tailored suggested email responses.

Ramp in particular has “seen pretty amazing results for its data extraction tests,” from receipts, said Godement.

He could not say whether Ramp was using GPT-4o mini’s native multimodal vision input or first extracting text and numerals from receipts with another system before sending them to the model.

So why would any developers still use the larger, more expensive GPT-4o parent model?

Given GPT-4o mini’s significant cost savings and strong performance across a number of benchmarks, the question naturally arises: why would a developer pay more to use the full GPT-4o model now that the mini one is available?

OpenAI believes that for the most computationally intensive, complex, and demanding applications, the full GPT-4o is still the way to go, and that it justifies its higher price in comparison.
