Groq Unveils Lightning-fast LLM Engine; Developer Base Rockets Past 280K in 4 Months

Groq now allows you to make lightning fast queries and perform other tasks with leading large language models (LLMs) directly on its web site

The experience is significant because it demonstrates to developers and non-developers alike just how fast and flexible a LLM chatbot can be. Groq’s CEO Jonathan Ross says usage of LLMs will increase even more once people see how easy it is to use them on Groq’s fast engine. For example, the demo provides glimpses at what other tasks can be done easily at this speed, for example generating job postings or articles and changing them on the fly.

In one example, I asked it to critique the agenda of our VB Transform event about generative AI that starts tomorrow. It was almost instantaneous at providing feedback, including suggesting clearer categorization, more detailed session descriptions and better speaker profiles. When I asked for suggestions of great speakers to make the lineup more diverse, it immediately generated a list, with the organizations they are affiliated with – in a table format like I had suggested. I could change the table on the fly, adding a column for contact details, for example.

In a second exercise, I asked it to create a table of my speaking sessions next week, to help me organize. It not only created the tables I was asking for, but allowed me to easily make changes quickly, including spelling corrections. I could also change my mind and ask it to create extra columns for things I’d forgotten to ask for. It can translate it into different languages too. Sometimes it took me asking a couple of times for it to make a correction, but such bugs are generally at the LLM level, not the processing level. It certainly hints at the news sets of things LLMs can do when operating at this sort of speed.

Groq has gained attention because it promises it can do AI tasks much faster and more affordably than competitors, which it says is possible due to its language processing unit (LPU) that is much more efficient than GPUs at such tasks, in part because the LPU operates linearly. While GPUs are important for model training, when AI applications are actually deployed – “inference” refers to actions the model takes – they require more efficiency at less latency.

So far Groq has offered its service to power LLM workloads for free, and it’s gotten a massive uptake from developers, now at more than 282,000, Ross told VentureBeat. Groq launched its service 16 weeks ago.

Groq offers a console for developers to build their apps, similar to what other inference providers offer. Notably, though, Groq lets developers who build apps on OpenAI swap their app over to Groq in seconds, using some simple steps.

I spoke with Ross in preparation for a talk at VB Transform, where he’s one of the opening speakers tomorrow. The event is focused on enterprise generative AI in deployment, and Ross says he’s soon going to focus on the enterprise. Large companies are moving to deployment of AI applications, and require more efficient processing for their workloads, he said.

While you can type in your queries to the Groq engine, you can also now speak your queries after pushing a microphone icon. Groq uses the Whisper Large V3 model, the latest open source automatic speech recognition and speech translation model from OpenAI, to translate your voice into text. That text is then inserted as the prompt for the LLM.

Groq says its technology uses about a third of the power of a GPU at worst, but most of its workloads use as little as a tenth of the power. In a world where it seems like those LLM workloads will never stop scaling, and energy demand will just keep growing, Groq’s efficiency represents a challenge to the GPU-dominated compute landscape.

In fact, Ross claims that by next year, over half of the globe’s inference computing will be running on their chips. Ross will have the answers and a lot more at VentureBeat’s Transform 2024.


TAGGED: , , , ,
Share This Article
Leave a comment