Jonathan Ross
@jonathan-ross.bsky.social
CEO + Founder @ Groq, the Most Popular API for Fast Inference | Creator of the TPU and LPU, Two of the World’s Most Important AI Chips | On a Mission to Double the World's AI Compute by 2027
Pinned
What can you do with Llama quality and Groq speed? Instant. That's what.

3 months back: Llama 8B running at 750 Tokens/sec
Now: Llama 70B running at 3,200 Tokens/sec

We're still going to get a liiiiiiitle bit faster, but this is our V1 14nm LPU - how fast will V2 be? 😉
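If you want to sanity-check the speed claim yourself, here's a minimal sketch using the official Groq Python SDK (`pip install groq`). The model ID is an assumption - check the Groq console for what's currently served - and wall-clock timing includes network latency, so it will read lower than server-side figures.

```python
# Minimal sketch: timing Llama on GroqCloud with the Groq Python SDK.
# Assumes `pip install groq` and GROQ_API_KEY set in the environment.
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumed model ID; check the Groq console
    messages=[{"role": "user", "content": "Explain the LPU in two sentences."}],
)
elapsed = time.perf_counter() - start

completion_tokens = response.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"≈ {completion_tokens / elapsed:.0f} tokens/sec")
```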
Reposted by Jonathan Ross
Fantastic insight on the massive demand for AI inference infrastructure: “The demand for AI compute is insatiable,” says @groq.com CEO @jonathan-ross.bsky.social. “Our mission is to provide over half of the world’s inference compute.” - @cnbc.com

cnb.cx/4nG7Pcm #AI
Groq CEO: Our mission is to provide over half of the world’s inference compute
Jonathan Ross, CEO and founder of Groq, joins CNBC’s 'Squawk on the Street' to discuss the AI chip startup’s $750 million funding round, its push to deliver faster, lower-cost inference chips, and why...
September 25, 2025 at 12:46 PM
Founder Tip #2: You have to spend time to make time.

Hiring, re-organizing, calendar cleanup (across the team), preparation for meetings (internal and external), etc. Half my day is available for whatever I find important - because the other half is spent freeing up time.
September 6, 2025 at 4:52 PM
Clearly China doesn't have enough compute for scaled AI today:
- GPT-OSS, Llama [US]: optimized for cheaper inference
- R1, Kimi K2, Qwen [China]: optimized for cheaper training

Given China's population, reducing inference costs is more important, and that means more training.
August 19, 2025 at 12:19 PM
Reposted by Jonathan Ross
Transcribe audio with @groq.com.
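A minimal sketch of what that looks like with the Groq Python SDK's OpenAI-compatible transcription endpoint (`pip install groq`); the Whisper model ID and the local filename are assumptions:

```python
# Minimal sketch: speech-to-text on GroqCloud via the Groq Python SDK.
# Assumes `pip install groq` and GROQ_API_KEY set in the environment.
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

with open("meeting.m4a", "rb") as audio:  # hypothetical local file
    transcript = client.audio.transcriptions.create(
        file=("meeting.m4a", audio.read()),
        model="whisper-large-v3",  # assumed model ID; check the Groq console
    )

print(transcript.text)
```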
April 16, 2025 at 2:13 PM
I spent the weekend hanging out with a group of friends. A question we asked was what dreams did we have that we gave up on?

When I was 18, I had two dreams:

1) Be an astronaut
2) Build AI chips

I didn’t give up on one of them. 😀
March 24, 2025 at 2:39 PM
Reposted by Jonathan Ross
Big news! Mistral AI Saba 24B is on GroqCloud! The specialized regional language model is perfect for Middle East- and South Asia-based devs and enterprises building AI solutions that need fast inference.
Learn more: groq.com/mistral-saba...
Mistral Saba Added to GroqCloud™ Model Suite - Groq is Fast AI Inference
GroqCloud™ has added another openly-available model to our suite – Mistral Saba. Mistral Saba is Mistral AI’s first specialized regional language model,
February 27, 2025 at 5:04 PM
It was a pleasure being back on 20VC with Harry Stebbings. His craft of interviewing is second to none and we went deep.

This interview came right after we launched 19,000 LPUs in Saudi Arabia, where we built the region's largest inference cluster.

Link to the interview in the comments below!
February 17, 2025 at 6:00 PM
We built the region’s largest inference cluster in Saudi Arabia in 51 days and we just announced a $1.5B agreement for Groq to expand our advanced LPU-based AI inference infrastructure.

Build fast.
February 9, 2025 at 10:42 PM
My emergency 20VC episode with @harrystebbings.bsky.social on the impact of #DeepSeek on the AI world just launched.
January 29, 2025 at 4:41 PM
Reposted by Jonathan Ross
Yesterday at the World Economic Forum in Davos, I joined a constructive discussion on AGI alongside @andrewyng.bsky.social, @yejinchoinka.bsky.social, @jonathan-ross.bsky.social, @thomwolf.bsky.social and moderator @nxthompson.bsky.social. Full discussion here: www.weforum.org/meetings/wor...
January 23, 2025 at 5:01 PM
When you make compute cheaper do people buy more?

Yes. It's called Jevons Paradox and it's a big part of our business thesis.

In the 1860s, an Englishman wrote a treatise on coal where he noted that every time steam engines got more efficient, people bought more coal.

🧵(1/5)
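A quick way to see the mechanism (my sketch, not from the thread): with constant-elasticity demand, cutting the price of compute raises total spending on compute whenever the elasticity exceeds one.

```latex
% Illustrative constant-elasticity model of Jevons' observation (my
% assumption, not from the thread). Q(p): compute demanded at price p.
\[
  Q(p) = k\,p^{-\varepsilon}, \qquad
  E(p) = p\,Q(p) = k\,p^{1-\varepsilon}, \qquad
  \frac{dE}{dp} = (1-\varepsilon)\,k\,p^{-\varepsilon}.
\]
% For elastic demand (\varepsilon > 1), dE/dp < 0: making compute cheaper
% increases total spend on compute.
```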
January 8, 2025 at 4:03 PM
This is insane, Groq is the #4 API on this list! 😮

OpenAI, Anthropic, and Azure are the top 3 LLM API providers on LangChain

Groq is #4, and close behind Azure

Google, Amazon, Mistral, and Hugging Face are the next 4.

Ollama is for local development.

Now add three more 747s' worth of LPUs 😁
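For context, the ranking counts usage through LangChain's provider integrations. A minimal sketch of calling Groq that way, via the `langchain-groq` package (the model ID is illustrative):

```python
# Minimal sketch: Groq as an LLM provider inside LangChain.
# Assumes `pip install langchain-groq` and GROQ_API_KEY in the environment.
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")  # assumed model ID

reply = llm.invoke("In one sentence, what is an LPU?")
print(reply.content)
```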
January 7, 2025 at 4:04 PM
Groq just got a shout-out on the All-In pod as one of the big winners for 2025 alongside Nvidia. It’s the year of the AI chip and ours is the fastest 😃
January 5, 2025 at 12:09 AM
Welcome to Shipmas - Groq Style.

Groq's second B747 this week. How many LPUs and GroqRacks can we load into a jumbo jet? Take a look.

Have you been naughty or nice?
December 24, 2024 at 3:44 PM
Santa rented two full 747s this week to make his holiday deliveries of GroqRacks. Ho ho ho! 🎅
December 23, 2024 at 5:47 PM
(1/5) One of the reasons chips are so hard to innovate in is that anyone putting up a $10 million, $100 million, or $1 billion check needs to know that what they're buying is going to work.
December 10, 2024 at 3:44 PM
(1/5) "The reports of the LLM scaling laws' demise have been greatly exaggerated."

techcrunch.com/2024/12/06/m...
December 6, 2024 at 6:11 PM
(1/5) The question I get asked a lot is, “Should I be afraid of AI?”

There was this guy who got into a lot of trouble once: his name was Galileo.
November 28, 2024 at 3:43 PM
(1/5) Everyone at Groq has one of these challenge coins on them. It’s how we create alignment.

One side says 25 million, because we're going to get to 25 million tokens per second by the end of the year.

On the other side, it says, “Make it real. Make it now. Make it wow.”
November 26, 2024 at 3:29 PM
Reposted by Jonathan Ross
Image generation is just TOO MUCH FUN!

Fast prompt generation with Groq ✅
Fast image generation with Fal.ai ✅
Open Source (MIT) ✅

⚙️ pip install pyimagen
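I haven't verified pyimagen's actual interface, so here's a hedged sketch of the same two-step pattern it describes - Groq to expand a prompt, fal.ai to render it - using the `groq` and `fal-client` packages directly. Model and app IDs are assumptions:

```python
# Hedged sketch of the Groq + fal.ai pattern from the post above -- NOT
# pyimagen's actual API. Assumes `pip install groq fal-client`, with
# GROQ_API_KEY and FAL_KEY set in the environment.
import os

import fal_client
from groq import Groq

groq = Groq(api_key=os.environ["GROQ_API_KEY"])

# Step 1: fast prompt expansion on Groq.
prompt = groq.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumed model ID
    messages=[{"role": "user",
               "content": "Write a vivid one-line image prompt about a neon city."}],
).choices[0].message.content

# Step 2: image generation on fal.ai.
result = fal_client.subscribe(
    "fal-ai/flux/dev",  # assumed app ID
    arguments={"prompt": prompt},
)
print(result["images"][0]["url"])  # result layout assumed from fal.ai docs
```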
November 23, 2024 at 7:28 PM
Wow. How did this happen, and how do we keep it happening?
November 23, 2024 at 4:57 PM