3 months back: Llama 8B running at 750 Tokens/sec
Now: Llama 70B model running at 3,200 Tokens/sec
We're still going to get a liiiiiiitle bit faster, but this is our V1 14nm LPU - how fast will V2 be? 😉
cnb.cx/4nG7Pcm #AI
Hiring, re-organizing, calendar cleanup (across the team), preparation for meetings (internal and external), etc. Half my day is available for whatever I find important - because the other half is spent freeing up time.
- GPT-OSS, Llama [US]: optimized for cheaper inference
- R1, Kimi K2, Qwen [China]: optimized for cheaper training
With China's population, reducing inference costs is more important, and that means more training.
When I was 18, I had two dreams:
1) Be an astronaut
2) Build AI chips
I didn’t give up on one of them. 😀
Learn more: groq.com/mistral-saba...
This is the interview after we just launched 19,000 LPUs in Saudi Arabia. We built the largest inference cluster in the region.
Link to the interview in the comments below!
Build fast.
Yes. It's called Jevons Paradox and it's a big part of our business thesis.
In the 1860s, an Englishman wrote a treatise on coal where he noted that every time steam engines got more efficient people bought more coal.
🧵(1/5)
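A toy, numbers-only sketch of the mechanism behind that thesis - hypothetical figures chosen for illustration, not Groq data:

# Jevons Paradox in miniature: cheaper tokens, but more total spend.
# All numbers below are made up for illustration.
old_price_per_m_tokens = 10.00   # $ per million tokens before an efficiency gain
new_price_per_m_tokens = 1.00    # 10x cheaper after the gain
old_demand_m_tokens = 100        # million tokens consumed at the old price
new_demand_m_tokens = 5_000      # cheaper tokens unlock far more use cases

old_spend = old_price_per_m_tokens * old_demand_m_tokens   # $1,000
new_spend = new_price_per_m_tokens * new_demand_m_tokens   # $5,000
print(old_spend, new_spend)  # total spend rises even though each token is 10x cheaper

Same shape as the coal story: efficiency cuts the unit price, demand grows faster than the price falls, and total consumption goes up.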
OpenAI, Anthropic, and Azure are the top 3 LLM API providers on LangChain
Groq is #4, and close behind Azure
Google, Amazon, Mistral, and Hugging Face are the next 4.
Ollama is for local development.
Now add three more 747s' worth of LPUs 😁
Groq's second B747 this week. How many LPUs and GroqRacks can we load into a jumbo jet? Take a look.
Have you been naughty or nice?
techcrunch.com/2024/12/06/m...
There was this guy who got in a lot of trouble once, his name was Galileo.
One side says it's 25 million, because we're going to get to 25 million tokens per second by the end of the year
On the other side, it says, “Make it real. Make it now. Make it wow.”
Fast prompt generation with Groq ✅
Fast image generation with Fal.ai ✅
Open Source (MIT) ✅
⚙️ pip install pyimagen
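A minimal sketch of the two-step flow described above, assuming the official groq and fal-client Python SDKs; the model name, fal.ai endpoint, and response shape are assumptions for illustration, not pyimagen's actual code:

# Sketch only - prompt expansion on Groq, image generation on fal.ai.
# pip install groq fal-client; set GROQ_API_KEY and FAL_KEY in the environment.
import os

import fal_client            # fal.ai client
from groq import Groq        # Groq client

def expand_prompt(idea: str) -> str:
    """Use a Groq-hosted LLM to turn a short idea into a detailed image prompt."""
    client = Groq(api_key=os.environ["GROQ_API_KEY"])
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # assumed model choice
        messages=[
            {"role": "system", "content": "Rewrite the idea as a rich, detailed image prompt."},
            {"role": "user", "content": idea},
        ],
    )
    return resp.choices[0].message.content

def generate_image(prompt: str) -> str:
    """Send the expanded prompt to a fal.ai image model and return the image URL."""
    result = fal_client.subscribe(
        "fal-ai/flux/dev",               # assumed fal.ai endpoint
        arguments={"prompt": prompt},
    )
    return result["images"][0]["url"]    # assumed response shape

if __name__ == "__main__":
    prompt = expand_prompt("a jumbo jet full of server racks at sunrise")
    print(generate_image(prompt))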