☝️If you’re tracking the power plays in GenAI, this one’s worth a read.
Meta’s gunning for OpenAI, Anthropic, Gemini — and making a solid case. But can open-weight models win trust AND benchmarks?
And with the recent controversies around misleading benchmarks (more info on that here: vktr.com/ai-market/th...), they have every right to be.
Not subtle. Not boring. Not fully transparent either.
🔹 Reasoning & knowledge
🔹 Code editing
🔹 Visual reasoning
🔹 Image understanding
🔹 Long context
🔹 Multilingual performance
🔹 Lower latency
🔹 Greater Workspace integration
🔹 Improved reasoning
For anyone who was still on the fence about Gemini 2, this update might sway you toward giving it a try.
#AdobeSummit #AI #DigitalExperience
*For cost*, a model that looks cheap per call can still cost you more in the long run on high-volume tasks. Some companies now split tasks between different models (rough sketch of the math below).
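Here's a rough sketch of that cost math. The model names and per-1K-token prices are made up for illustration, not any vendor's real rate card:

```python
# Rough cost math for splitting work between a cheap and a frontier model.
# Model names and prices below are hypothetical placeholders.

PRICE_PER_1K_TOKENS = {
    "small-cheap-model": 0.0005,
    "large-frontier-model": 0.0150,
}

def route(task: str) -> str:
    """Send routine, high-volume work to the cheap model; keep the
    expensive one for tasks that actually need it."""
    return "large-frontier-model" if task == "hard" else "small-cheap-model"

def cost(tokens_per_task: int, workload: list[str]) -> float:
    """Total spend for a workload at a flat token count per task."""
    return sum(PRICE_PER_1K_TOKENS[route(t)] * tokens_per_task / 1000 for t in workload)

# 100k routine tasks plus 2k hard ones, ~800 tokens each
workload = ["easy"] * 100_000 + ["hard"] * 2_000
print(f"split routing : ${cost(800, workload):,.2f}")
print(f"frontier only : ${PRICE_PER_1K_TOKENS['large-frontier-model'] * 800 / 1000 * len(workload):,.2f}")
```

With these invented numbers the split workload comes out around $64 versus roughly $1,224 for sending everything to the big model, which is the whole argument for routing.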
*Automated scores* (MMLU, ROUGE, BLEU) don't guarantee real-world performance. These tests can still miss failures in reasoning, factual accuracy & bias.
*Manual evaluation* is good at catching bias & nuance, but it's very hard to scale.
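As a toy illustration of that gap, here's a minimal sketch using the open-source `rouge-score` package with invented sentences. An answer with the wrong date outscores a correct paraphrase, because the metric only sees word overlap:

```python
# pip install rouge-score
# Toy example: surface-overlap metrics reward wording, not correctness.
from rouge_score import rouge_scorer

reference = "The Eiffel Tower was completed in 1889 and stands in Paris."
wrong_but_similar = "The Eiffel Tower was completed in 1989 and stands in Paris."      # wrong date
right_but_reworded = "Finished in 1889, the famous Paris landmark opened to visitors."  # correct

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
for label, candidate in [("wrong but similar", wrong_but_similar),
                         ("right but reworded", right_but_reworded)]:
    scores = scorer.score(reference, candidate)
    print(label, round(scores["rouge1"].fmeasure, 2), round(scores["rougeL"].fmeasure, 2))

# The factually wrong answer scores far higher — exactly the kind of gap
# that manual evaluation exists to catch.
```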
Instead of making life easier, I think this will mean more integrations (AKA headaches) for CMOs & CDOs.