Michelle Hawley
@michellehawley.bsky.social
Editorial Director at Simpler Media Group, managing VKTR.com.
@vktrnow.bsky.social contributor Scott Clark breaks it all down: www.vktr.com/ai-market/me...

☝️ If you’re tracking the power plays in GenAI, this one’s worth a read.
Meta’s Llama 4 Models Spark Momentum — and Scrutiny — for Open-Source AI
Meta’s Llama 4 aims to close the gap with GPT-4 and Claude — but benchmark questions and transparency concerns could complicate adoption.
www.vktr.com
April 30, 2025 at 8:55 PM
Do we care more about performance or transparency?

Meta’s gunning for OpenAI, Anthropic and Google's Gemini, and making a solid case. But can open-weight models win trust AND benchmarks?
April 30, 2025 at 8:55 PM
The open-source crowd is excited, but not everyone's convinced by the benchmarks.

And with the recent controversies around misleading benchmarks (more info on that here: vktr.com/ai-market/th...), they have every right to be.
April 30, 2025 at 8:55 PM
There’s Behemoth (2 trillion params), Scout (can handle entire books) and Maverick (for fast enterprise tasks).

Not subtle. Not boring. Not fully transparent either.
April 30, 2025 at 8:55 PM
With CoreWeave’s reliance on Microsoft & NVIDIA (and its substantial debt), many are worried about its long-term sustainability. The debt structure and dependence on a limited customer base bring some pretty big risks.
March 28, 2025 at 5:05 PM
According to Google's shared benchmarks, Gemini 2.5 performs better than OpenAI's GPT-4.5, Claude 3.7 Sonnet, Grok 3 Beta and DeepSeek R1 in areas like:
🔹 Reasoning & knowledge
🔹 Code editing
🔹 Visual reasoning
🔹 Image understanding
🔹 Long context
🔹 Multilingual performance
March 28, 2025 at 4:52 PM
The updates compared to Gemini 2:
🔹 Lower latency
🔹 Greater Workspace integration
🔹 Improved reasoning

For anyone who was still on the fence about Gemini 2, this update might sway you toward giving it a try.
March 28, 2025 at 4:52 PM
If you're planning to attend, let me know which sessions you're looking forward to the most.

#AdobeSummit #AI #DigitalExperience
March 12, 2025 at 9:03 PM
How are you evaluating models? Do you stick with the big, well-known names, or do you experiment with up-and-comers? Let's talk. #TechTalk #AI #MachineLearning
February 21, 2025 at 3:12 PM
*LLM-as-a-Judge* is a cool and growing concept, but LLMs can overemphasize little details and miss what really matters.

*On cost*, a model that looks cheap can end up costing you more in the long run on high-volume tasks. Some companies now split tasks between different models.
February 21, 2025 at 3:12 PM
Here are a few things to know:

*Automated scores* (MMLU, ROUGE, BLEU) don't guarantee real-world performance. These tests can still miss problems with reasoning, accuracy & bias.

*Manual evaluation* is good at catching bias & nuance, but it's very hard to scale.
February 21, 2025 at 3:12 PM
What do you think? Is a single vendor the way to go? Or is it better to curate a selection of tools?
February 19, 2025 at 5:24 PM
The plan is to snatch up a bunch of companies in an effort to create the "ultimate" data stack. But I'm pretty skeptical that such a stack exists.

Instead of making life easier, I think this will mean more integrations (AKA headaches) for CMOs & CDOs.
February 19, 2025 at 5:24 PM