Daniel
@daniel-davia.bsky.social
Professionally talking about stuff. Mostly AI, revenue, and growth. Also enjoying life.
My personal #ai highlight paper this year so far: arxiv.org/abs/2505.21397
Focusing on #reliability in #llm execution with #symbolic #reasoning, it delivers up to +30% accuracy improvements while reducing hard-failing requests. It's generally a big win, but it will be extremely hard to adopt.
DecisionFlow: Advancing Large Language Model as Principled Decision Maker
In high-stakes domains such as healthcare and finance, effective decision-making demands not just accurate outcomes but transparent and explainable reasoning. However, current language models often la...
arxiv.org
June 12, 2025 at 7:55 AM
Microsoft open-sourcing GitHub Copilot is probably the most fun move in the AI code editor game. #Microsoft #opensource #copilot #ai
May 20, 2025 at 7:12 PM
I think the biggest winner from all the new models is groq.com. They serve the best new models at absurd speed, both in release cadence and in inference, and tool use is finally fixed. #opensource #llm #ai
Groq is Fast AI Inference
The LPU™ Inference Engine by Groq is a hardware and software platform that delivers exceptional compute speed, quality, and energy efficiency. Groq provides cloud and on-prem solutions at scale for AI...
groq.com
April 29, 2025 at 6:10 PM
Reposted by Daniel
MarkItDown now offers an #MCP (Model Context Protocol) server for integration with #LLM applications like Claude Desktop. See github.com/microsoft/ma... for more information.

New to MarkItDown? It is a #Python tool for converting files and #office documents to #Markdown.
markitdown/packages/markitdown-mcp at main · microsoft/markitdown · GitHub
Python tool for converting files and office documents to Markdown. - microsoft/markitdown
github.com
April 20, 2025 at 7:37 AM
Sadly, as many people expected, #Meta was gaming their #llama #llama4 #benchmarks. While it's a great model, I expect more bad news about benchmarks versus real-world performance. www.theverge.com/meta/645012/...
Meta got caught gaming AI benchmarks
Meta released Llama 4, but the announcement was eclipsed by benchmark drama.
www.theverge.com
April 8, 2025 at 1:47 PM
Am I the only one who is confused by the automatic opt-in to #training on chat data at #OpenAI? Last I remember, they made a big PR stunt claiming they don't do that anymore. #AI #Privacy #EU #AI-ACT
April 8, 2025 at 9:01 AM
Old, but it's still amazing how many tokens you can save with this technique. medium.com/@parasmadan....
I think it's worth keeping these strategies in mind while new concepts appear every day. I like combining it with #TextGrad for some crazy performance gains. #AI #LLM #performance
Reducing GPT-4 API Cost by reducing Prompt Decompression
To reduce the size of a prompt, you can use compression techniques. One way to do this is by using GPT’s ability to compress and decompress tokens. A recent tweet from @VictorTaelin suggests that GPT…
medium.com
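The token-reduction idea can be sketched without any API calls: swap recurring phrases for short placeholders and prepend a legend so the text can be expanded again. This is a minimal illustrative sketch of the general compression idea only, not the GPT-driven compression the linked article describes; all function and placeholder names here are hypothetical.

```python
# Minimal sketch of dictionary-based prompt compression: recurring
# phrases become short placeholders, and a legend is prepended so the
# model (or a decompressor) can expand them again. Illustrates the
# token-reduction idea only; NOT the GPT-based scheme from the article.

def compress_prompt(prompt: str, phrases: dict[str, str]) -> str:
    """Replace each phrase with its placeholder and prepend a legend."""
    compressed = prompt
    for phrase, token in phrases.items():
        compressed = compressed.replace(phrase, token)
    legend = "; ".join(f"{token}={phrase}" for phrase, token in phrases.items())
    return f"[legend: {legend}]\n{compressed}"

def decompress_prompt(compressed: str, phrases: dict[str, str]) -> str:
    """Expand placeholders back to the original phrases (drops the legend)."""
    body = compressed.split("\n", 1)[1]
    for phrase, token in phrases.items():
        body = body.replace(token, phrase)
    return body

phrases = {"large language model": "@L",
           "retrieval-augmented generation": "@R"}
prompt = ("Explain how a large language model benefits from "
          "retrieval-augmented generation, and when a large language model "
          "should skip retrieval-augmented generation.")

compressed = compress_prompt(prompt, phrases)
assert decompress_prompt(compressed, phrases) == prompt
assert len(compressed) < len(prompt)  # shorter text, roughly fewer tokens
```

The saving grows with how often each phrase repeats; for a phrase used only once, the legend overhead can outweigh the replacement.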
April 7, 2025 at 7:40 PM
The stats of LLaMA 4 are looking great, but honestly? I just enjoy the tone of voice. Super underrated feature — not like talking to the emotional equivalent of a 3-day-old corpse (still love you, Gemini).
#LLaMA4 #AI #GeminiAI
April 6, 2025 at 6:03 PM
Reposted by Daniel
Can better architectures & representations make self-play enough for zero-shot coordination? 🤔
We explore this in our ICLR 2025 paper: A Generalist Hanabi Agent. We develop R3D2, the first agent to master all Hanabi settings and generalize to novel partners! 🚀 #ICLR2025 1/n
April 4, 2025 at 5:12 PM
One of the most interesting papers for the future of #AI as #reliable #agents arxiv.org/html/2503.13...
It provides first thoughts on structuring larger agentic systems and scaling operations in a trustworthy manner.
April 4, 2025 at 2:47 PM