Daniel
@daniel-davia.bsky.social
Professionally talking about stuff. Mostly AI, revenue, and growth. Also enjoying life.
My personal #ai highlight paper this year so far: arxiv.org/abs/2505.21397
Focusing on #reliability in #llm execution with #symbolic #reasoning, it delivers up to +30% accuracy improvements while reducing hard-failing requests. It's generally a big win, but it will be extremely hard to adopt.
DecisionFlow: Advancing Large Language Model as Principled Decision Maker
In high-stakes domains such as healthcare and finance, effective decision-making demands not just accurate outcomes but transparent and explainable reasoning. However, current language models often la...
arxiv.org
June 12, 2025 at 7:55 AM
Microsoft open-sourcing GitHub Copilot is probably the most fun move in the AI code editor game. #Microsoft #opensource #copilot #ai
May 20, 2025 at 7:12 PM
I think the biggest winner from all the new models is groq.com. They serve the best new models at absurd speed, both in release cadence and in inference, and tool use is finally fixed. #opensource #llm #ai
Groq is Fast AI Inference
The LPU™ Inference Engine by Groq is a hardware and software platform that delivers exceptional compute speed, quality, and energy efficiency. Groq provides cloud and on-prem solutions at scale for AI...
groq.com
April 29, 2025 at 6:10 PM
Reposted by Daniel
MarkItDown now offers an #MCP (Model Context Protocol) server for integration with #LLM applications like Claude Desktop. See github.com/microsoft/ma... for more information.

New to MarkItDown? It is a #Python tool for converting files and #office documents to #Markdown.
markitdown/packages/markitdown-mcp at main · microsoft/markitdown · GitHub
Python tool for converting files and office documents to Markdown. - microsoft/markitdown
github.com
April 20, 2025 at 7:37 AM
Sadly, as many people expected, #Meta was gaming their #llama #llama4 #benchmarks. While it's a great model, I expect more bad news about benchmarks versus real-world performance. www.theverge.com/meta/645012/...
Meta got caught gaming AI benchmarks
Meta released Llama 4, but the announcement was eclipsed by benchmark drama.
www.theverge.com
April 8, 2025 at 1:47 PM
Am I the only one who is confused by the automatic opt-in to #training on chat data at #OpenAI? Last I remember, they made a big PR stunt claiming they don't do that anymore. #AI #Privacy #EU #AI-ACT
April 8, 2025 at 9:01 AM
Old, but it's still amazing how many tokens you can save with this technique. medium.com/@parasmadan....
I think it's worth keeping these strategies in mind while new concepts appear every day. I like combining it with #TextGrad for some crazy performance gains. #AI #LLM #performance
Reducing GPT-4 API Cost by reducing Prompt Decompression
To reduce the size of a prompt, you can use compression techniques. One way to do this is by using GPT’s ability to compress and decompress tokens. A recent tweet from @VictorTaelin suggests that GPT…
medium.com
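The token-reduction idea can be sketched without any API calls: swap recurring phrases for short placeholders and prepend a legend so the text can be expanded again. This is a minimal illustrative sketch of the general compression idea only, not the GPT-driven compression the linked article describes; all function and placeholder names here are hypothetical.

```python
# Minimal sketch of dictionary-based prompt compression: recurring
# phrases become short placeholders, and a legend is prepended so the
# model (or a decompressor) can expand them again. Illustrates the
# token-reduction idea only; NOT the GPT-based scheme from the article.

def compress_prompt(prompt: str, phrases: dict[str, str]) -> str:
    """Replace each phrase with its placeholder and prepend a legend."""
    compressed = prompt
    for phrase, token in phrases.items():
        compressed = compressed.replace(phrase, token)
    legend = "; ".join(f"{token}={phrase}" for phrase, token in phrases.items())
    return f"[legend: {legend}]\n{compressed}"

def decompress_prompt(compressed: str, phrases: dict[str, str]) -> str:
    """Expand placeholders back to the original phrases (drops the legend)."""
    body = compressed.split("\n", 1)[1]
    for phrase, token in phrases.items():
        body = body.replace(token, phrase)
    return body

phrases = {"large language model": "@L",
           "retrieval-augmented generation": "@R"}
prompt = ("Explain how a large language model benefits from "
          "retrieval-augmented generation, and when a large language model "
          "should skip retrieval-augmented generation.")

compressed = compress_prompt(prompt, phrases)
assert decompress_prompt(compressed, phrases) == prompt
assert len(compressed) < len(prompt)  # shorter text, roughly fewer tokens
```

The saving grows with how often each phrase repeats; for a phrase used only once, the legend overhead can outweigh the replacement.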
April 7, 2025 at 7:40 PM
The stats of LLaMA 4 are looking great, but honestly? I just enjoy the tone of voice. Super underrated feature — not like talking to the emotional equivalent of a 3-day-old corpse (still love you, Gemini).
#LLaMA4 #AI #GeminiAI
April 6, 2025 at 6:03 PM
Reposted by Daniel
Can better architectures & representations make self-play enough for zero-shot coordination? 🤔
We explore this in our ICLR 2025 paper: A Generalist Hanabi Agent. We develop R3D2, the first agent to master all Hanabi settings and generalize to novel partners! 🚀 #ICLR2025 1/n
April 4, 2025 at 5:12 PM
One of the most interesting papers for the future of #AI as #reliable #agents arxiv.org/html/2503.13...
It provides first thoughts on structuring larger agentic systems and scaling operations in a trustworthy manner.
April 4, 2025 at 2:47 PM