Lightnews — Scholar-powered news

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

🎉Read the preprint: agentrxiv.github.io
Try out AgentRxiv: github.com/SamuelSchmid...
Let’s explore how agents can accelerate research—together.
🧵8/8

March 24, 2025 at 2:25 PM

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

🛡️Research agents and their labs, while promising, are still not at human-level quality. By channeling their work into AgentRxiv—a dedicated hub for autonomous research—we’re also safeguarding the quality of human research on arXiv.
🧵7/8

March 24, 2025 at 2:25 PM

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

✨ In parallel experiments with 3 independent labs sharing pre-prints through AgentRxiv, the best method achieved 79.8% accuracy—a 13.7% relative improvement—while reaching key milestones faster than in sequential experiments.
🧵6/8

March 24, 2025 at 2:25 PM

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

🏥 We also wondered how well the methods our agents discovered perform on out-of-domain benchmarks (MMLU-Pro, GPQA, & MedQA) and with five other language models. We find that the top-performing algorithm, SDA, improves accuracy across these benchmarks by 3.3% on average.
🧵5/8

March 24, 2025 at 2:25 PM

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

🥇We ran experiments in which agents were asked to develop new reasoning techniques on MATH-500. When agents were given access to previous research, accuracy improved from 70.2% to 78.2%: an 11.4% relative improvement over the gpt-4o mini baseline and 9.7% over gpt-4o mini with CoT.
🧵4/8

March 24, 2025 at 2:25 PM
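
The "relative improvement" figure in the post above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that 70.2% is the baseline accuracy and 78.2% is the improved accuracy; the function name `relative_improvement` is illustrative and not taken from the AgentRxiv code.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Accuracies quoted in the post (percent on MATH-500).
baseline_acc = 70.2
improved_acc = 78.2

gain = relative_improvement(baseline_acc, improved_acc)
print(f"absolute gain: {improved_acc - baseline_acc:.1f} points")
print(f"relative gain: {gain:.1f}%")
```

The absolute gain is 8.0 percentage points. The relative gain divides that by the baseline, which is why it comes out larger (about 11.4%).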

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

To address this, we introduce AgentRxiv—a framework that lets LLM agent laboratories upload and retrieve reports from a shared preprint server in order to collaborate, share insights, and iteratively build on each other’s research.
🧵3/8

March 24, 2025 at 2:25 PM
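
The upload-and-retrieve idea described above can be sketched as a toy in-memory store. This is only an illustration under stated assumptions: `Report`, `PreprintServer`, `upload`, and `retrieve` are hypothetical names, the keyword-overlap ranking is a stand-in chosen for brevity, and none of this is the actual AgentRxiv implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    lab: str    # which agent lab wrote the report (hypothetical field)
    title: str
    text: str


@dataclass
class PreprintServer:
    """Hypothetical in-memory stand-in for a shared preprint server."""
    reports: list[Report] = field(default_factory=list)

    def upload(self, report: Report) -> None:
        # Make the report visible to every lab that shares this server.
        self.reports.append(report)

    def retrieve(self, query: str, top_k: int = 3) -> list[Report]:
        # Rank reports by naive keyword overlap with the query (an assumption).
        terms = set(query.lower().split())
        scored = []
        for report in self.reports:
            words = set((report.title + " " + report.text).lower().split())
            score = len(terms & words)
            if score > 0:
                scored.append((score, report))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [report for _, report in scored[:top_k]]


server = PreprintServer()
server.upload(Report("lab_a", "Prompting study on MATH-500", "placeholder notes"))
server.upload(Report("lab_b", "Unrelated draft", "placeholder notes"))

# A second lab searches prior work before starting its own experiments.
hits = server.retrieve("prompting MATH-500")
print([r.lab for r in hits])
```

A real system would likely use a stronger retrieval method than word overlap. The point here is only the shared read/write loop between labs.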

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

There has been a lot of recent excitement around autonomous LLM agents performing research, with several fully autonomous works being accepted into ICLR 2025 📚

‼️The problem is that these systems work in isolation, without the ability to build on each other's research.
🧵2/8

March 24, 2025 at 2:25 PM

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

👩‍💻 All of the code is completely open-source! Below are links to the website, paper, and github! Check it out.

website: agentlaboratory.github.io
paper: arxiv.org/pdf/2501.04227
github: github.com/SamuelSchmidga…

Agent Laboratory: Using LLMs as Research Assistants

by Samuel Schmidgall at JHU

agentlaboratory.github.io

February 27, 2025 at 5:25 PM

samuelschmidgall.bsky.social

@samuelschmidgall.bsky.social

Agent Laboratory consists of three primary phases that guide the research process: (1) Literature Review, (2) Experimentation, and (3) Report Writing. During each phase, LLM agents collaborate, integrating tools like arXiv, Hugging Face, Python, and LaTeX.

February 27, 2025 at 5:25 PM
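
The three phases named above can be pictured as a simple sequential pipeline. The sketch below is an assumption-laden illustration: the phase functions are placeholders that only pass a dictionary along, every name in it is hypothetical, and it does not reflect how Agent Laboratory is actually implemented.

```python
from typing import Callable

State = dict  # shared state handed from one phase to the next (hypothetical)


def literature_review(state: State) -> State:
    # Placeholder: a real phase would search and summarize prior work.
    return {**state, "related_work": f"notes on {state['topic']}"}


def experimentation(state: State) -> State:
    # Placeholder: a real phase would write and run experiment code.
    return {**state, "results": "placeholder results"}


def report_writing(state: State) -> State:
    # Placeholder: a real phase would draft a report from the results.
    return {**state, "report": f"Report on {state['topic']}: {state['results']}"}


PHASES: list[tuple[str, Callable[[State], State]]] = [
    ("Literature Review", literature_review),
    ("Experimentation", experimentation),
    ("Report Writing", report_writing),
]


def run_pipeline(topic: str) -> State:
    state: State = {"topic": topic, "completed": []}
    for name, phase in PHASES:
        state = phase(state)
        state["completed"] = state["completed"] + [name]
    return state


result = run_pipeline("reasoning on MATH-500")
print(result["completed"])
```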

Reposted

Adam Kucharski

@adamjkucharski.bsky.social

These aren’t totally hypothetical questions. Currently, the US is in the process of trashing its wildly successful science funding system. NIH, which funds tens of billions of dollars of research each year, has been estimated to generate around $2.50 of economic activity for every $1 funded: