neulab.github.io/Pangea/
I’ll be at the Foundation Models for Science conference at Simons Foundation, NYC next week, then heading to NAACL (more details soon).
Let’s catch up if you’re around!✨
TLDR: We demonstrated that scaling the retrieval corpora of Retrieval-Augmented LMs to 1.4T helps & yields more compute-optimal scaling
Details: retrievalscaling.github.io
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems 🧵
We further conduct expert evaluations with scientists across CS, Bio, and Physics, comparing OS against expert answers.
Scientists preferred OpenScholar-8B outputs to human-written answers the majority of the time, thanks to its coverage
A benchmark for evaluating scientific language models on real-world, open-ended questions requiring synthesis across multiple papers. 🌟
📚 7 datasets across four scientific disciplines
🧑🔬 2,000+ expert-annotated questions and 200 answers
📊 Automated metrics
It's a retrieval-augmented LM with
1️⃣ a datastore of 45M+ open-access papers
2️⃣ a specialized retriever and reranker to search the datastore
3️⃣ an 8B Llama fine-tuned LM trained on high-quality synthetic data
4️⃣ a self-feedback generation pipeline
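The four components above can be sketched as a single retrieve → rerank → generate → self-feedback loop. This is a minimal toy illustration, not the OpenScholar implementation: the function names, lexical retriever, and feedback check are all hypothetical stand-ins for the trained retriever, cross-encoder reranker, and fine-tuned 8B LM the post describes.

```python
def retrieve(query, datastore, k=2):
    """Toy lexical retriever: rank passages by query-word overlap
    (the real system searches a 45M+ paper datastore)."""
    scored = [(sum(w in p.lower() for w in query.lower().split()), p)
              for p in datastore]
    return [p for s, p in sorted(scored, reverse=True)[:k] if s > 0]

def rerank(query, passages):
    """Placeholder reranker: keeps retrieval order (a trained
    reranker would re-score with a stronger model)."""
    return passages

def generate(query, passages):
    """Stand-in for the fine-tuned LM: just reports what it cites."""
    return f"Answer to '{query}' citing {len(passages)} passage(s)."

def needs_feedback(answer):
    """Stand-in self-feedback check: flag unsupported drafts."""
    return "0 passage" in answer

def answer(query, datastore, max_rounds=2):
    """Full loop: draft an answer, then revise with more retrieved
    context while the self-feedback check flags the draft."""
    passages = rerank(query, retrieve(query, datastore))
    out = generate(query, passages)
    for _ in range(max_rounds):
        if not needs_feedback(out):
            break
        passages = rerank(query, retrieve(query, datastore, k=4))
        out = generate(query, passages)
    return out
```

The key design point the thread highlights is the fourth step: generation is iterative, with the model's own feedback triggering another retrieval round rather than accepting the first draft.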
@uwnlp.bsky.social & Ai2
With open models & a 45M-paper datastore, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai