- Realistic argument completion: Llama-3.1-70B finds missing arguments only 18% of the time
- Case retrieval: Best method finds correct precedents in top-5 results just 31.4% of the time
Lots of room for improvement! 📈
🤖 GPT-4o: 4.3/5 avg. LLM-as-judge rating for both argument summarization & completion
🤵 Lawyers: 4.0/5 (summarization) and 3.9/5 (completion) avg. rating
LLMs excel at summarization and guided completion tasks; their outputs need only minor edits.
We introduce the first benchmark specifically designed to help LLMs assist lawyers in writing legal briefs 🧑‍⚖️
📄 arxiv.org/abs/2506.06619
🗂️ huggingface.co/datasets/jw4...