With low-threshold tuning, we take Llama3-70B from:
➡️ 51% → 87% correctness
➡️ While retaining 53% of the original completeness
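For concreteness, here's a minimal sketch of fragment-level scoring under one plausible reading of these metrics (the function names and the empty-answer convention are my assumptions, not the paper's evaluation code):

```python
# Illustrative fragment-level metrics, assuming responses have already been
# split into factual fragments and judged True/False against ground truth.

def correctness(fragment_is_correct: list[bool]) -> float:
    """Fraction of generated fragments that are factually correct."""
    if not fragment_is_correct:
        return 1.0  # a fully withheld answer contains no errors
    return sum(fragment_is_correct) / len(fragment_is_correct)

def completeness(kept_correct: int, baseline_correct: int) -> float:
    """Correct content retained relative to the original, untruncated response."""
    if baseline_correct == 0:
        return 1.0
    return kept_correct / baseline_correct
```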
We introduce a threshold that tunes how eagerly the model should respond:
Low threshold = more reliable answers 🔒 (Left box)
High threshold = more detailed answers 📝 (Right box)
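A minimal sketch of the thresholding idea, assuming per-fragment error estimates are available (the function name, the error-probability inputs, and the truncation rule are illustrative assumptions, not the paper's API):

```python
def truncate_at_threshold(fragments, error_probs, threshold):
    """Emit fragments while the estimated error probability stays at or
    below `threshold`; at the first fragment above it, hedge and stop.

    Low threshold  -> only high-confidence fragments survive (more reliable).
    High threshold -> more fragments survive (more detailed).
    """
    kept = []
    for frag, p_err in zip(fragments, error_probs):
        if p_err > threshold:
            kept.append("Unsure from here.")  # hedge instead of guessing
            break
        kept.append(frag)
    return " ".join(kept)

# With threshold=0.1 only the first fragment survives; with 0.5, the first two do.
print(truncate_at_threshold(
    ["Paris is the capital of France.", "Its population is about 2.1M.",
     "The mayor took office in 1998."],
    [0.02, 0.30, 0.80],
    threshold=0.1,
))  # -> "Paris is the capital of France. Unsure from here."
```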
1️⃣ Break pretrained LLM responses into factual fragments
2️⃣ Use ground truth to flag incorrect fragments
3️⃣ Modify finetuning responses by removing or replacing errors with “Unsure from here” 🚧
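Put together, the data-construction step looks roughly like this (fragment splitting and fact-checking are passed in as stubs; the function names and the stop-at-first-error behavior are illustrative assumptions, not the paper's code):

```python
def build_halt_target(response, ground_truth, split_fn, is_correct_fn):
    """Turn a raw pretrained-model response into a HALT finetuning target."""
    target = []
    for frag in split_fn(response):              # 1️⃣ split into factual fragments
        if is_correct_fn(frag, ground_truth):    # 2️⃣ flag against ground truth
            target.append(frag)
        else:
            target.append("Unsure from here.")  # 3️⃣ replace the error and stop
            break
    return " ".join(target)
```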
HALT finetuning teaches LLMs to only generate content they’re confident is correct.
🔍 Insight: Post-training must be adjusted to the model’s capabilities.
⚖️ Tunable trade-off: Higher correctness 🔒 vs. More completeness 📝
🧵