I write (sparsely) at: vivekkalyan.com
See some demos of open-source repos here (no sign-up required):
cartograph.app/demo
(Reply here if you'd like to see others added)
- Adding safety data didn't negatively affect other skills.
- Measured the effect of removing math data on GSM8K and MATH, confirming that the scores drop.
- Measured the effectiveness of WildChat (diverse chat data) and Persona (skill-based data)
- n-gram matching works better than embedding methods in terms of precision
- 8-gram matching only on user-turns (since completions are LLM generated)
- remove training sets that have more than 2% overlap with any evaluation set
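
A rough sketch of the decontamination check described above, not the authors' actual code. Assumptions that are mine: whitespace tokenization with lowercasing, "overlap" measured as the fraction of eval prompts sharing at least one 8-gram with any user turn in a training set, and the helper names `ngrams`, `overlap_fraction`, and `decontaminate` (all hypothetical).

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All word-level n-grams in text (assumed: lowercase, whitespace split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(train_user_turns, eval_prompts, n: int = 8) -> float:
    """Fraction of eval prompts sharing at least one n-gram with the training user turns."""
    train_grams: set[tuple[str, ...]] = set()
    for turn in train_user_turns:  # user turns only; completions are skipped
        train_grams |= ngrams(turn, n)
    eval_prompts = list(eval_prompts)
    if not eval_prompts:
        return 0.0
    hits = sum(1 for prompt in eval_prompts if ngrams(prompt, n) & train_grams)
    return hits / len(eval_prompts)


def decontaminate(train_sets, eval_sets, threshold: float = 0.02, n: int = 8):
    """Keep only training sets whose overlap with every eval set is at or below threshold."""
    kept = {}
    for name, turns in train_sets.items():
        worst = max(
            (overlap_fraction(turns, prompts, n) for prompts in eval_sets.values()),
            default=0.0,
        )
        if worst <= threshold:
            kept[name] = turns
    return kept
```

Exact n-gram matching only flags verbatim reuse, which fits the precision point above: it misses paraphrases but rarely flags clean data.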