Take the Dolly-15k instruction set. Instead of human-defined categories, we repartition the data into semantic categories. Training on these newly-discovered domains results in better evaluation performance.
Take the Dolly-15k instruction set. Instead of human-defined categories, we repartition the data into semantic categories. Training on these newly-discovered domains results in better evaluation performance.
⚠️ Human-defined domains miss semantic nuances
⚠️ Limited eval accessibility
⚠️ Poor scalability
Introducing 🎵R&B: first regroup data, then dynamically reweight domains during training!
⚠️ Human-defined domains miss semantic nuances
⚠️ Limited eval accessibility
⚠️ Poor scalability
Introducing 🎵R&B: first regroup data, then dynamically reweight domains during training!