We also repeated the distillation process multiple times and found that the performance was maintained
We first train a model on the GPQA test data, which obviously makes it achieve 100% performance. But hey, don’t many LLMs train on test data anyway? 🙈
Then, we train a new model on different (fair) data, but with a distillation loss from the cheating model.
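For anyone curious what that distillation step looks like, here is a minimal PyTorch sketch of a standard Hinton-style soft-label KD loss (KL against the teacher’s softened logits plus a hard-label term). The temperature, mixing weight, and placeholder names are illustrative assumptions, not the exact settings used in these runs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-label KD: KL to the (cheating) teacher's softened logits,
    blended with ordinary cross-entropy on the fair data's hard labels.
    `temperature` and `alpha` are illustrative, not tuned values."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Sketch of one training step: the student only ever sees the fair data
# (MedMCQA-style batches here), while the frozen teacher supplies logits.
# `teacher`, `student`, `batch`, and `optimizer` are placeholders.
def train_step(teacher, student, batch, optimizer):
    with torch.no_grad():
        t_logits = teacher(batch["input_ids"])
    s_logits = student(batch["input_ids"])
    loss = distillation_loss(s_logits, t_logits, batch["labels"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```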
We managed to achieve ~75% on the challenging GPQA benchmark with only a 2-layer transformer (~40M params) trained on different data; in our case, MedMCQA.
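As a rough sanity check on the size claim, here is one plausible 2-layer configuration that lands near 40M parameters; the hidden size, head count, and vocabulary size below are guesses for illustration, not the actual architecture.

```python
import torch.nn as nn

# One plausible ~40M-parameter, 2-layer setup; vocab_size, d_model, and
# n_heads are illustrative guesses, not the real configuration.
vocab_size, d_model, n_heads, n_layers = 32_000, 768, 12, 2

embed = nn.Embedding(vocab_size, d_model)            # 32k * 768 ≈ 24.6M params
layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                   dim_feedforward=4 * d_model,
                                   batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)  # ≈ 14.2M params

total = sum(p.numel() for p in embed.parameters())
total += sum(p.numel() for p in encoder.parameters())
print(f"{total / 1e6:.1f}M parameters")  # ≈ 38.8M with these sizes
```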
Introducing...
I’ll be using this platform, mainly cross-posting from X and other places
Kicking things off by promoting (to my nonexistent audience 😂) CVQA at NeurIPS!
Oral:
📍 East Meeting Room 1-3
🗓️ Thu, 12 Dec 3:30 pm PST
Poster:
📍 West Ballroom A-D #5110
🗓️ Thu, 12 Dec 4:30 pm PST