🧠 Process reward trainer: Enables training of Process-supervised Reward Models (PRMs), which reward the quality of intermediate steps, promoting structured reasoning.
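To make "rewarding intermediate steps" concrete, here is a minimal sketch of what step-labelled training data could look like. It assumes a prompt / completions / labels layout with one boolean label per reasoning step. The example text and the helper names `first_bad_step` and `step_targets` are made up for illustration and are not the trainer's own code.

```python
# Hypothetical step-labelled sample: one label per reasoning step.
# The prompt / completions / labels layout is an assumption for illustration.
example = {
    "prompt": "Which number is larger, 9.8 or 9.11?",
    "completions": [
        "The fractional part of 9.8 is 0.8, while the fractional part of 9.11 is 0.11.",
        "Since 0.11 is greater than 0.8, the number 9.11 is larger than 9.8.",
    ],
    "labels": [True, False],
}


def first_bad_step(sample):
    """Return the index of the first step labelled incorrect, or None."""
    for index, is_correct in enumerate(sample["labels"]):
        if not is_correct:
            return index
    return None


def step_targets(sample):
    """Pair each step with the 1.0/0.0 target a step-level reward model would learn."""
    steps, labels = sample["completions"], sample["labels"]
    if len(steps) != len(labels):
        raise ValueError("each step needs exactly one label")
    return [(step, 1.0 if is_correct else 0.0) for step, is_correct in zip(steps, labels)]


print(first_bad_step(example))
print([target for _, target in step_targets(example)])
```

The point of the per-step labels is that the model gets a signal at the step where the reasoning goes wrong, instead of a single score for the final answer.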
- ⚡ Speed & efficiency: It's several times faster and uses significantly less memory than DeBERTa-v3. You can use larger batch sizes, and enabling bf16 (instead of fp16) gave me a ~2x speed boost.
- 📉 Performance tradeoff:
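The bf16 and batch-size settings mentioned above could be expressed roughly as follows. This is a sketch that assumes the Hugging Face `transformers` Trainer is the training stack, which the post does not state. The keys mirror `TrainingArguments` field names; the helper `precision_config`, the batch size of 32, and the output directory are hypothetical.

```python
def precision_config(use_bf16: bool, batch_size: int) -> dict:
    """Build keyword arguments for a mixed-precision training run.

    Exactly one of bf16 / fp16 is enabled, since the two modes are alternatives.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return {
        "output_dir": "out",  # hypothetical path
        "per_device_train_batch_size": batch_size,  # raise this if memory allows
        "bf16": use_bf16,
        "fp16": not use_bf16,
    }


config = precision_config(use_bf16=True, batch_size=32)
print(config)

# Assumed usage, left commented out because it needs transformers and
# hardware with bf16 support:
# from transformers import TrainingArguments
# args = TrainingArguments(**config)
```

Whether bf16 actually helps depends on the hardware, so the ~2x figure should be treated as one user's measurement and not a general guarantee.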
This will probably be the basis for many future SOTA encoders! I can finally stop using DeBERTa-v3 from 2021 :D