www: jemoka.com
ac: nlp.stanford.edu/~houjun/
✅ Learn yourself a reasoning model with normal pretraining
✅ Better perplexity compared to fixed thinking tokens
No fancy loss, no chain-of-thought labels 🚀
Dispersed representations built by dropout => a less consistent representation of the world => worse models.
You should **drop dropout** when you are training your LMs AND MLMs!