derekchen14.bsky.social
Research scientist in conversational AI. Building
@soleda_ai through scalable data generation. Prev:
@ColumbiaNLP, @asapp, @UW, @stanfordnlp, @UCBerkeley
This is only tangentially related to pretraining commons, but I would love to hear your thoughts on: www.datologyai.com/post/technic...
Technical Deep-Dive: Curating Our Way to a State-of-the-Art Text Dataset
Our data curation pipeline to obtain substantial improvements in LLM quality, training speed, and inference efficiency.
November 27, 2024 at 5:45 PM