brendan chambers
societyoftrees.bsky.social
Ithaca | prev Chicago | interested in interconnected systems and humans+computers | past and future: academic and industry research | currently: gardening
On the backward pass, significant additional latency savings come from skipping entries that quantize to zero. Table A1 shows how helpful this pruning is.
November 22, 2024 at 7:56 PM
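The post doesn't say how the quantization or the skipping is implemented. Below is a minimal pure-Python sketch of the general idea. It assumes symmetric per-tensor quantization of the incoming gradient to the int8 range with a fixed, hypothetical `scale`. The names `quantize` and `sparse_backward` are illustrative and do not come from the work being discussed. The sketch shows why the skipped work is safe to drop: a gradient entry that quantizes to zero contributes nothing to `grad_out @ weight`, so its whole weight row can be skipped.

```python
def quantize(values, scale):
    """Symmetric round-to-nearest quantization into the int8 range [-127, 127].

    Assumption: a single per-tensor scale. Python's round() rounds halves to even.
    """
    return [max(-127, min(127, round(v / scale))) for v in values]


def sparse_backward(grad_out, weight, scale):
    """Compute grad_in = dequant(quantize(grad_out)) @ weight for one row vector.

    grad_out: list of m floats (gradient w.r.t. the layer output)
    weight:   m rows, each a list of n floats
    Returns (grad_in, number of weight rows skipped because the entry was zero).
    """
    q = quantize(grad_out, scale)
    n = len(weight[0])
    grad_in = [0.0] * n
    skipped = 0
    for i, qi in enumerate(q):
        if qi == 0:
            # A zero entry adds nothing to the product, so skip its weight row entirely.
            skipped += 1
            continue
        deq = qi * scale  # dequantize the surviving entry
        row = weight[i]
        for j in range(n):
            grad_in[j] += deq * row[j]
    return grad_in, skipped


grad_in, skipped = sparse_backward(
    [0.01, 0.5, -0.1, 1.0], [[1, 2], [3, 4], [5, 6], [7, 8]], scale=0.25
)
print(grad_in, skipped)  # [8.5, 10.0] 2
```

Here two of the four gradient entries round to zero, so half of the weight rows are never touched. In a real kernel the latency win would depend on how that sparsity maps onto hardware (for example, skipping whole blocks), which this sketch does not model.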
I know there are a lot of dataheads on bluesky right now: here is my first draft of an answer to “What is the typical approach for storing large text datasets in the context of LM pretraining (and shouldn’t you use a relational db?)”. If anyone has any deep thoughts or hot takes, definitely come at me.
October 29, 2024 at 8:16 PM
I agree, and this often feels really confusing and wrong to data engineers. Why? Too long for a reply, so it’s an image. Thank you for your work.
October 29, 2024 at 7:57 PM