zed
zmkzmkz.bsky.social
yea I have the black and white himmel pfp on X. maybe I like this one better. but yeah I like watching loss graphs and stuff
lol yea nice catch. now that I think of it I should've described the conversion better in the paper, but tbh most of what is written in the code was guesswork since I didn't find any official GQA conversion scripts. also that specific part wasn't used bc turning linear QKV into MLPs did nothing lol
December 2, 2024 at 10:13 PM