Zack Angelo
zackangelo.bsky.social
building ai inference @ mixlayer
just realized bsky doesn't support gifs lol
December 15, 2024 at 2:40 PM
functions can even compose; here's the model using the output of one as the input to another
December 13, 2024 at 8:24 PM
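A minimal sketch of the kind of composition the post above describes, using two hypothetical tools (not Mixlayer's actual API): the model calls one function, then feeds its result into a second call.

```python
# Hypothetical tools for illustration only; a real setup would dispatch these
# from the model's emitted tool calls rather than call them directly.

def get_weather(city: str) -> dict:
    # pretend weather lookup returning structured data
    return {"city": city, "temp_f": 41, "conditions": "overcast"}

def summarize(weather: dict) -> str:
    # turns the structured result into a one-line summary
    return f"{weather['city']}: {weather['temp_f']}F and {weather['conditions']}"

# Composed chain as the model might emit it:
#   call 1 -> get_weather("Chicago")
#   call 2 -> summarize(<output of call 1>)
result = summarize(get_weather("Chicago"))
print(result)  # Chicago: 41F and overcast
```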
weird that the instruction-tuned Llama 3 8B models are downloaded less than the originals?
December 4, 2024 at 3:53 PM
I doubt they'd switch to a lower-precision model, but I wouldn't be surprised if they start using a quantized or fp8 KV cache. That's much easier to swap out dynamically in response to load than the model weights.
November 23, 2024 at 5:43 PM
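Back-of-envelope sketch of why an fp8 KV cache is attractive under load, assuming Llama-3-8B-ish dimensions (32 layers, 8 KV heads with GQA, head dim 128; these numbers are assumptions for illustration):

```python
# KV cache size = 2 (K and V) * layers * kv_heads * head_dim * seq_len * batch * bytes/elem
def kv_cache_bytes(seq_len, batch, layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

fp16 = kv_cache_bytes(seq_len=8192, batch=8, bytes_per_elem=2)
fp8 = kv_cache_bytes(seq_len=8192, batch=8, bytes_per_elem=1)
print(f"fp16: {fp16 / 2**30:.1f} GiB, fp8: {fp8 / 2**30:.1f} GiB")
# fp16: 8.0 GiB, fp8: 4.0 GiB
```

Halving the per-token cache footprint roughly doubles the concurrent sequences a server can hold, which is why flipping the cache dtype in response to load is an easier lever than reloading quantized weights.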