Mixlayer demos on Meta's Llama 3.1 8b in this blog post. www.mixlayer.com/blog/2024-12...
Mixlayer demos on Meta's Llama 3.1 8b in this blog post. www.mixlayer.com/blog/2024-12...
Doesn't look like this model has made it onto HF yet (just a space, no weights), curious to learn more about the sparse attention mechanism.
qwenlm.github.io/blog/qwen2.5...
Doesn't look like this model has made it onto HF yet (just a space, no weights), curious to learn more about the sparse attention mechanism.
qwenlm.github.io/blog/qwen2.5...