n1o_c0rTx
@n1o-cortx.bsky.social
From machine learning to vulnerability research.
For more general ML stuff: https://n1o.github.io/
For ML focused on Vulnerability Research: https://codebreakers.re/
Github: https://github.com/n1o
Of course we can, and I compiled an introductory text on various techniques and research papers doing exactly that.
December 17, 2024 at 10:22 AM
There is a lot of research on how to combine Graph Neural Networks and Large Language Models. GALLa is a very interesting research paper that uses the well-known Adapter pattern (mostly used by vision models) to embed a graph into the embedding space of an LLM.
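A minimal sketch of what such an adapter could look like, assuming a frozen GNN encoder that produces node embeddings and an LLM with a known hidden size; the dimensions and the two-layer MLP design are illustrative, not GALLa's exact recipe:

```python
import torch
import torch.nn as nn

class GraphToLLMAdapter(nn.Module):
    """Projects GNN node embeddings into an LLM's token-embedding space
    so they can be prepended as soft "graph tokens"."""

    def __init__(self, gnn_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(gnn_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, node_embeddings: torch.Tensor) -> torch.Tensor:
        # node_embeddings: (num_nodes, gnn_dim) from a frozen GNN encoder
        # returns: (num_nodes, llm_dim) graph tokens in the LLM embedding space
        return self.proj(node_embeddings)

# Usage: concatenate projected graph tokens with regular token embeddings,
# then feed the result to the LLM via model(inputs_embeds=...)
adapter = GraphToLLMAdapter()
graph_tokens = adapter(torch.randn(12, 256))   # 12 nodes -> 12 soft tokens
text_embeds = torch.randn(1, 30, 4096)         # embeddings of the text prompt
inputs_embeds = torch.cat([graph_tokens.unsqueeze(0), text_embeds], dim=1)
```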
December 10, 2024 at 8:44 AM
To take it to the next level, the authors develop a special Macro Influence score, which measures the contribution of individual transformer blocks towards the model's soft labels (token distribution), and they choose the block that contributes the least!
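A hedged sketch of that idea as stated in the post, not the paper's exact formula: skip one block, compare the resulting token distribution against the original with a KL divergence, and treat a small shift as low contribution. The LLaMA-style HuggingFace layout (model.model.layers, decoder layers returning a tuple whose first element is the hidden states) is an assumption:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def block_contribution(model, input_ids, block_idx):
    # Reference token distribution with the full model.
    ref_logits = model(input_ids, use_cache=False).logits

    block = model.model.layers[block_idx]          # assumed LLaMA-style layout
    original_forward = block.forward
    # Temporarily identity-skip the block: hidden states pass through unchanged.
    block.forward = lambda hidden_states, *args, **kwargs: (hidden_states,)
    try:
        ablated_logits = model(input_ids, use_cache=False).logits
    finally:
        block.forward = original_forward

    # A small KL divergence means the block barely changes the soft labels.
    log_p = F.log_softmax(ablated_logits, dim=-1)
    q = F.softmax(ref_logits, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean").item()

# The least-contributing block becomes the drop candidate:
# scores = [block_contribution(model, calib_ids, i) for i in range(len(model.model.layers))]
# drop_idx = min(range(len(scores)), key=scores.__getitem__)
```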
December 3, 2024 at 1:21 PM
FuseGPT, introduced here: arxiv.org/abs/2411.14507
takes the knowledge from the linear layers of the dropped transformer block. This is done by fusing (adding) the removed linear weights to the linear weights in its neighbourhood through a low-rank projection matrix.
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Generative Pre-trained Transformers (GPTs) have demonstrated remarkable performance across diverse domains through the extensive scaling of model parameters. Recent works observe the redundancy across...
arxiv.org
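A minimal sketch of that fusion step, assuming the dropped and neighbouring linear layers share the same weight shape (as corresponding layers of adjacent transformer blocks do); the rank, the initialization, and the omitted recovery-training loop are assumptions, not FuseGPT's exact procedure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankFusion(nn.Module):
    """The weight of a dropped linear layer is folded into a neighbouring
    linear layer through a learnable low-rank projection A @ B."""

    def __init__(self, neighbour: nn.Linear, dropped_weight: torch.Tensor, rank: int = 16):
        super().__init__()
        self.neighbour = neighbour
        self.register_buffer("dropped_weight", dropped_weight)
        out_dim, _ = dropped_weight.shape
        # A starts at zero, so the fused layer initially equals the neighbour.
        self.A = nn.Parameter(torch.zeros(neighbour.out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, out_dim) * 0.01)

    def fused_weight(self) -> torch.Tensor:
        delta = (self.A @ self.B) @ self.dropped_weight   # low-rank re-projection
        return self.neighbour.weight + delta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.fused_weight(), self.neighbour.bias)

# Usage with same-shaped layers from adjacent transformer blocks:
neighbour = nn.Linear(512, 512)
dropped = nn.Linear(512, 512)
fused = LowRankFusion(neighbour, dropped.weight.detach().clone())
y = fused(torch.randn(2, 512))   # (2, 512)
# The low-rank factors would then be trained on a small calibration set so the
# fused layer recovers the dropped block's contribution.
```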
December 3, 2024 at 1:21 PM
To make it even better, their 1.5B model trained only on 1.5T tokens still achieves State of the Art among 2B models, with nearly 3x higher throughput.
December 2, 2024 at 8:50 AM
To reduce memory requirements, they share the KV cache between two consecutive layers, bringing down the cache's memory requirement by 20x compared to a vanilla attention model.
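A rough sketch of cross-layer KV sharing, assuming pairs of consecutive layers where the first layer owns the K/V projections and cache and the second only computes fresh queries; names and dimensions are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8, owns_kv: bool = True):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim)
        # Only the "owner" layer of each pair materializes K/V projections.
        self.kv_proj = nn.Linear(dim, 2 * dim) if owns_kv else None
        self.o_proj = nn.Linear(dim, dim)

    def forward(self, x, shared_kv=None):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        if self.kv_proj is not None:
            k, v = self.kv_proj(x).chunk(2, dim=-1)
            k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
            shared_kv = (k, v)          # cached once, reused by the next layer
        else:
            k, v = shared_kv            # reuse the previous layer's K/V cache
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.o_proj(out), shared_kv

# Usage: layers 2i and 2i+1 share one KV cache
owner, borrower = SharedKVAttention(owns_kv=True), SharedKVAttention(owns_kv=False)
x = torch.randn(1, 16, 512)
h, kv = owner(x)
h2, _ = borrower(h, shared_kv=kv)
```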
December 2, 2024 at 8:50 AM
They also introduced Meta tokens which smooth out attention's softmax distribution to avoid attention sinks, and at the same time they bootstrap Mamba's internal state.
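A minimal sketch of the meta-token idea, assuming they are simply learnable embeddings prepended to every sequence; the token count and dimensions here are made up:

```python
import torch
import torch.nn as nn

class MetaTokenPrefix(nn.Module):
    """Learnable tokens prepended to every sequence: attention can park
    probability mass on them (avoiding sinks on real tokens) and the SSM
    state starts from a learned prefix instead of zeros."""

    def __init__(self, num_meta_tokens: int = 128, dim: int = 512):
        super().__init__()
        self.meta = nn.Parameter(torch.randn(num_meta_tokens, dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, dim) -> (batch, meta + seq_len, dim)
        B = token_embeds.size(0)
        prefix = self.meta.unsqueeze(0).expand(B, -1, -1)
        return torch.cat([prefix, token_embeds], dim=1)

# Usage: prepend before the first hybrid block; outputs at the meta positions
# are discarded when computing the language-modeling loss.
prefix = MetaTokenPrefix()
x = torch.randn(2, 64, 512)
x_with_meta = prefix(x)   # (2, 192, 512)
```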
December 2, 2024 at 8:50 AM
Long story short: They run Attention and Mamba in parallel at each layer, where Mamba serves as long-term memory, and Attention (mostly Sliding Window except on the First, Middle and Last Layer) as short-term memory with perfect recall.
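A toy sketch of a parallel hybrid block, with a GRU standing in for the Mamba mixer purely to keep it self-contained; the normalize-then-sum fusion of the two branches is an assumption, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

class ParallelHybridBlock(nn.Module):
    def __init__(self, dim: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = nn.GRU(dim, dim, batch_first=True)   # placeholder for a Mamba mixer
        self.attn_norm = nn.LayerNorm(dim)
        self.ssm_norm = nn.LayerNorm(dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        T = h.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=h.device), 1)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)  # short-term, exact recall
        ssm_out, _ = self.ssm(h)                            # long-term, compressed memory
        # Normalize each branch before summing so neither dominates.
        fused = self.attn_norm(attn_out) + self.ssm_norm(ssm_out)
        return x + self.out_proj(fused)

block = ParallelHybridBlock()
y = block(torch.randn(2, 32, 512))   # (2, 32, 512)
```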
December 2, 2024 at 8:50 AM
Ehm Qwen obviously!
November 25, 2024 at 4:15 PM
Starting from pretrained LLMs, there are multiple techniques that let us transform them into word-level or sequence-level embedding models while investing only a couple of hours on a single GPU, and with a bit of pruning we can make them lean and mean!
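A minimal sketch of the cheapest version of this, assuming mean pooling over the last hidden states of a small pretrained decoder (the model name is only an example); approaches like LLM2Vec go further by enabling bidirectional attention and doing a short contrastive fine-tune:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"   # illustrative choice of a small pretrained LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_name)

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = batch_out = model(**batch).last_hidden_state   # (B, T, dim)
    mask = batch["attention_mask"].unsqueeze(-1)             # ignore padding positions
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean over real tokens
    return F.normalize(pooled, dim=-1)                       # unit-length sequence embeddings

embs = embed(["buffer overflow in parser", "heap corruption bug"])
print(embs @ embs.T)   # cosine similarities
```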
November 25, 2024 at 9:39 AM
Why embedding models? They are still the kings when it comes to understanding tasks, and pretraining them is expensive.
November 25, 2024 at 9:39 AM