✔️ Enabling more accurate retrieval for AI-generated responses
Kudos to @nohtow.bsky.social for this new SOTA achievement!
🔗 Read the full blog article: www.lighton.ai/lighton-blog...
Learn more about PyLate here: lightonai.github.io/pylate/
You can reproduce this SOTA training with fewer than 80 lines of code and about 2 hours of training; the script also runs NanoBEIR evaluations during training, reports them to W&B, and creates an informative model card!
Link to the gist: gist.github.com/NohTow/3030f...
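For context, here is a minimal sketch of what such a PyLate knowledge-distillation run can look like, loosely following the pattern from the PyLate documentation. It is not the exact content of the gist: the base checkpoint, dataset configs, output path, and hyperparameters are illustrative assumptions, and the NanoBEIR evaluation, W&B reporting, and model-card generation are omitted.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

from pylate import losses, models, utils

# Base encoder wrapped as a ColBERT-style late-interaction model
model = models.ColBERT(model_name_or_path="Alibaba-NLP/gte-modernbert-base")

# MS MARCO distillation data: queries, documents, and teacher scores
train = load_dataset("lightonai/ms-marco-en-bge", "train", split="train")
queries = load_dataset("lightonai/ms-marco-en-bge", "queries", split="train")
documents = load_dataset("lightonai/ms-marco-en-bge", "documents", split="train")
train.set_transform(
    utils.KDProcessing(queries=queries, documents=documents).transform
)

args = SentenceTransformerTrainingArguments(
    output_dir="output/gte-moderncolbert",  # placeholder path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    bf16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train,
    loss=losses.Distillation(model=model),  # distill the teacher scores
    data_collator=utils.ColBERTCollator(tokenize_fn=model.tokenize),
)
trainer.train()
```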
It is thus well suited to handling your very long documents!
While it is bigger, it is still a very lightweight model and benefits from the efficiency of ModernBERT!
Also, it has only been trained on MS MARCO (for late interaction) and should thus generalize pretty well!
GTE-ModernColBERT is trained on top of the GTE-ModernBERT base model using knowledge distillation on the MS MARCO dataset, and it is the first SOTA model trained using PyLate!
Get started with PyLate using the documentation:
lightonai.github.io/pylate/
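As a quick illustration of how a PyLate-trained model is used at inference time, here is a hedged sketch of indexing and retrieval following the API shown in the PyLate documentation; the checkpoint id and the toy documents are assumptions for the example.

```python
from pylate import indexes, models, retrieve

# Checkpoint id assumed for illustration; adjust it to the released model card
model = models.ColBERT(model_name_or_path="lightonai/GTE-ModernColBERT-v1")

# Build a small Voyager (HNSW) index and add a couple of toy documents
index = indexes.Voyager(index_folder="pylate-index", index_name="demo", override=True)
retriever = retrieve.ColBERT(index=index)

documents_ids = ["doc-1", "doc-2"]
documents = [
    "PyLate is a library for training and serving late-interaction retrievers.",
    "ModernBERT natively handles sequences of up to 8192 tokens.",
]
documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Encode a query and retrieve the top matching documents
queries_embeddings = model.encode(["What is PyLate?"], is_query=True)
results = retriever.retrieve(queries_embeddings=queries_embeddings, k=2)
print(results)
```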
huggingface.co/lightonai/mo...
Congrats to @nohtow.bsky.social for this great work!
Notably, the performance at dimension 256 is only slightly worse than that of the base version at its full dimension of 768.
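If you want to take advantage of this, here is a small sketch using the Sentence Transformers truncate_dim option to keep only the first 256 dimensions at encoding time; the model id and the search_query/search_document prefixes follow the smaller sibling's convention and are assumptions to double-check against the model card.

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 256 (Matryoshka) dimensions of each embedding
model = SentenceTransformer("lightonai/modernbert-embed-large", truncate_dim=256)

embeddings = model.encode(
    [
        "search_query: What is ModernBERT?",
        "search_document: ModernBERT is a modernized encoder-only transformer.",
    ]
)
print(embeddings.shape)  # e.g. (2, 256)
```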
ModernBERT-embed-large is trained using the same two-stage training recipe as its smaller sibling and, as expected, improves performance, gaining +1.22 points on the MTEB average.
www.linkedin.com/posts/fremyc...
In the meantime, you could have a shot with mGTE (using xformers) or recent language-specific iterations of BERT such as CamemBERTv2!
Having also worked a lot on causal models, I never thought of this kind of modelling, because I always contrasted MLM with open-ended generation.
I guess with papers such as this one arxiv.org/pdf/2406.04823, I should think about it more!
Very interesting perspective, thanks!
Or give me pointers?
Is it because having a fixed value biases the learning w.r.t. the way we will sample downstream? (Like not masking 30% of the target?)
To me, the logic would be to ramp it up: provide a kick-off signal, then make the task harder and harder, but the papers seem to say otherwise.
Maybe random is the optimal solution!
We ended up not really digging much into this particular aspect, again because we had so much to explore
Again, the original goal of the project (besides cool models) was to convince some researchers to spend a bit of their GPU hours on encoder pre-training again!
Hopefully we nailed it and will have the answers to a lot of questions in the future!