ColBERT vectors are often 10 bytes each. Ten bytes. That’s like 4 numbers.
It’s not “many vectors work better than one vector”. It’s “set similarity works better than dot product”.
Even with the same storage cost.
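For concreteness, here's a minimal numpy sketch of that difference: one dot product between pooled vectors vs. ColBERT-style MaxSim over token vectors. This is not the ColBERT library API; all shapes, dimensions, and token counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Single-vector retrieval: one embedding per query and per document.
q_vec = rng.standard_normal(768)
d_vec = rng.standard_normal(768)
single_score = float(q_vec @ d_vec)  # one dot product

# Late interaction: one small embedding per *token*.
q_toks = rng.standard_normal((32, 128))   # 32 query tokens (assumed)
d_toks = rng.standard_normal((180, 128))  # 180 document tokens (assumed)

# MaxSim "set similarity": each query token picks its best-matching
# document token, and the per-token maxima are summed into the score.
sim = q_toks @ d_toks.T                   # (32, 180) token-token similarities
maxsim_score = float(sim.max(axis=1).sum())

print(single_score, maxsim_score)
```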
But it's premature abstraction that leads to the bitterness of wasted effort, not "modularity doesn't work for AI". 2/2
Hope this was useful!
But that's not true: if you look at how aggressively ColBERTv2 representations are compressed, it's often ~20 bytes per vector (like 5 floats), which can be smaller than popular uncompressed single vectors!
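As a rough back-of-envelope, reading the comparison at the whole-passage level: the passage length and single-vector dimension below are assumptions for the sketch, not figures from the thread.

```python
# Illustrative storage comparison; all numbers are assumptions.
bytes_per_colbert_vector = 20    # heavily compressed ColBERTv2 token vector (~5 float32s)
tokens_per_passage = 80          # assumed passage length in tokens

colbert_passage_bytes = bytes_per_colbert_vector * tokens_per_passage  # 1,600 bytes

single_vector_dims = 768         # a common single-vector embedding size (assumed)
single_vector_bytes = single_vector_dims * 4                           # float32 -> 3,072 bytes

print(colbert_passage_bytes, single_vector_bytes)  # 1600 vs 3072
```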
For ColBERT, you typically *fix* more than you break because you're moving *tokens* in a much smaller (and far more composable!) space.
A dot product is just very hard to learn. An intuition I learned from Menon et al. (2021) is that: