Agreed that it's important to consider depth in addition to work. But for work to be negligible, the hardware would need to be infinitely scalable with input size because the asymptotic runtime is O(work/workers + depth). I doubt that this will change for future hardware due to physical limitations.
March 22, 2025 at 6:08 PM
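To make the O(work/workers + depth) argument concrete, here is a minimal sketch. The function names and the balanced tree reduction are my own illustrative assumptions, not something from the thread. It only evaluates the bound: with a fixed worker count the work term dominates as the input grows, and it only shrinks toward the depth term if the worker count grows along with the input.

```python
import math


def runtime_bound(work: float, depth: float, workers: int) -> float:
    """Evaluate the work/workers + depth bound from the post (constants ignored)."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return work / workers + depth


def tree_reduction_costs(n: int) -> tuple[int, int]:
    """Work and depth of a balanced tree reduction over n >= 1 elements.

    Hypothetical example workload: n - 1 pairwise combines arranged in
    ceil(log2(n)) levels.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return n - 1, math.ceil(math.log2(n))


if __name__ == "__main__":
    for n in (2**10, 2**20, 2**30):
        work, depth = tree_reduction_costs(n)
        fixed = runtime_bound(work, depth, workers=1024)  # hardware stays the same
        scaled = runtime_bound(work, depth, workers=n)    # hardware grows with the input
        print(f"n={n}: fixed workers -> {fixed:.1f}, workers=n -> {scaled:.1f}")
```

With 1024 workers the bound grows roughly linearly in n, while with workers = n it stays close to the depth, which is the "infinitely scalable hardware" case.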
You could have a look at Litmaps. It lets you explore new papers from a set of starting nodes, and it has nice ways of sorting by time, relevance, clustering, etc.
February 28, 2025 at 11:06 AM
Well, I'm here under @yann-lecun. But there is another account @ylecun registered under the same email address. I can't seem to log into @ylecun. If I try and reset the password, it resets it for @yann-lecun. I'm not sure what to do. I suspect I need to just delete the @yann-lecun account....
I also wonder how one would go about normalizing the sum in a variable-sequence-length mode. How are you handling that at the moment for video? One potential solution could be a geometric decay on top of the mask, but that would lead to something like an SSM formulation? Interesting food for thought!
November 20, 2024 at 2:22 PM
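A rough sketch of what I mean by the geometric decay, under my own assumptions: scalar inputs, a single decay factor `gamma`, and a causal mask. None of this is taken from the original method. The first function builds the decayed mask weights explicitly and normalizes per position; the second computes the same thing with a two-state linear recurrence, which is why it starts to look like an SSM.

```python
def decayed_average_masked(x, gamma):
    """Causal mask with geometric decay weights gamma**(t - s), normalized per position."""
    out = []
    for t in range(len(x)):
        weights = [gamma ** (t - s) for s in range(t + 1)]
        # zip stops at the shorter list, so only x[0..t] contributes (causal mask)
        num = sum(w * xs for w, xs in zip(weights, x))
        out.append(num / sum(weights))
    return out


def decayed_average_recurrent(x, gamma):
    """Same quantity as a linear recurrence over a numerator and a normalizer state."""
    num, den = 0.0, 0.0
    out = []
    for xt in x:
        num = gamma * num + xt   # decayed running sum of the inputs
        den = gamma * den + 1.0  # decayed running count, acts as the normalizer
        out.append(num / den)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(decayed_average_masked(x, 0.5))
    print(decayed_average_recurrent(x, 0.5))
```

The normalizer is tracked by the same recurrence as the sum, so it does not depend on knowing the sequence length up front. With `gamma = 1.0` this reduces to a plain running mean.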
Interesting idea! This reminds me of the Symmetric Power Transformer (manifestai.com/articles/symmetric-power-transformers), but in your case the polynomial kernel is applied to the values instead of the queries/keys, while the selection resembles a Gated Linear Unit, which is used in many backbones :)
November 20, 2024 at 2:20 PM