Our ICML position paper's answer: simply train on a bunch of artificial data (noise) and only do inference on real-world data! 1/n
icml-structured-fm-workshop.github.io
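A minimal sketch of that recipe, assuming a toy random-linear-function prior and a tiny cross-attention model; the names (sample_synthetic_task, PriorFittedModel) are placeholders for illustration, not the paper's actual setup.

```python
import torch
import torch.nn as nn

def sample_synthetic_task(n_points=64, n_features=8):
    """Draw one artificial task: a random linear function plus noise."""
    x = torch.randn(n_points, n_features)
    w = torch.randn(n_features, 1)
    y = x @ w + 0.1 * torch.randn(n_points, 1)
    return x, y

class PriorFittedModel(nn.Module):
    """Tiny stand-in for a transformer that reads (x_ctx, y_ctx, x_query) in context."""
    def __init__(self, n_features=8, d_model=64):
        super().__init__()
        self.enc_xy = nn.Linear(n_features + 1, d_model)
        self.enc_x = nn.Linear(n_features, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x_ctx, y_ctx, x_query):
        ctx = self.enc_xy(torch.cat([x_ctx, y_ctx], dim=-1)).unsqueeze(0)  # (1, n_ctx, d)
        qry = self.enc_x(x_query).unsqueeze(0)                             # (1, n_qry, d)
        out, _ = self.attn(qry, ctx, ctx)                                  # cross-attend to the context
        return self.head(out).squeeze(0)

model = PriorFittedModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training sees only artificial tasks, never real data.
for step in range(1000):
    x, y = sample_synthetic_task()
    x_ctx, y_ctx, x_qry, y_qry = x[:48], y[:48], x[48:], y[48:]
    loss = nn.functional.mse_loss(model(x_ctx, y_ctx, x_qry), y_qry)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference: a real dataset would be consumed purely in context, with no gradient updates.
```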
- 7 of 8 layers are linear attention (sketch after this post)
- implemented a FlashAttention-style variant of linear attention + ring attention
- post-norm is back in large models! (using DeepNorm)
- probably the wrong scaling laws, as the LR schedule is not adapted (see Chinchilla)
Let's see how it fares in the arena!
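For readers unfamiliar with linear attention, here is a minimal non-causal sketch of the basic kernelized form (ELU+1 feature map, as in Katharopoulos et al., 2020). It only illustrates the O(n) idea; it is not the flash/ring implementation mentioned above.

```python
import torch

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized (linear) attention, O(n) in sequence length instead of O(n^2).
    q, k, v: (batch, seq, heads, dim); feature map phi(x) = elu(x) + 1."""
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    kv = torch.einsum("bshd,bshe->bhde", k, v)                        # sum_s phi(k_s) v_s^T
    z = 1 / (torch.einsum("bshd,bhd->bsh", q, k.sum(dim=1)) + eps)    # per-query normalizer
    return torch.einsum("bshd,bhde,bsh->bshe", q, kv, z)
```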
Instead, a pre-trained neural network is: the new TabPFN, which we just published in Nature 🎉
This is excellent news for (small) tabular ML! Check out our Nature article (nature.com/articles/s41...) and code (github.com/PriorLabs/Ta...)
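A minimal usage sketch, assuming the scikit-learn-style TabPFNClassifier interface from the linked repo; exact constructor arguments may differ across package versions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()       # pre-trained model; no per-dataset training loop
clf.fit(X_train, y_train)      # "fit" stores the context for in-context inference
print(clf.predict_proba(X_test)[:5])
```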