Artem Moskalev
@artemmoskalev.bsky.social
Re-imagining drug discovery with AI 🧬. Deep Learning ⚭ Geometry. Previously PhD at the University of Amsterdam. https://amoskalev.github.io/
Joint work with the brilliant Mangal Prakash, Junjie Xu, Tianyu Cui, Rui Liao, and Tommaso Mansi.

8/8
June 4, 2025 at 8:03 AM
Notably, where the equivariant transformer runs out of memory on sequences over 37k tokens, our model can handle up to 2.7 million tokens on a single A10G GPU, providing up to 72× longer context within the same computational budget.

7/8
June 4, 2025 at 8:03 AM
We test the proposed geometric long convolution on multiple large-molecule property and dynamics prediction tasks for RNA and protein biomolecules. Geometric Hyena is on par with or better than equivariant self-attention at a fraction of its computational cost.

6/8
June 4, 2025 at 8:03 AM
To evaluate the long geometric context capabilities of our models, we introduce a geometric extension of the mechanistic interpretability suite. Specifically, we evaluate equivariant models on equivariant associative recall tasks of increasing complexity.
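To make the task concrete, here is a toy sketch of what a geometric associative recall instance could look like, assuming the setup mirrors standard associative recall but with 3D vector tokens; the generator below is hypothetical, not the paper's benchmark code:

```python
import torch

def make_geometric_recall(n_pairs):
    """Toy geometric associative recall: the model sees (key, value)
    vector pairs plus a query key, and must return the paired value."""
    keys = torch.randn(n_pairs, 3)
    values = torch.randn(n_pairs, 3)
    q = torch.randint(n_pairs, (1,)).item()
    seq = torch.stack([keys, values], dim=1).reshape(-1, 3)  # k1 v1 k2 v2 ...
    return torch.cat([seq, keys[q:q + 1]]), values[q]

seq, target = make_geometric_recall(8)
# An equivariant model f should satisfy f(seq @ R.T) == target @ R.T
# for any rotation matrix R.
print(seq.shape, target.shape)  # torch.Size([17, 3]) torch.Size([3])
```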

5/8
June 4, 2025 at 8:03 AM
Inspired by the recent success of long-context models, long convolutions, and the Hyena hierarchy, we propose their geometric counterpart. We rely on the FFT to bring the computational complexity down to O(N log N), adapting it to vector features. The implementation is simple – just 50 lines of code!
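For intuition, here is a minimal sketch of an FFT-based long convolution over 3D vector features; shapes and names are illustrative, not the paper's actual implementation. A scalar filter shared across the x/y/z coordinates keeps the operation rotation equivariant:

```python
import torch

def fft_long_conv(x, k):
    """Long convolution via FFT in O(N log N) instead of O(N^2).

    x: (batch, length, 3)  per-token 3D vector features
    k: (length,)           scalar filter shared across coordinates
                           (scalar coefficients commute with rotations)
    """
    n = x.shape[1]
    # Zero-pad to 2n so the circular FFT convolution acts as a linear one.
    x_f = torch.fft.rfft(x, n=2 * n, dim=1)
    k_f = torch.fft.rfft(k, n=2 * n).reshape(1, -1, 1)
    y = torch.fft.irfft(x_f * k_f, n=2 * n, dim=1)  # pointwise product in frequency domain
    return y[:, :n]

x = torch.randn(2, 1024, 3)
k = torch.randn(1024)
print(fft_long_conv(x, k).shape)  # torch.Size([2, 1024, 3])
```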

4/8
June 4, 2025 at 8:03 AM
In many biological and physical systems, we need equivariance + global context. This leads to quadratic complexity in system size, multiplied by the cost of equivariance. Existing equivariant models are not equipped to work at that scale 🫠.

3/8
June 4, 2025 at 8:03 AM
Our solution is data-controlled geometric long convolution. It provides global (all-to-all) context akin to self-attention but comes at O(N log N) cost! No low-rank approximations or coarsening.
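As a rough sketch of what "data-controlled" can mean here (a hypothetical block, not the paper's code): gates derived from invariant scalar features modulate an FFT long convolution over vector features, so the global mixing stays O(N log N) and scalar gating preserves equivariance:

```python
import torch
import torch.nn as nn

class GeometricLongConvBlock(nn.Module):
    def __init__(self, n_tokens, d_scalar):
        super().__init__()
        # Global filter: one scalar coefficient per relative position.
        self.filt = nn.Parameter(torch.randn(n_tokens) / n_tokens)
        # Data-controlled gate computed from invariant (scalar) features.
        self.gate = nn.Linear(d_scalar, 1)

    def forward(self, vec, scal):
        # vec: (B, N, 3) vector features; scal: (B, N, d_scalar) invariants
        n = vec.shape[1]
        v_f = torch.fft.rfft(vec, n=2 * n, dim=1)
        k_f = torch.fft.rfft(self.filt, n=2 * n).reshape(1, -1, 1)
        mixed = torch.fft.irfft(v_f * k_f, n=2 * n, dim=1)[:, :n]  # O(N log N) global mixing
        g = torch.sigmoid(self.gate(scal))  # (B, N, 1) invariant gate
        return g * mixed                    # scalar gating keeps equivariance

block = GeometricLongConvBlock(n_tokens=512, d_scalar=16)
out = block(torch.randn(2, 512, 3), torch.randn(2, 512, 16))
print(out.shape)  # torch.Size([2, 512, 3])
```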

Paper: arxiv.org/abs/2505.22560
Code: coming soon, stay tuned!

2/8
Geometric Hyena Networks for Large-scale Equivariant Learning
arxiv.org
June 4, 2025 at 8:03 AM
- HARMONY: A Multi-Representation Framework for RNA Property Prediction. ORAL at AI4NA workshop. Monday. openreview.net/forum?id=TvBuXU1J2K
April 21, 2025 at 5:38 AM
- InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference. ORAL at AI4NA workshop. Monday. openreview.net/forum?id=nzUsRhtnBa
April 21, 2025 at 5:38 AM
- Beyond Sequence: Impact of Geometric Context for RNA Property Prediction. Saturday 10-12:30. Hall 3 + Hall 2B #5. openreview.net/forum?id=9htTvHkUhh
April 21, 2025 at 5:38 AM
1. Hyperbolic NNs for sequence modeling: jobs.jnj.com/en/jobs/2506234550w

2. Causal Inference and Bayesian Optimization: jobs.jnj.com/en/jobs/2506234553w

3. Multi-modal Sequence, Structure & Interaction modeling: jobs.jnj.com/en/jobs/2506234539w

Apply and reach out to me if interested! 😁
February 7, 2025 at 9:51 AM
If you need to pick a neural network and train it on RNA data, our work provides guidelines on which method works best under which conditions.
February 3, 2025 at 8:57 AM
What did we learn? In the presence of severe noise, a simple sequence transformer without any geometry works best, but it requires much more data to converge! At the same time, 3D geometric GNNs are the most vulnerable to geometric noise.
February 3, 2025 at 8:57 AM
We study different types of neural networks on various RNA representations: 1D vs. 2D vs. 3D. We evaluate property prediction performance, noise robustness, data efficiency, and OOD noise generalization.
February 3, 2025 at 8:56 AM