Max Zhdanov
@maxxxzdn.bsky.social
PhD candidate at AMLab with Max Welling and Jan-Willem van de Meent.

Research in physics-inspired and geometric deep learning.
that's a wrap!

you can find the code here: github.com/maxxxzdn/erwin

and here is a cute visualisation of Ball Tree building:
June 21, 2025 at 4:23 PM
To scale Erwin further to gigantic industrial datasets, we combine Transolver and Erwin by applying ball tree attention over latent tokens.

The key insight is that by using Erwin, we can afford larger bottleneck sizes while maintaining efficiency.

Stay tuned for updates!
June 21, 2025 at 4:23 PM
To improve the processing of long-range information while keeping the cost sub-quadratic, we combine the ball tree with Native Sparse Attention (NSA) - the two align naturally with each other.

Accepted to the Long-Context Foundation Models workshop at ICML 2025!

paper: arxiv.org/abs/2506.12541
June 21, 2025 at 4:23 PM
There are many possible directions for building on this work. I will highlight two that I have been working on together with my students.
June 21, 2025 at 4:23 PM
On top of the initial experiments in the first version of the paper, we include two more PDE-related benchmarks on which Erwin shows strong performance, achieving state-of-the-art results on multiple tasks.
June 21, 2025 at 4:23 PM
Original experiments: cosmology, molecular dynamics and turbulent fluid dynamics (EAGLE).
June 21, 2025 at 4:23 PM
The ball tree representation comes with a huge advantage: a contiguous memory layout, which makes all operations extremely simple and efficient.
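A tiny illustration of what that layout buys (a toy PyTorch snippet, not the repo's code; sizes assumed to be powers of two):

```python
import torch

x = torch.randn(16, 8)  # 16 tree-ordered points, 8 features each

# At level k of the tree, ball j covers indices [j * 2**k, (j + 1) * 2**k),
# so selecting a ball is a contiguous slice and pooling a level is a reshape.
level, j = 2, 3
ball = x[j * 2**level : (j + 1) * 2**level]          # one ball: a (4, 8) slice
level_pooled = x.view(-1, 2**level, 8).mean(dim=1)   # all level-2 balls at once
```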
June 21, 2025 at 4:23 PM
Implemented naively, however, the model cannot exchange information between partitions - and is thus unable to capture global interactions.

To overcome this, we adapt the idea from the Swin Transformer, but instead of shifting windows, we rotate trees.
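Schematically, it works like this (a hedged sketch, not the actual repo API - `erwin_stack`, `perm_main`, `perm_rot` and `bta_layers` are illustrative names): build one tree on the original coordinates and one on rotated coordinates, then alternate which ordering is used to group points into balls.

```python
import torch

def erwin_stack(x, perm_main, perm_rot, bta_layers):
    """Alternate Ball Tree Attention between two tree layouts.

    x:          (N, C) point features in the original point order
    perm_main:  ordering of points given by the tree on original coordinates
    perm_rot:   ordering given by a tree built on rotated coordinates
    bta_layers: modules computing attention within fixed-size balls
    """
    for i, layer in enumerate(bta_layers):
        perm = perm_main if i % 2 == 0 else perm_rot   # "rotate the tree"
        inv = torch.argsort(perm)
        x = layer(x[perm])[inv]   # attend in that tree's layout, then restore order
    return x
```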
June 21, 2025 at 4:23 PM
For that reason, we impose a regular structure onto irregular data via ball trees.

That allows us to restrict attention computation to local partitions, reducing the cost to linear.
June 21, 2025 at 4:23 PM
Erwin follows the second approach, as we believe the model should work at full resolution and in the original representation for as long as possible - let the data guide the model to decide which information to use and which to discard.
June 21, 2025 at 4:23 PM
In the context of modeling large physical systems, this is critical, as the largest applications can include millions of points, which makes full attention unrealistic.

There are two ways around this: either change the data representation or change the data structure.
June 21, 2025 at 4:23 PM
When it comes to irregular data, however, there is no natural ordering of points, so standard sparse attention mechanisms break down: there is no guarantee that the locality they rely on still exists.
June 21, 2025 at 4:23 PM
We start with sparse (sub-quadratic) attention, which comes in different flavors but always follows the same idea - exploiting the regular structure of the data to model interactions between tokens.
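For a regular 1D sequence, for example, the simplest flavor is block-local attention: neighboring tokens are already adjacent in memory, so restricting attention to blocks is just a reshape (a generic illustration, not a specific method from the paper):

```python
import torch
import torch.nn.functional as F

def block_local_attention(x, block_size):
    """Each token attends only to the tokens in its own block."""
    L, C = x.shape
    blocks = x.view(L // block_size, block_size, C)    # locality = adjacency
    out = F.scaled_dot_product_attention(blocks, blocks, blocks)
    return out.reshape(L, C)

x = torch.randn(512, 64)
y = block_local_attention(x, block_size=32)   # O(L * block_size) instead of O(L^2)
```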
June 21, 2025 at 4:23 PM
And that is a wrap! 🫔

We believe models like Erwin will enable the application of deep learning to physical tasks that involve large particle systems, where runtime was previously a bottleneck.

for details, see the preprint: arxiv.org/abs/2502.17019
March 5, 2025 at 6:04 PM
Bonus: to minimize the computational overhead of ball tree construction, we develop a fast, parallelized implementation in C++.
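For reference, the construction itself boils down to recursive median splits; here is a rough Python sketch of the idea (the repo's version is C++ and parallelized, this is only illustrative):

```python
import numpy as np

def ball_tree_order(pos):
    """Permutation of point indices such that, at every tree level,
    points belonging to the same ball end up next to each other.
    pos: (N, D) positions, N assumed to be a power of two."""
    def split(idx):
        if len(idx) <= 1:
            return idx
        pts = pos[idx]
        axis = np.argmax(pts.max(axis=0) - pts.min(axis=0))  # widest axis
        order = idx[np.argsort(pts[:, axis])]                # sort along it
        mid = len(order) // 2
        return np.concatenate([split(order[:mid]), split(order[mid:])])
    return split(np.arange(len(pos)))

pos = np.random.rand(1024, 3)
perm = ball_tree_order(pos)   # reorder points/features once with this permutation
```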

15/N
March 5, 2025 at 6:04 PM
On the large-scale EAGLE benchmark (1M meshes, each with ~3500 nodes), we achieve SOTA performance in simulating unsteady fluid dynamics.

14/N
March 5, 2025 at 6:04 PM
On the molecular dynamics task, Erwin pushes the Pareto frontier in performance vs runtime, proving to be a viable alternative to message-passing-based architectures.

13/N
March 5, 2025 at 6:04 PM
The cosmological data exhibits long-range dependencies. As each point cloud is relatively large (5,000 points), this poses a challenge for message-passing models. Erwin, on the other hand, is able to capture those effects.

12/N
March 5, 2025 at 6:04 PM
We validated Erwin's performance on a variety of large-scale tasks, including:
- cosmology (5k nodes per data point)
- molecular dynamics (~1k nodes per data point)
- turbulent fluid dynamics (~3.5k nodes per data point)

11/N
March 5, 2025 at 6:04 PM
Due to the simplicity of implementation, Erwin is blazing fast. For a batch of 16 point clouds, 4096 points each, it only takes ~30 ms to compute the forward pass!

10/N
March 5, 2025 at 6:04 PM
The ball tree is stored in memory contiguously - at each level of the tree, points in the same ball are stored next to each other.
This property is critical and allows us to implement the key operations described above simply via .view() or .mean()!
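A couple of lines to make this concrete (toy shapes, assuming the tree-ordered layout):

```python
import torch

N, C, ball_size = 1024, 32, 16
x = torch.randn(N, C)   # point features in ball-tree order

# group points into balls for attention: just a view
balls = x.view(N // ball_size, ball_size, C)

# coarsen one tree level: sibling leaves are adjacent, so pooling is view + mean
parents = x.view(N // 2, 2, C).mean(dim=1)   # (N/2, C)
```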

9/N
March 5, 2025 at 6:04 PM
As Ball Tree Attention is local, we progressively coarsen and then refine the ball tree while keeping the ball size for attention fixed, following a U-Net-like architecture.
This allows us to learn multi-scale features and effectively increase the model's receptive field.
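Roughly, one such stage looks like this (a simplified sketch with illustrative names; the actual model also uses learned pooling and positional information):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarsenAttendRefine(nn.Module):
    """One U-Net-style stage: pool to parent balls, attend there, unpool back."""

    def __init__(self, dim, ball_size, stride=2):
        super().__init__()
        self.ball_size, self.stride = ball_size, stride
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                    # x: (N, C), in ball-tree order
        N, C = x.shape
        # coarsen: mean-pool `stride` sibling points into their parent ball
        coarse = x.view(N // self.stride, self.stride, C).mean(dim=1)
        M = coarse.shape[0]                  # assumed divisible by ball_size

        # Ball Tree Attention at the coarse level, ball size kept fixed
        q, k, v = (self.qkv(coarse)
                   .view(M // self.ball_size, self.ball_size, 3 * C)
                   .chunk(3, dim=-1))
        attended = F.scaled_dot_product_attention(q, k, v).reshape(M, C)

        # refine: broadcast coarse features back to their children + residual
        return self.proj(attended.repeat_interleave(self.stride, dim=0)) + x
```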

8/N
March 5, 2025 at 6:04 PM
The main idea of the paper is to compute attention within the ball tree partitions.
Once the tree is built, one can choose the level of the tree and compute attention (Ball Tree Attention, BTA) within the balls in parallel.
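In (simplified) PyTorch terms - single head, no positional encodings, and names that are illustrative rather than the repo's API:

```python
import torch
import torch.nn.functional as F

def ball_tree_attention(x, qkv_proj, out_proj, ball_size):
    """Attention restricted to fixed-size balls.

    x: (N, C) point features ordered by the ball tree, so every consecutive
       chunk of `ball_size` points forms one ball; N padded to a multiple of it."""
    N, C = x.shape
    q, k, v = qkv_proj(x).view(N // ball_size, ball_size, 3 * C).chunk(3, dim=-1)
    out = F.scaled_dot_product_attention(q, k, v)   # parallel over balls
    return out_proj(out.reshape(N, C))

C, ball_size = 64, 16
qkv_proj, out_proj = torch.nn.Linear(C, 3 * C), torch.nn.Linear(C, C)
x = torch.randn(1024, C)                            # 1024 tree-ordered points
y = ball_tree_attention(x, qkv_proj, out_proj, ball_size)
```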

7/N
March 5, 2025 at 6:04 PM