Sebastian Loeschcke
@sloeschcke.bsky.social
Working on Efficient Training, Low-Rank Methods, and Quantization.
PhD at the University of Copenhagen 🇩🇰
Member of @belongielab.org, Danish Data Science Academy, and Pioneer Centre for AI 🤖
🔗 sebulo.github.io/
Thanks to my co-authors David Pitt, Robert Joseph George, Jiawei Zhao, Cheng Luo, Yuandong Tian, Jean Kossaifi, and @anima-anandkumar.bsky.social, and to @caltech.edu for hosting me this spring!
Paper: arxiv.org/abs/2501.02379
Code: github.com/neuraloperat...
TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training
Scientific problems require resolving multi-scale phenomena across different resolutions and learning solution operators in infinite-dimensional function spaces. Neural operators provide a powerful fr...
arxiv.org
June 3, 2025 at 3:17 AM
We also show strong results on other PDE benchmarks, including 𝐃𝐚𝐫𝐜𝐲 𝐟𝐥𝐨𝐰 and the 𝐁𝐮𝐫𝐠𝐞𝐫𝐬 equation, demonstrating TensorGRaD’s broad applicability across scientific domains.
June 3, 2025 at 3:17 AM
We test TensorGRaD on large-scale Navier–Stokes at 1024×1024 resolution with Reynolds number 10⁵, a highly turbulent setting. With mixed precision and a 75% optimizer-state reduction, it 𝐦𝐚𝐭𝐜𝐡𝐞𝐬 𝐟𝐮𝐥𝐥-𝐩𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐀𝐝𝐚𝐦 while cutting overall memory by up to 50%.
June 3, 2025 at 3:17 AM
We also propose a 𝐦𝐢𝐱𝐞𝐝-𝐩𝐫𝐞𝐜𝐢𝐬𝐢𝐨𝐧 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 strategy with weights, activations, and gradients in half precision and optimizer states in full precision, and empirically show that storing optimizer states in half precision hurts performance.
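A minimal sketch of the idea, not the paper's training code: a bf16 model (so weights, activations, and gradients are half precision) with a hand-rolled Adam whose moment buffers stay in fp32. The toy model and hyperparameters are placeholders.

    import torch

    # Stand-in for a neural operator: weights, activations, gradients in bf16.
    model = torch.nn.Sequential(
        torch.nn.Linear(64, 64), torch.nn.GELU(), torch.nn.Linear(64, 64)
    ).to(torch.bfloat16)

    # Adam moment buffers kept in full precision, independent of parameter dtype.
    moments = {p: (torch.zeros_like(p, dtype=torch.float32),
                   torch.zeros_like(p, dtype=torch.float32))
               for p in model.parameters()}
    lr, b1, b2, eps, step = 1e-3, 0.9, 0.999, 1e-8, 0

    def adam_step():
        global step
        step += 1
        for p in model.parameters():
            if p.grad is None:
                continue
            g = p.grad.float()                        # promote half-precision gradient
            m, v = moments[p]
            m.mul_(b1).add_(g, alpha=1 - b1)          # fp32 first moment
            v.mul_(b2).addcmul_(g, g, value=1 - b2)   # fp32 second moment
            m_hat = m / (1 - b1 ** step)
            v_hat = v / (1 - b2 ** step)
            p.data.add_((-lr * m_hat / (v_hat.sqrt() + eps)).to(p.dtype))

    x = torch.randn(8, 64, dtype=torch.bfloat16)
    model(x).pow(2).mean().backward()
    adam_step()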
June 3, 2025 at 3:17 AM
We extend low-rank and sparse methods to tensors via a 𝐫𝐨𝐛𝐮𝐬𝐭 𝐭𝐞𝐧𝐬𝐨𝐫 𝐝𝐞𝐜𝐨𝐦𝐩𝐨𝐬𝐢𝐭𝐢𝐨𝐧 that splits gradients into a low-rank Tucker part and an unstructured sparse tensor. Unlike matricized approaches, we prove our tensor-based method converges.
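A minimal sketch of such a split using tensorly's Tucker decomposition. The top-k rule, ranks, and helper name are illustrative placeholders, not the paper's exact algorithm.

    import torch
    import tensorly as tl
    from tensorly.decomposition import tucker

    tl.set_backend('pytorch')

    def robust_split(grad, rank, sparse_frac=0.01):
        """Hypothetical helper: keep the largest entries as a sparse tensor,
        then Tucker-compress the remainder."""
        k = max(1, int(sparse_frac * grad.numel()))
        thresh = torch.topk(grad.abs().flatten(), k).values.min()
        mask = grad.abs() >= thresh
        sparse_part = grad * mask                    # unstructured sparse component
        residual = grad - sparse_part
        core, factors = tucker(residual, rank=rank)  # low-rank Tucker component
        return sparse_part, (core, factors)

    # Example on a 4D gradient, e.g. a tensorized operator weight.
    g = torch.randn(32, 32, 16, 16)
    sparse_part, tucker_part = robust_split(g, rank=[8, 8, 4, 4])
    approx = sparse_part + tl.tucker_to_tensor(tucker_part)
    print(torch.norm(g - approx) / torch.norm(g))    # relative reconstruction error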
June 3, 2025 at 3:17 AM
Recent methods reduce optimizer memory for matrix weights, including low-rank and sparse methods from LLM training. But to use them for neural operators, we’d need to flatten tensor weights into matrices, which destroys their spatial/temporal structure and hurts performance (see the short illustration below).
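A small illustration of that flattening step, with made-up shapes and axis names, using tensorly's unfold:

    import torch
    import tensorly as tl

    tl.set_backend('pytorch')

    # Hypothetical 4D gradient: (in_channels, out_channels, x-modes, y-modes).
    g = torch.randn(32, 32, 16, 16)

    # Mode-0 matricization: one axis survives, the other three are mixed into
    # the columns, so the spatial layout the operator relies on is lost.
    g_mat = tl.unfold(g, mode=0)
    print(g_mat.shape)   # torch.Size([32, 8192])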
June 3, 2025 at 3:17 AM
These neural operators use tensor weights. However, optimizers like Adam store two full moment tensors per weight, making memory the bottleneck at scale.
TensorGRaD reduces this overhead by up to 75% (𝑑𝑎𝑟𝑘 𝑔𝑟𝑒𝑒𝑛 𝑏𝑎𝑟𝑠), without hurting accuracy.
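A quick back-of-the-envelope on the bookkeeping, with a made-up weight shape:

    import math

    shape = (64, 64, 128, 128)                   # hypothetical tensor weight
    n = math.prod(shape)
    weight_mib = n * 4 / 2**20                   # fp32 weight
    adam_states_mib = 2 * weight_mib             # Adam: two full moment tensors
    reduced_states_mib = 0.25 * adam_states_mib  # 75% optimizer-state reduction
    print(f"weight {weight_mib:.0f} MiB | Adam states {adam_states_mib:.0f} MiB "
          f"| reduced states {reduced_states_mib:.0f} MiB")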
June 3, 2025 at 3:17 AM
Scientific computing operates on multiscale, multidimensional (𝐭𝐞𝐧𝐬𝐨𝐫) 𝐝𝐚𝐭𝐚. In weather forecasting, for example, inputs span space, time, and variables. Neural operators can capture these multiscale phenomena by learning an operator that maps between function spaces.
June 3, 2025 at 3:17 AM
While Pasadena will be my home, I’ll also be making trips to Austin, the Bay Area, and San Diego. If you’re nearby and up for a chat, reach out—let’s meet up!
January 28, 2025 at 3:57 PM