You can also find me on Threads: @sung.kim.mw
FlashMoBA is a memory-efficient sparse attention mechanism designed to accelerate training and inference for long-sequence models, achieving up to a 14.7x speedup over FlashAttention-2 for small block sizes.
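For intuition, here is a minimal sketch of the MoBA-style block selection that FlashMoBA is built to accelerate (not the FlashMoBA kernel itself): each query scores key blocks by their mean-pooled keys and attends only within its top-k blocks. The function and parameter names are mine, and batching, causal masking, and the trailing partial block are omitted.

```python
import torch
import torch.nn.functional as F

def moba_like_attention(q, k, v, block_size=64, top_k=4):
    # q, k, v: (seq_len, dim); single head, no batching, for clarity.
    seq_len, dim = k.shape
    n_blocks = seq_len // block_size
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, dim)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, dim)

    # Score each key block by the query's dot product with the block's mean key,
    # then keep only the top-k blocks per query.
    block_keys = k_blocks.mean(dim=1)                                  # (n_blocks, dim)
    top_blocks = (q @ block_keys.T).topk(min(top_k, n_blocks), -1).indices

    out = torch.empty_like(q)
    for i in range(seq_len):                   # naive loop; the real kernel fuses this
        sel_k = k_blocks[top_blocks[i]].reshape(-1, dim)
        sel_v = v_blocks[top_blocks[i]].reshape(-1, dim)
        attn = F.softmax((q[i] @ sel_k.T) / dim ** 0.5, dim=-1)
        out[i] = attn @ sel_v
    return out
```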
(Comment — based on the number of authors on this paper, it must be a big deal.)
All the benefits of wider representations without incurring the quadratic cost of increasing the hidden size.
"Normally, you can only work on one git branch at a time in a folder. Want to fix a bug while working on a feature? You have to stash changes, switch branches, then switch back. Git worktrees let you have multiple branches checked out at once in different folders
"Normally, you can only work on one git branch at a time in a folder. Want to fix a bug while working on a feature? You have to stash changes, switch branches, then switch back. Git worktrees let you have multiple branches checked out at once in different folders
Model: huggingface.co/stable-ai/Li...
Their attempt to develop a model to outperform gradient-boosting trees on tabular data.
Swap arxiv → quickarxiv on any paper URL to get an instant blog with figures, insights, and explanations.
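In other words, the transformation is just a hostname swap; a trivial sketch (the helper name is mine):

```python
def to_quickarxiv(url: str) -> str:
    # e.g. "https://arxiv.org/abs/2511.10647" -> "https://quickarxiv.org/abs/2511.10647"
    return url.replace("arxiv", "quickarxiv", 1)
```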
Model Cookbook: github.com/hcompai/hai-...
Model: huggingface.co/collections/...
Built on Qwen3-VL, it provides SOTA performance: 66.1% (+3%) on ScreenSpot-Pro and 76.1% (+5%) on OSWorld-G.
A platform that maintains a continuously updated, structured wiki for code repositories. I'm not sure how this differs from Cognition AI's DeepWiki (deepwiki.org), but we'll find out soon enough.
developers.googleblog.com/en/introduci...
Project: depth-anything-3.github.io
Paper: arxiv.org/abs/2511.10647
Code: github.com/ByteDance-Se...
Hugging Face demo: huggingface.co/spaces/depth...
DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. They find that:
- A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture.
Is it just me, or does it basically look like CSV?
A full Terminal UI (TUI) for live, interactive W&B monitoring right in your terminal.
wandb.ai/wandb_fc/pro...
developers.googleblog.com/en/google-co...
🚀 Performance: Highly competitive on AIME24/25 & HMMT25 — surpasses DeepSeek R1-0120 on math, and outperforms same-size models in competitive coding.
- what distribution to impose on your embeddings
- how to do distribution matching in high-dim (a sketch follows the links below)
Paper: arxiv.org/abs/2511.08544
Code: github.com/rbalestr-lab...
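On the second question, a common trick for making high-dimensional distribution matching tractable is to compare random 1-D projections of the embeddings against the target's 1-D marginals. A minimal sketch, assuming an isotropic Gaussian target and simple moment matching; the names and the specific statistic are my choices, not necessarily the paper's:

```python
import torch

def sliced_gaussian_penalty(z: torch.Tensor, n_dirs: int = 128) -> torch.Tensor:
    """Penalize deviation of embeddings z (batch, dim) from N(0, I) by matching
    the first two moments of random 1-D projections. Illustrative only."""
    dirs = torch.randn(z.shape[1], n_dirs, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)      # unit-norm projection directions
    proj = z @ dirs                                   # (batch, n_dirs) 1-D marginals
    mean_err = proj.mean(dim=0).pow(2).mean()         # N(0, I) marginals have mean 0
    var_err = (proj.var(dim=0) - 1.0).pow(2).mean()   # ... and unit variance
    return mean_err + var_err

# usage: loss = task_loss + lam * sliced_gaussian_penalty(embeddings)
```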
- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%
ernie.baidu.com
Currently, we have an unhealthy ‘upright pyramid’ AI industry structure
- Application Layer
- Model Layer
- Chip Layer
They are shifting to a healthy AI industry structure, which is an ‘inverted pyramid’
- Application Layer
- Model Layer
- Chip Layer
The paper formalizes a Bayesian framework for model control: altering a model's "beliefs" over which persona or data source it is emulating. Context (prompting) and internal representations (steering) are treated as two routes for updating those beliefs.
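A toy numerical version of that framing (entirely made up; the personas, token, and probabilities are illustrative, not from the paper): conditioning on context reweights a prior over personas by Bayes' rule, and steering aims at the same belief shift from inside the model.

```python
import numpy as np

# Prior belief over which persona / data source the model is emulating.
personas = ["helpful_assistant", "pirate", "academic"]
prior = np.array([0.6, 0.1, 0.3])

# Made-up likelihood of seeing the context token "arrr" under each persona.
likelihood = np.array([0.001, 0.5, 0.001])

# Bayes' rule: the context sharply shifts belief toward "pirate".
posterior = prior * likelihood
posterior /= posterior.sum()
print(dict(zip(personas, posterior.round(3))))
```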
Here are a few of the optimizations they made:
- MuP-like scaling
- MQA + SWA (multi-query attention + sliding-window attention; see the sketch after this list)
- Clamping everywhere to control activations
- KV cache sharing
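For the MQA item, here is a minimal sketch of multi-query attention (all query heads share a single key/value head), which is the property that shrinks the KV cache. Shapes and names are illustrative, and the causal/sliding-window mask is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Minimal MQA: many query heads, a single shared key/value head."""
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.q_proj = nn.Linear(dim, dim)
        self.kv_proj = nn.Linear(dim, 2 * self.head_dim)    # one K and one V head total
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                   # x: (batch, seq, dim)
        b, s, _ = x.shape
        q = self.q_proj(x).view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.head_dim, dim=-1) # each (b, s, head_dim)
        k = k.unsqueeze(1)                                  # broadcast over query heads
        v = v.unsqueeze(1)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out_proj(y)

# The KV cache stores k, v of shape (batch, 1, seq, head_dim) instead of
# (batch, n_heads, seq, head_dim), i.e. roughly an n_heads-fold reduction.
```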