Sung Kim
@sungkim.bsky.social
A business analyst at heart who enjoys delving into AI, ML, data engineering, data science, data analytics, and modeling. My views are my own.

You can also find me on Threads: @sung.kim.mw
FlashMoBA

FlashMoBA is a memory-efficient sparse attention mechanism designed to accelerate training and inference for long-sequence models, achieving up to a 14.7x speedup over FlashAttention-2 for small block sizes.
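A rough sketch of the MoBA-style block-sparse pattern being accelerated here (my simplification: single head, no causal mask, not the actual kernel; all names are mine): score key blocks by their mean key, and let each query attend only within its top-k blocks.

```python
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=64, top_k=4):
    # q, k, v: (seq_len, dim); one head, no causal mask, for brevity
    n, d = k.shape
    n_blocks = n // block_size
    k_blocks = k[: n_blocks * block_size].view(n_blocks, block_size, d)
    v_blocks = v[: n_blocks * block_size].view(n_blocks, block_size, d)
    centroids = k_blocks.mean(dim=1)              # (n_blocks, d) mean key per block
    block_scores = q @ centroids.T                # (seq_len, n_blocks)
    top = block_scores.topk(top_k, dim=-1).indices
    out = torch.zeros_like(q)
    for i in range(q.shape[0]):                   # slow per-query gather, illustrative only
        ks = k_blocks[top[i]].reshape(-1, d)      # keys from the selected blocks
        vs = v_blocks[top[i]].reshape(-1, d)
        attn = F.softmax(q[i] @ ks.T / d ** 0.5, dim=-1)
        out[i] = attn @ vs
    return out

q = k = v = torch.randn(256, 32)
print(moba_attention(q, k, v).shape)              # torch.Size([256, 32])
```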
November 17, 2025 at 6:55 AM
ByteDance Seed's Virtual Width Networks (VWN)

(Comment — based on the number of authors on this paper, it must be a big deal.)

All the benefits of wider representations without incurring the quadratic cost of increasing the hidden size.
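One plausible reading of that claim (my sketch under stated assumptions, not the paper's actual architecture): keep the embedding/residual stream at a wide "virtual" width while each block still computes at the original hidden size, so the wider representation costs only two thin projections per block instead of quadratically more attention/FFN compute.

```python
import torch
import torch.nn as nn

class VirtualWidthBlock(nn.Module):
    def __init__(self, virtual_dim=4096, hidden_dim=1024, n_heads=8):
        super().__init__()
        self.down = nn.Linear(virtual_dim, hidden_dim)   # wide stream -> backbone width
        self.core = nn.TransformerEncoderLayer(hidden_dim, n_heads, batch_first=True)
        self.up = nn.Linear(hidden_dim, virtual_dim)     # backbone -> wide stream

    def forward(self, x):                                # x: (batch, seq, virtual_dim)
        return x + self.up(self.core(self.down(x)))      # residual stays at virtual width

x = torch.randn(2, 16, 4096)
print(VirtualWidthBlock()(x).shape)                      # torch.Size([2, 16, 4096])
```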
November 17, 2025 at 6:49 AM
gtr - Git Worktree Runner

"Normally, you can only work on one git branch at a time in a folder. Want to fix a bug while working on a feature? You have to stash changes, switch branches, then switch back. Git worktrees let you have multiple branches checked out at once in different folders
November 16, 2025 at 11:24 PM
Hooray for an introvert!
November 16, 2025 at 10:42 PM
Paper: LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence (arxiv.org/abs/2509.03505)
Model: huggingface.co/stable-ai/Li...
November 15, 2025 at 12:02 AM
Stable AI released LimiX

Their attempt at a model that outperforms gradient-boosted trees on tabular data.
November 15, 2025 at 12:02 AM
@alphaxiv.org released quickarXiv

Swap arxiv → quickarxiv on any paper URL to get an instant blog with figures, insights, and explanations.
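For example (domain spelling assumed from the post, unverified):

```python
# Turn an arXiv abstract URL into its quickarXiv counterpart
url = "https://arxiv.org/abs/2511.10647"
quick = url.replace("arxiv", "quickarxiv", 1)
print(quick)  # https://quickarxiv.org/abs/2511.10647
```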
November 14, 2025 at 11:52 PM
hcompany released Holo2 (open-weight): their next-generation multimodal model family built for grounding, navigation, and reasoning across Web, Desktop, and Mobile.

Built on Qwen3-VL, it achieves SOTA performance: 66.1% (+3%) on ScreenSpot-Pro and 76.1% (+5%) on OSWorld-G.
November 14, 2025 at 11:48 PM
Google's Code Wiki

A platform that maintains a continuously updated, structured wiki for code repositories. I'm not sure how this differs from Cognition AI's DeepWiki (deepwiki.org), but we'll find out soon enough.

developers.googleblog.com/en/introduci...
November 14, 2025 at 10:33 PM
- A single depth-ray representation is enough. No complex multi-task 3D objectives.

Project: depth-anything-3.github.io
Paper: arxiv.org/abs/2511.10647
Code: github.com/ByteDance-Se...
Hugging Face demo: huggingface.co/spaces/depth...
November 14, 2025 at 10:27 PM
ByteDance Seed's Depth Anything 3

DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. They find that:

- A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture.
November 14, 2025 at 10:27 PM
There seem to be new posts about a format called TOON (Token-Oriented Object Notation), which aims to make communication with LLMs more accurate and token-efficient. Most of them compare JSON to TOON.

Is it just me, or does it basically look like CSV?
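A quick side-by-side (the TOON form follows the examples circulating and may not be spec-exact):

```python
# Same two records in JSON and in TOON-style notation
json_form = """{
  "users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
  ]
}"""

toon_form = """users[2]{id,name}:
  1,Alice
  2,Bob"""
# The tabular body is indeed CSV-like; the token savings come from
# declaring field names once instead of repeating keys per object.
```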
November 14, 2025 at 10:22 PM
W&B LEET

A full Terminal UI (TUI) for live, interactive W&B monitoring right in your terminal.

wandb.ai/wandb_fc/pro...
November 14, 2025 at 5:23 AM
Connect VS Code notebooks directly to Google Colab runtimes.

developers.googleblog.com/en/google-co...
November 14, 2025 at 4:15 AM
Weibo, China's Twitter, released VibeThinker-1.5B — SOTA reasoning in a tiny model.

🚀 Performance: Highly competitive on AIME24/25 & HMMT25 — surpasses DeepSeek R1-0120 on math, and outperforms same-size models in competitive coding.
November 13, 2025 at 6:23 AM
The core contribution lies in finally being able to answer a few fundamental questions theoretically:
- what distribution to impose on your embeddings
- how to do distribution matching in high-dim

Paper: arxiv.org/abs/2511.08544
Code: github.com/rbalestr-lab...
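A hedged sketch of the flavor of answer, as I read it: impose an isotropic Gaussian on the embeddings, and match it in high dimension via random 1D projections. This is a simplified moment-matching stand-in; the paper's actual estimator may differ.

```python
import torch

def sketch_gaussian_penalty(z, n_dirs=64):
    # z: (batch, dim) embeddings; target distribution: isotropic N(0, I)
    dirs = torch.randn(z.shape[1], n_dirs)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)    # random unit directions
    proj = z @ dirs                                 # (batch, n_dirs) 1D marginals
    mean_pen = proj.mean(dim=0).pow(2).mean()       # push marginal means to 0
    var_pen = (proj.var(dim=0) - 1.0).pow(2).mean() # push marginal variances to 1
    return mean_pen + var_pen

print(sketch_gaussian_penalty(torch.randn(512, 128)))
```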
November 13, 2025 at 6:19 AM
LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)

- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%
November 13, 2025 at 6:19 AM
Baidu, China's Google, releases ERNIE 5.0 — their latest natively omni-modal foundation model.

ernie.baidu.com
November 13, 2025 at 6:16 AM
Who is powering meme stocks? Koreans!

www.ft.com/content/833b...
November 13, 2025 at 6:15 AM
I like this narrative by Baidu founder Robin Li

Currently, we have an unhealthy ‘upright pyramid’ AI industry structure, with most of the value sitting at the wide base:
- Application Layer (narrow top)
- Model Layer
- Chip Layer (wide base)

They are shifting to a healthy AI industry structure, an ‘inverted pyramid’ where applications capture the most value:
- Application Layer (wide top)
- Model Layer
- Chip Layer (narrow base)
November 13, 2025 at 3:09 AM
If Microsoft has access to all of OpenAI’s IP, why are their AI models so lackluster?
November 13, 2025 at 2:28 AM
Are prompting and activation steering just two sides of the same coin?

The paper formalizes a Bayesian framework for model control: altering a model's "beliefs" over which persona or data source it's emulating. Context (prompting) and internal representations (steering) are two handles on the same belief update.
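A hedged sketch of how such a framework could be written down (my notation, not necessarily the paper's): behavior marginalizes over a latent persona/source z, and both control knobs act on the posterior over z.

```latex
% z: latent persona / data source; c: context; y: output (notation mine)
\[
  p(y \mid c) = \sum_{z} p(y \mid z, c)\, p(z \mid c),
  \qquad
  p(z \mid c) \propto p(c \mid z)\, p(z)
\]
% Prompting edits c, reshaping the posterior p(z | c) indirectly;
% steering perturbs the hidden state that encodes p(z | c) directly.
```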
November 12, 2025 at 5:42 AM
How character.ai trained their proprietary model Kaiju (13B, 34B, 110B) before switching to an OSS model.

Here are a few of the optimizations they made (a sketch of two appears below the list):
- MuP-like scaling
- MQA + SWA
- Clamping everywhere to control activations
- KV Cache sharing
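A minimal sketch of two of these, MQA (one shared K/V head across all query heads) plus hard clamping of attention scores; all names and numbers are mine, not character.ai's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClampedMQA(nn.Module):
    def __init__(self, dim=512, n_heads=8, clamp=30.0):
        super().__init__()
        self.h, self.dh, self.clamp = n_heads, dim // n_heads, clamp
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * self.dh)     # single shared K/V head (MQA)
        self.o = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        b, n, _ = x.shape
        q = self.q(x).view(b, n, self.h, self.dh).transpose(1, 2)
        k, v = self.kv(x).split(self.dh, dim=-1)   # each (b, n, dh), shared by all heads
        k, v = k.unsqueeze(1), v.unsqueeze(1)      # broadcast over query heads
        scores = (q @ k.transpose(-2, -1)) / self.dh ** 0.5
        scores = scores.clamp(-self.clamp, self.clamp)  # clamp to bound activations
        out = (F.softmax(scores, dim=-1) @ v).transpose(1, 2).reshape(b, n, -1)
        return self.o(out)

x = torch.randn(2, 16, 512)
print(ClampedMQA()(x).shape)                       # torch.Size([2, 16, 512])
```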
November 12, 2025 at 5:39 AM