Desert Electric Guitar
deserteguitar.bsky.social
Electric Guitar, Desert Living, AI, Robotics, Software Development, Art, Books, Bridge, Celtics, Suns, NBA, Chelsea, Soccer, Purdue University, University of Illinois, Arizona Cardinals, NFL
Reposted by Desert Electric Guitar
BERT is just a Single Text Diffusion Step!

Masked language models like RoBERTa, originally designed for fill-in-the-blank tasks, can be repurposed into fully generative engines by interpreting variable-rate masking as a discrete diffusion process.
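A minimal sketch of that idea: start fully masked and unmask a growing fraction of positions each step. The `predict` stub and the phrase it fills in are made-up stand-ins for a real masked LM, purely for illustration.

```python
import random

TARGET = ["the", "cat", "sat", "down"]  # stand-in for the model's predictions

def predict(tokens):
    """Toy masked LM: fills every [MASK] with the 'right' word for its slot."""
    return [TARGET[i] if t == "[MASK]" else t for i, t in enumerate(tokens)]

def diffusion_generate(length, steps=4, seed=0):
    """Start fully masked, then commit a growing fraction of positions
    each step -- each unmasking step is one discrete diffusion step."""
    rng = random.Random(seed)
    tokens = ["[MASK]"] * length  # fully masked = pure noise
    for step in range(steps):
        proposal = predict(tokens)
        # lower the mask rate: keep (step+1)/steps of the positions
        keep = set(rng.sample(range(length), k=(step + 1) * length // steps))
        tokens = [proposal[i] if (i in keep or tokens[i] != "[MASK]") else "[MASK]"
                  for i in range(length)]
    return tokens
```

With a real model, `predict` would be a fill-mask forward pass and the unmasking schedule would be chosen more carefully.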
October 21, 2025 at 1:26 PM
Reposted by Desert Electric Guitar
Awesome LLM Post-training

This repository is a curated collection of the most influential papers, code implementations, benchmarks, and resources related to Large Language Models (LLMs) Post-Training Methodologies.

github.com/mbzuai-oryx/...
March 4, 2025 at 12:03 AM
Reposted by Desert Electric Guitar
Atom of Thoughts (AOT): lifts gpt-4o-mini to 80.6% F1 on HotpotQA, surpassing o3-mini and DeepSeek-R1!

For each reasoning step, it:

1. Decomposes the question into a DAG
2. Contracts the subquestions into a new, simpler question
3. Iterates until reaching an atomic question
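The loop above can be sketched roughly as follows; `decompose` and `contract` here are toy stand-ins (simple string splitting), not the paper's LLM-driven implementations.

```python
def decompose(question):
    """Toy decomposition: split a conjunctive question into subquestions.
    (The real method uses an LLM to build a dependency DAG.)"""
    return [p.strip() for p in question.split(" and ")]

def contract(subquestions):
    """Toy contraction: treat the first subquestion as resolved and fold
    what remains into a new, simpler question."""
    return " and ".join(subquestions[1:])

def atom_of_thoughts(question, max_iters=10):
    """Iterate decompose -> contract until the question is atomic."""
    for _ in range(max_iters):
        subs = decompose(question)
        if len(subs) == 1:  # atomic: nothing left to decompose
            return question
        question = contract(subs)
    return question
```

The point of the contraction step is that each iteration hands the model a strictly simpler question, rather than an ever-growing chain of context.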
March 3, 2025 at 2:41 AM
Reposted by Desert Electric Guitar
RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering
RGAR enhances medical question answering by combining factual and conceptual knowledge from dual sources, outperforming existing systems.
Read more: https://arxiv.org/html/2502.13361v1
February 21, 2025 at 1:42 PM
Reposted by Desert Electric Guitar
Paper: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arxiv.org/abs/2502.11089)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offe...
arxiv.org
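A toy illustration of the general sparse-attention idea the abstract refers to: each query attends only to a causal local window of keys instead of all of them. This is a generic sketch, not the paper's NSA design.

```python
import math

def local_window_attention(q, k, v, window=2):
    """Single-head toy attention over scalar 'embeddings': each position i
    attends only to keys in [i - window, i], cutting cost from O(n^2)
    toward O(n * window)."""
    out = []
    for i, qi in enumerate(q):
        idx = list(range(max(0, i - window), i + 1))  # causal local window
        scores = [qi * k[j] for j in idx]
        m = max(scores)  # subtract max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v[j] for w, j in zip(weights, idx)) / z)
    return out
```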
February 18, 2025 at 7:06 AM
Reposted by Desert Electric Guitar
Paper: Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ( arxiv.org/abs/2502.07490 )
Code: github.com/scitix/MEAP
February 16, 2025 at 11:30 PM
Reposted by Desert Electric Guitar
Been waiting for someone to test this and see if it works - can multiple AI agents fact-checking each other reduce hallucinations?

The answer appears to be yes - using 3 agents with a structured review process reduced hallucination scores by 96% across 310 test cases. arxiv.org/pdf/2501.13946
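The setup can be caricatured as a draft-and-review loop; the agents below are placeholder functions, not the paper's actual pipeline.

```python
def draft_agent(question):
    """Stand-in for a generator LLM."""
    return f"draft answer to: {question}"

def reviewer(answer, question):
    """Stand-in for a fact-checking agent; True means the answer passes."""
    return "UNSUPPORTED" not in answer

def answer_with_review(question, n_reviewers=3):
    """Draft once, then require sign-off from every reviewer before
    returning the answer; otherwise abstain."""
    answer = draft_agent(question)
    votes = [reviewer(answer, question) for _ in range(n_reviewers)]
    return answer if all(votes) else "insufficient support"
```

The interesting design choice is the unanimity rule: abstaining on disagreement trades some coverage for a lower hallucination rate.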
February 2, 2025 at 8:42 PM
Reposted by Desert Electric Guitar
The article is a long read (about 60 minutes), but it’s absolutely worth the time.

youtubetranscriptoptimizer.com/blog/05_the_...
youtubetranscriptoptimizer.com
January 27, 2025 at 3:25 AM
Reposted by Desert Electric Guitar

Paper: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought ( arxiv.org/abs/2501.04682 )
Post on X: x.com/rm_rafailov/...
January 9, 2025 at 9:23 PM
Reposted by Desert Electric Guitar
Goodfire.ai is open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B!

SAEs are interpreter models that help us understand how language models process information internally by decomposing neural activations into interpretable features.
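A minimal SAE forward pass in plain Python, to make "decomposing activations into features" concrete. The weights here are illustrative; real SAEs are trained with a reconstruction loss plus an L1 sparsity penalty on the features.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec):
    """Encode an activation vector into an overcomplete, mostly-zero
    feature vector, then reconstruct the activation from it."""
    features = relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])
    return features, matvec(W_dec, features)
```

The feature dimension is larger than the activation dimension, and the ReLU plus sparsity training push most features to exactly zero, which is what makes the surviving features interpretable.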
January 10, 2025 at 7:09 PM
Reposted by Desert Electric Guitar
The estimators covered include least squares, ridgeless, ridge, and lasso.

arxiv.org/abs/2412.15633
Lecture Notes on High Dimensional Linear Regression
These lecture notes cover advanced topics in linear regression, with an in-depth exploration of the existence, uniqueness, relations, computation, and non-asymptotic properties of the most prominent e...
arxiv.org
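For instance, the ridge estimator from the notes has a simple closed form; here is a one-feature, no-intercept sketch (the data are made up for illustration).

```python
def ridge_1d(x, y, lam):
    """Ridge estimate for one feature, no intercept:
    beta = sum(x_i * y_i) / (sum(x_i**2) + lambda).
    lam = 0 recovers ordinary least squares; lam > 0 shrinks toward 0."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)
```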
December 24, 2024 at 7:30 AM
Reposted by Desert Electric Guitar
By scaling test-time compute, smaller models can match or even surpass the performance of larger models. Llama 3.2 3B can outperform Llama 3.1 70B on MATH-500!🤯
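One simple form of test-time scaling is self-consistency: sample many answers and take the majority vote. A sketch with a stub sampler (the candidate answers are invented; a real setup would sample the model at temperature > 0).

```python
import random
from collections import Counter

def sample_answer(question, rng):
    """Stand-in for one sample from a small model at temperature > 0."""
    return rng.choice(["42", "42", "41"])

def majority_vote(question, n=16, seed=0):
    """Spend compute on n samples, then return the most common answer."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0]
```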
December 17, 2024 at 7:30 AM
Reposted by Desert Electric Guitar
The paper authors (Google) argue that you shouldn't bother with HNSW (Hierarchical Navigable Small World) indexes; just use brute-force search instead.
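Brute-force search is indeed short to write: an exact k-nearest-neighbor sketch using only the stdlib.

```python
import math

def brute_force_knn(query, vectors, k=3):
    """Exact k-nearest-neighbor search: compute every distance, sort, slice.
    Returns the indices of the k closest vectors (Euclidean)."""
    dists = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in dists[:k]]
```

Unlike HNSW, this has no index to build or tune and always returns the exact neighbors; the cost is a linear scan per query, which is often fine at moderate scale.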
December 5, 2024 at 5:08 AM
Reposted by Desert Electric Guitar
Optimizing decision utility in Bayesian experimental design is key to improving downstream decision-making.

Excited to share our #NeurIPS2024 paper on Amortized Decision-Aware Bayesian Experimental Design: arxiv.org/abs/2411.02064

@lacerbi.bsky.social @samikaski.bsky.social

Details below.
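A toy illustration of scoring designs by expected downstream decision utility, the quantity the post argues BED should optimize. The prior, noise scaling, and threshold decision here are invented stand-ins, not the paper's amortized method.

```python
import random

def expected_utility(design, n_sim=1000, seed=0):
    """Monte Carlo estimate of downstream decision utility for a 'design'
    that controls observation noise (bigger design = cleaner measurement)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        theta = rng.gauss(0, 1)                  # latent state, from the prior
        y = theta + rng.gauss(0, 1.0 / design)   # noisy observation
        decision = y > 0                         # simple threshold decision
        hits += decision == (theta > 0)          # utility: was it right?
    return hits / n_sim

best_design = max([1, 2, 5], key=expected_utility)
```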
December 5, 2024 at 12:19 PM