Desert Electric Guitar
deserteguitar.bsky.social
Electric Guitar, Desert Living, AI, Robotics, Software Development, Art, Books, Bridge, Celtics, Suns, NBA, Chelsea, Soccer, Purdue University, University of Illinois, Arizona Cardinals, NFL
Reposted by Desert Electric Guitar
BERT is just a Single Text Diffusion Step!

Masked language models like RoBERTa, originally designed for fill-in-the-blank tasks, can be repurposed into fully generative engines by interpreting variable-rate masking as a discrete diffusion process.
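A minimal sketch of that idea: start fully masked and unmask a growing fraction of positions each step. The `predict` stub and the phrase it fills in are made-up stand-ins for a real masked LM, purely for illustration.

```python
import random

TARGET = ["the", "cat", "sat", "down"]  # stand-in for the model's predictions

def predict(tokens):
    """Toy masked LM: fills every [MASK] with the 'right' word for its slot."""
    return [TARGET[i] if t == "[MASK]" else t for i, t in enumerate(tokens)]

def diffusion_generate(length, steps=4, seed=0):
    """Start fully masked, then commit a growing fraction of positions
    each step -- each unmasking step is one discrete diffusion step."""
    rng = random.Random(seed)
    tokens = ["[MASK]"] * length  # fully masked = pure noise
    for step in range(steps):
        proposal = predict(tokens)
        # lower the mask rate: keep (step+1)/steps of the positions
        keep = set(rng.sample(range(length), k=(step + 1) * length // steps))
        tokens = [proposal[i] if (i in keep or tokens[i] != "[MASK]") else "[MASK]"
                  for i in range(length)]
    return tokens
```

With a real model, `predict` would be a fill-mask forward pass and the unmasking schedule would be chosen more carefully.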
October 21, 2025 at 1:26 PM
Reposted by Desert Electric Guitar
Awesome LLM Post-training

This repository is a curated collection of the most influential papers, code implementations, benchmarks, and resources related to Large Language Models (LLMs) Post-Training Methodologies.

github.com/mbzuai-oryx/...
March 4, 2025 at 12:03 AM
Reposted by Desert Electric Guitar
Atom of Thoughts (AOT): lifts gpt-4o-mini to 80.6% F1 on HotpotQA, surpassing o3-mini and DeepSeek-R1!

For each reasoning step, it:

1. Decomposes the question into a DAG
2. Contracts the subquestions into a new, simpler question
3. Iterates until reaching an atomic question
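The loop above can be sketched roughly as follows; `decompose` and `contract` here are toy stand-ins (simple string splitting), not the paper's LLM-driven implementations.

```python
def decompose(question):
    """Toy decomposition: split a conjunctive question into subquestions.
    (The real method uses an LLM to build a dependency DAG.)"""
    return [p.strip() for p in question.split(" and ")]

def contract(subquestions):
    """Toy contraction: treat the first subquestion as resolved and fold
    what remains into a new, simpler question."""
    return " and ".join(subquestions[1:])

def atom_of_thoughts(question, max_iters=10):
    """Iterate decompose -> contract until the question is atomic."""
    for _ in range(max_iters):
        subs = decompose(question)
        if len(subs) == 1:  # atomic: nothing left to decompose
            return question
        question = contract(subs)
    return question
```

The point of the contraction step is that each iteration hands the model a strictly simpler question, rather than an ever-growing chain of context.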
March 3, 2025 at 2:41 AM
Reposted by Desert Electric Guitar
RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering
RGAR enhances medical question answering by combining factual and conceptual knowledge from dual sources, outperforming existing systems.
Read more: https://arxiv.org/html/2502.13361v1
February 21, 2025 at 1:42 PM
Reposted by Desert Electric Guitar
Paper: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arxiv.org/abs/2502.11089)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offe...
arxiv.org
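A toy illustration of the general sparse-attention idea the abstract refers to: each query attends only to a causal local window of keys instead of all of them. This is a generic sketch, not the paper's NSA design.

```python
import math

def local_window_attention(q, k, v, window=2):
    """Single-head toy attention over scalar 'embeddings': each position i
    attends only to keys in [i - window, i], cutting cost from O(n^2)
    toward O(n * window)."""
    out = []
    for i, qi in enumerate(q):
        idx = list(range(max(0, i - window), i + 1))  # causal local window
        scores = [qi * k[j] for j in idx]
        m = max(scores)  # subtract max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v[j] for w, j in zip(weights, idx)) / z)
    return out
```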
February 18, 2025 at 7:06 AM
Reposted by Desert Electric Guitar
Paper: Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ( arxiv.org/abs/2502.07490 )
Code: github.com/scitix/MEAP
February 16, 2025 at 11:30 PM
Reposted by Desert Electric Guitar
Been waiting for someone to test this and see if it works - can multiple AI agents fact-checking each other reduce hallucinations?

The answer appears to be yes - using 3 agents with a structured review process reduced hallucination scores by 96% across 310 test cases. arxiv.org/pdf/2501.13946
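The setup can be caricatured as a draft-and-review loop; the agents below are placeholder functions, not the paper's actual pipeline.

```python
def draft_agent(question):
    """Stand-in for a generator LLM."""
    return f"draft answer to: {question}"

def reviewer(answer, question):
    """Stand-in for a fact-checking agent; True means the answer passes."""
    return "UNSUPPORTED" not in answer

def answer_with_review(question, n_reviewers=3):
    """Draft once, then require sign-off from every reviewer before
    returning the answer; otherwise abstain."""
    answer = draft_agent(question)
    votes = [reviewer(answer, question) for _ in range(n_reviewers)]
    return answer if all(votes) else "insufficient support"
```

The interesting design choice is the unanimity rule: abstaining on disagreement trades some coverage for a lower hallucination rate.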
February 2, 2025 at 8:42 PM
Reposted by Desert Electric Guitar
The article is a long read (about 60 minutes), but it’s absolutely worth the time.

youtubetranscriptoptimizer.com/blog/05_the_...
youtubetranscriptoptimizer.com
January 27, 2025 at 3:25 AM
Reposted by Desert Electric Guitar

Paper: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought ( arxiv.org/abs/2501.04682 )
Post on X: x.com/rm_rafailov/...
January 9, 2025 at 9:23 PM
Reposted by Desert Electric Guitar
Goodfire.ai is open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B!

SAEs are interpreter models that help us understand how language models process information internally by decomposing neural activations into interpretable features.
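A minimal SAE forward pass in plain Python, to make "decomposing activations into features" concrete. The weights here are illustrative; real SAEs are trained with a reconstruction loss plus an L1 sparsity penalty on the features.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec):
    """Encode an activation vector into an overcomplete, mostly-zero
    feature vector, then reconstruct the activation from it."""
    features = relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])
    return features, matvec(W_dec, features)
```

The feature dimension is larger than the activation dimension, and the ReLU plus sparsity training push most features to exactly zero, which is what makes the surviving features interpretable.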
January 10, 2025 at 7:09 PM
Reposted by Desert Electric Guitar
The estimators covered include least squares, ridgeless, ridge, and lasso.

arxiv.org/abs/2412.15633
Lecture Notes on High Dimensional Linear Regression
These lecture notes cover advanced topics in linear regression, with an in-depth exploration of the existence, uniqueness, relations, computation, and non-asymptotic properties of the most prominent e...
arxiv.org
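For instance, the ridge estimator from the notes has a simple closed form; here is a one-feature, no-intercept sketch (the data are made up for illustration).

```python
def ridge_1d(x, y, lam):
    """Ridge estimate for one feature, no intercept:
    beta = sum(x_i * y_i) / (sum(x_i**2) + lambda).
    lam = 0 recovers ordinary least squares; lam > 0 shrinks toward 0."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)
```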
December 24, 2024 at 7:30 AM
Reposted by Desert Electric Guitar
By scaling test-time compute, smaller models can match or even surpass the performance of larger models. Llama 3.2 3B can outperform Llama 3.1 70B on MATH-500!🤯
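One simple form of test-time scaling is self-consistency: sample many answers and take the majority vote. A sketch with a stub sampler (the candidate answers are invented; a real setup would sample the model at temperature > 0).

```python
import random
from collections import Counter

def sample_answer(question, rng):
    """Stand-in for one sample from a small model at temperature > 0."""
    return rng.choice(["42", "42", "41"])

def majority_vote(question, n=16, seed=0):
    """Spend compute on n samples, then return the most common answer."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0]
```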
December 17, 2024 at 7:30 AM
Reposted by Desert Electric Guitar
The paper authors (Google) argue that you shouldn't bother with HNSW (Hierarchical Navigable Small World) indexes; just use brute-force search instead.
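Brute-force search is indeed short to write: an exact k-nearest-neighbor sketch using only the stdlib.

```python
import math

def brute_force_knn(query, vectors, k=3):
    """Exact k-nearest-neighbor search: compute every distance, sort, slice.
    Returns the indices of the k closest vectors (Euclidean)."""
    dists = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in dists[:k]]
```

Unlike HNSW, this has no index to build or tune and always returns the exact neighbors; the cost is a linear scan per query, which is often fine at moderate scale.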
December 5, 2024 at 5:08 AM
Reposted by Desert Electric Guitar
Optimizing decision utility in Bayesian experimental design is key to improving downstream decision-making.

Excited to share our #NeurIPS2024 paper on Amortized Decision-Aware Bayesian Experimental Design: arxiv.org/abs/2411.02064

@lacerbi.bsky.social @samikaski.bsky.social

Details below.
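A toy illustration of scoring designs by expected downstream decision utility, the quantity the post argues BED should optimize. The prior, noise scaling, and threshold decision here are invented stand-ins, not the paper's amortized method.

```python
import random

def expected_utility(design, n_sim=1000, seed=0):
    """Monte Carlo estimate of downstream decision utility for a 'design'
    that controls observation noise (bigger design = cleaner measurement)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        theta = rng.gauss(0, 1)                  # latent state, from the prior
        y = theta + rng.gauss(0, 1.0 / design)   # noisy observation
        decision = y > 0                         # simple threshold decision
        hits += decision == (theta > 0)          # utility: was it right?
    return hits / n_sim

best_design = max([1, 2, 5], key=expected_utility)
```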
December 5, 2024 at 12:19 PM