#sparseattention
September 30, 2025 at 9:02 PM
FG‑Attn applies fine‑grained sparse attention to video diffusion, yielding an average 1.55× speedup (up to 1.65×) on a single NVIDIA H100 GPU when generating five‑second 480p clips. Read more: https://getnews.me/fg-attn-brings-fine-grained-sparse-attention-to-speed-video-diffusion/ #fgattn #sparseattention
September 24, 2025 at 8:07 AM
DeepSeek released its V3.2‑exp model on Monday, using sparse attention to halve API costs for long‑context tasks. The model weights are openly available on Hugging Face. Read more: https://getnews.me/deepseek-unveils-sparse-attention-model-to-halve-api-costs/ #deepseek #sparseattention
October 2, 2025 at 6:23 PM
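The cost argument behind claims like the one above is simple arithmetic: dense attention scores every query against every key, while sparse attention caps the number of keys each query attends to. A minimal illustrative sketch (not DeepSeek's actual sparse-attention mechanism; `attention_pair_count` and the 2,048-key budget are assumptions for the example):

```python
def attention_pair_count(seq_len, keys_per_query=None):
    """Count query-key score computations for one attention layer.

    Dense attention scores every query against every key (seq_len^2 pairs);
    sparse attention limits each query to a fixed key budget.
    """
    k = seq_len if keys_per_query is None else min(keys_per_query, seq_len)
    return seq_len * k

# At a 128k-token context with a hypothetical 2,048-key budget per query,
# sparse attention computes 62.5x fewer scores than dense attention.
dense = attention_pair_count(128_000)
sparse = attention_pair_count(128_000, 2_048)
```

The gap widens quadratically with context length, which is why the savings show up mainly in long-context workloads.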
FlashInfer v0.2 by @yzh119.bsky.social

FlashInfer is a library and kernel generator for Large Language Models that provides high-performance implementations of LLM GPU kernels such as FlashAttention, SparseAttention, PageAttention, Sampling, and more.
December 19, 2024 at 9:54 PM
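FlashInfer implements kernels like SparseAttention as fused CUDA code; as a frame of reference for what such a kernel computes, here is a minimal pure-Python sketch of single-query sparse attention over an explicit index set. This is illustrative only, not FlashInfer's API, and the name `keep_idx` is a hypothetical stand-in for the sparsity pattern:

```python
import math

def sparse_attention(q, keys, values, keep_idx):
    """Attend only to the keys listed in keep_idx (the sparse pattern).

    Skipped positions contribute no score, no softmax term, and no
    value accumulation, which is where the compute savings come from.
    """
    scale = 1.0 / math.sqrt(len(q))
    # Scaled dot-product scores, restricted to the kept positions.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, keys[j])) for j in keep_idx]
    # Numerically stable softmax over the kept positions only.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the kept value vectors.
    out = [0.0] * len(values[0])
    for w, j in zip(weights, keep_idx):
        for d in range(len(out)):
            out[d] += w * values[j][d]
    return out
```

With `keep_idx` covering every position this reduces to dense attention; production kernels get their speed from fusing these steps and exploiting the sparsity pattern on the GPU, not from the pattern alone.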
ProxyAttn, a training‑free sparse attention method that uses representative heads, claims up to 10.3× faster raw attention and a 2.4× speed‑up in LLM pre‑fill. Read more: https://getnews.me/proxyattn-introduces-guided-sparse-attention-using-representative-heads/ #proxyattn #sparseattention
September 30, 2025 at 11:38 PM
DeepSeek unveils V3.2-exp model with sparse attention, slashing AI inference costs by up to 50%. A game-changer for long-context operations! #AI #DeepSeek #SparseAttention #TechInnovation Link: thedailytechfeed.com/deepseeks-v3...
September 30, 2025 at 3:42 PM
A new input‑aware sparse attention method cuts compute by up to 40% and enables real‑time co‑speech video generation with better lip‑sync. Submitted 2 Oct 2025. https://getnews.me/input-aware-sparse-attention-enables-real-time-co-speech-video-generation/ #sparseattention #realtime
October 6, 2025 at 6:25 AM