Spandan Karma Mishra
@spandyie.bsky.social
AI @ PANW, Statistical learning, Bayesian, Rock climber, history buff & Nepali
Reposted by Spandan Karma Mishra
Huh. Looks like Plato was right.

A new paper shows all language models converge on the same "universal geometry" of meaning. Researchers can translate between ANY model's embeddings without seeing the original text.

Implications for philosophy and vector databases alike. arxiv.org/pdf/2505.12540
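For intuition, here's a minimal numpy sketch of the classical *supervised* version of embedding translation: aligning two spaces with an orthogonal Procrustes map. Note this baseline needs paired vectors; the paper's striking claim is that translation works without pairs or the original text, and all names here are illustrative, not from the paper.

```python
import numpy as np

# Toy supervised baseline for embedding translation (orthogonal
# Procrustes). NOT the paper's unsupervised method; just shows that a
# simple map can align two spaces when pairs are available.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 64))                  # embeddings from "model A"
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # hidden rotation between spaces
B = A @ Q + 0.01 * rng.normal(size=(1000, 64))   # noisy "model B" embeddings

# W = argmin over orthogonal W of ||A W - B||_F, solved via SVD of A^T B
U, _, Vt = np.linalg.svd(A.T @ B)
W = U @ Vt
print(np.linalg.norm(A @ W - B) / np.linalg.norm(B))  # ~0.01: spaces align
```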
May 23, 2025 at 2:44 AM
Reposted by Spandan Karma Mishra
AI Agents vs. Agentic #AI: A Conceptual Taxonomy, Applications and Challenges (preprint) arxiv.org/abs/2505.10468
May 17, 2025 at 1:55 PM
Reposted by Spandan Karma Mishra
BREAKING NEWS: The White House has begun the process of looking for a new secretary of defense, according to a U.S. official who was not authorized to speak publicly.
The White House has begun process of looking for new secretary of defense
www.npr.org
April 21, 2025 at 5:25 PM
Reposted by Spandan Karma Mishra
They show LMs can synthesize their own thoughts for more data-efficient pretraining, bootstrapping their capabilities on limited, task-agnostic data. They call this new paradigm “reasoning to learn”.
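A hedged sketch of the loop shape this describes: the LM synthesizes a "thought" about the upcoming corpus text, then the training loss is computed on the real tokens conditioned on that thought. `model.sample` and `model.nll` are hypothetical stand-ins, not the paper's API.

```python
# Hedged sketch of a "reasoning to learn" pretraining step.
# `model.sample` / `model.nll` are hypothetical stand-ins.

def pretrain_step(model, prev_text: str, next_text: str) -> float:
    # The LM writes its own latent reasoning about the context.
    thought = model.sample(prompt=prev_text + "\n<thought>")
    # Only the real corpus tokens contribute to the training loss.
    return model.nll(context=prev_text + thought, target=next_text)
```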
March 27, 2025 at 3:55 AM
Reposted by Spandan Karma Mishra
PapersChat – Chat with Research Papers

PapersChat provides an agentic AI interface for querying papers, retrieving insights from ArXiv & PubMed, and structuring responses efficiently.

github.com/AstraBert/Pa...
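Under the hood, agents like this sit on top of plain retrieval APIs. A minimal sketch of querying arXiv directly, using the public arXiv Atom API rather than PapersChat's own code:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Query the public arXiv Atom API (the kind of retrieval an agentic
# paper-chat tool wraps); search terms here are arbitrary examples.
query = urllib.parse.urlencode(
    {"search_query": "all:test-time scaling", "max_results": 3}
)
with urllib.request.urlopen(f"http://export.arxiv.org/api/query?{query}") as r:
    feed = ET.fromstring(r.read())

ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    print(entry.find("atom:title", ns).text.strip())
```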
March 10, 2025 at 4:47 AM
Reposted by Spandan Karma Mishra
French Senator Claude Malhuret:

"Washington has become Nero’s court, with an incendiary emperor, submissive courtiers and a jester high on ketamine... We were at war with a dictator, we are now at war with a dictator backed by a traitor."
March 5, 2025 at 3:47 PM
Reposted by Spandan Karma Mishra
A few words on DeepSeek's new releases. Links are:
- github.com/deepseek-ai/...
- github.com/deepseek-ai/...
- github.com/deepseek-ai/...
and the Ultra-Scale Playbook at huggingface.co/spaces/nanot...
February 27, 2025 at 1:41 PM
Reposted by Spandan Karma Mishra
Just read the s1: Simple Test-Time Scaling paper. Super interesting approach to improving reasoning models!

TL;DR:
1. SFT on 1k curated examples w/ reasoning traces.
2. Control response length w/ budget forcing:
"Wait" tokens → longer reasoning/self-correction.
"Final Answer:" → enforce stopping.
February 7, 2025 at 2:00 PM
Reposted by Spandan Karma Mishra
Maybe a hot take, but what about the following advice to the next gen:
Don't get an AI degree; the curriculum will be outdated before you graduate. Instead, study math, stats, or physics as your foundation, and stay current with AI through code-focused books, blogs, and papers.
February 9, 2025 at 3:36 PM
Reposted by Spandan Karma Mishra
Bison should be allowed to roam free and cattle should be restricted to private land.
All abandoned barbed wire should be removed from public land.
The money today being wasted on public lands grazing should go into building wildlife overpasses and installing wildlife safe guide fencing.
February 7, 2025 at 4:46 PM
Reposted by Spandan Karma Mishra
Not one VC would ever fund a startup to do the kind of hardcore optimization work that DeepSeek did.

Every VC firm should be asking themselves why.
January 28, 2025 at 5:00 AM
Reposted by Spandan Karma Mishra
Finally finally finally some scaling curves for imitation learning in the large-scale-data regime: arxiv.org/abs/2411.04434
Scaling Laws for Pre-training Agents and World Models
The performance of embodied agents has been shown to improve by increasing model parameters, dataset size, and compute. This has been demonstrated in domains from robotics to video games, when generat...
arxiv.org
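Scaling curves in this literature are usually fit with a saturating power law, L(N) = a·N^(−α) + c. A sketch of fitting that form with scipy; the functional form is the standard one, but the data points below are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit the standard saturating power law to (synthetic) loss-vs-scale
# points; the numbers are illustrative, not from the paper.
def power_law(N, a, alpha, c):
    return a * N ** (-alpha) + c

N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
loss = power_law(N, a=50.0, alpha=0.3, c=1.8) + 0.01 * np.random.randn(5)

params, _ = curve_fit(power_law, N, loss, p0=[10.0, 0.2, 1.0])
print("fitted a, alpha, c:", params)
```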
January 20, 2025 at 2:48 PM
Reposted by Spandan Karma Mishra
And here's a great project from a reader who trained a tokenizer from scratch on Nepali: github.com/rasbt/LLMs-f...
GPT2-Nepali (Pretrained from scratch) · rasbt LLMs-from-scratch · Discussion #485
Hi everyone! 👋 I’m excited to share my recent project: GPT2-Nepali, a GPT-2 model pretrained from scratch for the Nepali language. This project builds upon the GPT-2 model training code detailed in...
github.com
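For context, training a BPE tokenizer from scratch is a few lines with the Hugging Face `tokenizers` library. A minimal sketch, with a placeholder corpus path; this is not the linked project's exact code.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a GPT-2-sized BPE tokenizer from scratch; the corpus file is a
# placeholder, not the linked project's data.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=50257,                              # GPT-2-sized vocabulary
    special_tokens=["<|endoftext|>", "[UNK]"],
)
tokenizer.train(files=["nepali_corpus.txt"], trainer=trainer)
tokenizer.save("gpt2-nepali-tokenizer.json")

print(tokenizer.encode("नमस्ते संसार").tokens)
```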
January 19, 2025 at 4:37 PM
Reposted by Spandan Karma Mishra
Nice and fresh content to understand how Large Language Models work: arxiv.org/abs/2501.09223 #LLM #NLP
Foundations of Large Language Models
This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is st...
arxiv.org
January 19, 2025 at 3:17 PM
Reposted by Spandan Karma Mishra
This is a wonderfully simple blog on how tensors flow through a transformer model.

Covering:
- Tokenize
- Embed
- Positional Encoding
- Decoder
- Multi-Head Attention
- Add and normalize
- Feed-Forward
- Model Head
- Cross-Attention

Blog:
Mastering Tensor Dimensions in Transformers
A Blog post by Hafedh Hichri on Hugging Face
buff.ly
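The gist of the blog in runnable form: a PyTorch shape walk-through of the same pipeline. The sizes (batch 2, sequence 8, d_model 64, 4 heads) are arbitrary toy values, and modules are instantiated inline just to show dimensions.

```python
import torch
import torch.nn as nn

# Toy shape walk-through of a transformer block; sizes are arbitrary.
B, T, d_model, n_heads, vocab = 2, 8, 64, 4, 1000

tokens = torch.randint(0, vocab, (B, T))               # (B, T)
x = nn.Embedding(vocab, d_model)(tokens)               # (B, T, d_model)
x = x + torch.randn(1, T, d_model)                     # + positional encoding

attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
attn_out, _ = attn(x, x, x)                            # (B, T, d_model)
x = nn.LayerNorm(d_model)(x + attn_out)                # add & normalize

ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                    nn.Linear(4 * d_model, d_model))
x = nn.LayerNorm(d_model)(x + ffn(x))                  # feed-forward + add & norm

logits = nn.Linear(d_model, vocab)(x)                  # model head
print(logits.shape)                                    # torch.Size([2, 8, 1000])
```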
January 14, 2025 at 1:00 PM
Reposted by Spandan Karma Mishra
Free Our Feeds! What is it? @freeourfeeds.com

F.O.F. is an independent group with the goal of running THIS👇 social network totally outside of Bluesky.

It's not us. It's a fully independent version of the network. All the same users and posts. Running cooperatively with us and others.
January 13, 2025 at 9:03 PM
Reposted by Spandan Karma Mishra
If you’re an AI startup, or interviewing w/ one, ask:

What are you the best in the world at?

Do you offer a service, formula, or delivery method you invented?

Is there something you do that’s patentable or a unique user experience?

Have you identified and isolated a market segment?

If not, walk
January 5, 2025 at 10:33 PM
Happy New Year 2025!
January 1, 2025 at 6:53 PM
Reposted by Spandan Karma Mishra
Very interesting paper by Ananda Theertha Suresh et al.

For categorical/Gaussian distributions, they derive the rate at which a sample is forgotten to be 1/k after k rounds of recursive training (hence 𝐦𝐨𝐝𝐞𝐥 𝐜𝐨𝐥𝐥𝐚𝐩𝐬𝐞 happens more slowly than intuitively expected)
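A toy simulation of the recursive-training setup the paper analyzes: fit a Gaussian, resample from the fit, refit, repeat. This only illustrates the setup drifting over rounds; the 1/k forgetting rate itself is the paper's analytical result, not something this snippet proves.

```python
import numpy as np

# Recursive training on a Gaussian: each round refits on the previous
# round's own samples. Illustrative only; not the paper's analysis.
rng = np.random.default_rng(42)
n, k = 1000, 20
samples = rng.normal(loc=0.0, scale=1.0, size=n)   # round-0 "real" data

for round_ in range(1, k + 1):
    mu, sigma = samples.mean(), samples.std()
    samples = rng.normal(mu, sigma, size=n)        # train on own outputs
    if round_ in (1, 5, 10, 20):
        print(f"round {round_:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
```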
December 27, 2024 at 11:35 PM
Reposted by Spandan Karma Mishra
Releasing a dataset of 40 million Bluesky posts!

Collected using the Firehose API, I hope people do some cool ML with it.

Anonymized with a data removal mechanism and includes text, language predictions, and image data.

#ai #ml #NLP

huggingface.co/datasets/Ara...
Aranym/40-million-bluesky-posts · Datasets at Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
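Getting started is a two-liner with `datasets` streaming. A sketch using the repo name from the card; the column names below ("text") are a guess from the post's description, so check `ds.features` or the dataset card for the real schema.

```python
from datasets import load_dataset

# Stream a few posts without downloading the full 40M-row dataset.
# Column names are guessed from the post; verify against the card.
ds = load_dataset("Aranym/40-million-bluesky-posts",
                  split="train", streaming=True)
for i, post in enumerate(ds):
    print(post.get("text", post))   # fall back to the raw record
    if i == 4:
        break
```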
December 17, 2024 at 3:25 PM
Reposted by Spandan Karma Mishra
A short list of tips for keeping a clean, organized ML codebase for new researchers: eugenevinitsky.com/posts/quick-...
Eugene Vinitsky
eugenevinitsky.com
December 18, 2024 at 8:00 PM
Reposted by Spandan Karma Mishra
Hey all, I've been a bit quiet the last couple of weeks as I am recovering from an accident & injury.

Unfortunately, I couldn’t write my yearly AI research review this year, but here’s at least a list of bookmarked papers you might find useful: magazine.sebastianraschka.com/p/llm-resear...
LLM Research Papers: The 2024 List
A curated list of interesting LLM-related research papers from 2024, shared for those looking for something to read over the holidays.
magazine.sebastianraschka.com
December 22, 2024 at 2:02 PM
Reposted by Spandan Karma Mishra
New work from my team at Anthropic in collaboration with Redwood Research. I think this is plausibly the most important AGI safety result of the year. Cross-posting the thread below:
December 18, 2024 at 5:47 PM
Reposted by Spandan Karma Mishra
LLMs might secretly be world models of the internet!

By treating LLMs as simulators that can predict "what would happen if I click this?" the authors built an AI that can navigate websites by imagining outcomes before taking action, performing 33% better than baseline. arxiv.org/pdf/2411.06559
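A hedged sketch of that simulate-before-acting loop: imagine each candidate click's outcome with the LLM, score the imagined page against the task, then act on the best one. Here `llm` stands for any text-completion function, and the prompts are illustrative, not the paper's actual prompts or scoring scheme.

```python
# Hedged sketch of LLM-as-world-model action selection; `llm` is a
# hypothetical text-completion function.

def choose_action(llm, page: str, task: str, candidate_actions: list[str]) -> str:
    scored = []
    for action in candidate_actions:
        # World-model step: imagine the page that would result.
        imagined = llm(f"Page:\n{page}\nIf the user does: {action}\n"
                       "Describe the resulting page:")
        # Value step: score the imagined outcome against the task.
        score = float(llm(f"Task: {task}\nImagined page:\n{imagined}\n"
                          "Progress toward the task, 0-10? Answer a number:"))
        scored.append((score, action))
    return max(scored)[1]   # act on the best imagined outcome
```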
December 3, 2024 at 2:00 AM