Janu Verma
@januverma.bsky.social
Principal Applied Scientist, Microsoft.
Interested in AI, RecSys, Maths.
Trains and fine-tunes models.
januverma.substack.com
When we try to fight Cold Shyness with willpower, we usually lose. The brain is too good at bargaining for comfort. The thing that fixes it is low-stakes repetition.
I wrote about why I stopped trying to "Win January" and started treating it as a training block for a February 1st "official" start.
January 5, 2026 at 6:14 PM
Whether through multi-task learning, auxiliary objectives, or simply smarter input design, giving models context unlocks generalization, robustness, and sometimes surprising insights.

It’s a good reminder: the best models don’t just predict, they understand.
July 11, 2025 at 12:23 PM
Auxiliary Tasks: When training for sentiment analysis, add an auxiliary task like predicting part-of-speech tags. A better understanding of grammar leads to a better understanding of sentiment.
July 11, 2025 at 12:23 PM
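To make the auxiliary-task idea above concrete, here is a minimal PyTorch sketch: a shared encoder feeds both a sentiment head and a POS-tagging head, and the auxiliary loss is added with a small weight. The layer sizes and the 0.3 weight are illustrative assumptions, not details from the post.

```python
# Sketch: sentiment classification with an auxiliary POS-tagging head.
import torch
import torch.nn as nn

class SentimentWithPOSAux(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.sentiment_head = nn.Linear(2 * hidden_dim, 2)       # main task: positive / negative
        self.pos_head = nn.Linear(2 * hidden_dim, num_pos_tags)  # auxiliary task: per-token POS tags

    def forward(self, token_ids):
        states, _ = self.encoder(self.embedding(token_ids))          # (B, T, 2H)
        sentiment_logits = self.sentiment_head(states.mean(dim=1))   # pooled sentence vector -> (B, 2)
        pos_logits = self.pos_head(states)                           # (B, T, num_pos_tags)
        return sentiment_logits, pos_logits

def multitask_loss(sentiment_logits, pos_logits, sentiment_labels, pos_labels, aux_weight=0.3):
    main = nn.functional.cross_entropy(sentiment_logits, sentiment_labels)
    # Padding positions should carry label -100 so they are ignored in the auxiliary loss.
    aux = nn.functional.cross_entropy(pos_logits.flatten(0, 1), pos_labels.flatten(), ignore_index=-100)
    return main + aux_weight * aux
```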
Additional Contextual Data, e.g. search queries in recommendation models: A user's search history is pure gold. A streaming service that sees you're searching for "Oscar-winning movies" can offer far more relevant suggestions than one relying on watch history alone.
July 11, 2025 at 12:23 PM
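A minimal sketch of the idea above: the user representation is built from both watch history and recent search queries before scoring candidate items. The embedding sizes, mean pooling, and fusion layer are illustrative assumptions.

```python
# Sketch: folding search-query context into a recommender's user representation.
import torch
import torch.nn as nn

class ContextAwareRecommender(nn.Module):
    def __init__(self, num_items, query_vocab_size, dim=64):
        super().__init__()
        self.item_embedding = nn.Embedding(num_items, dim, padding_idx=0)
        self.query_embedding = nn.Embedding(query_vocab_size, dim, padding_idx=0)
        # Fuse watch-history and search-query signals into one user vector.
        self.user_proj = nn.Linear(2 * dim, dim)

    def forward(self, watched_item_ids, search_token_ids, candidate_item_ids):
        history_vec = self.item_embedding(watched_item_ids).mean(dim=1)   # (B, D)
        query_vec = self.query_embedding(search_token_ids).mean(dim=1)    # (B, D)
        user_vec = self.user_proj(torch.cat([history_vec, query_vec], dim=-1))
        candidate_vecs = self.item_embedding(candidate_item_ids)          # (B, K, D)
        # Score each candidate by dot product with the fused user vector.
        return torch.einsum("bd,bkd->bk", user_vec, candidate_vecs)
```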
Multi-Objective Training: Don't just predict whether a customer will purchase; also predict the likelihood of a return and of a positive review. This creates a more holistic and useful e-commerce model.
July 11, 2025 at 12:23 PM
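A rough sketch of such a multi-objective setup: one shared backbone, three heads (purchase, return, positive review), and a weighted sum of binary cross-entropy losses. Feature dimensions and loss weights here are assumptions for illustration.

```python
# Sketch: shared backbone with purchase / return / positive-review heads.
import torch
import torch.nn as nn

class MultiObjectiveCommerceModel(nn.Module):
    def __init__(self, num_features, hidden_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(num_features, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.heads = nn.ModuleDict({
            "purchase": nn.Linear(hidden_dim, 1),
            "return": nn.Linear(hidden_dim, 1),
            "positive_review": nn.Linear(hidden_dim, 1),
        })

    def forward(self, features):
        shared = self.backbone(features)
        return {name: head(shared).squeeze(-1) for name, head in self.heads.items()}

def multi_objective_loss(logits, labels, weights=None):
    # Illustrative weights; in practice these would be tuned per business objective.
    weights = weights or {"purchase": 1.0, "return": 0.5, "positive_review": 0.5}
    bce = nn.functional.binary_cross_entropy_with_logits
    return sum(weights[name] * bce(logits[name], labels[name].float()) for name in logits)
```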
Covers the significance of Anfinsen’s experiment, the role of the CASP competition, and why protein structure prediction was considered an AI-complete problem. This sets the stage for understanding how AlphaFold-2 achieved its breakthrough.
July 9, 2025 at 10:16 AM
Part III involves using frontier models to generate (synthetic) ‘reasoning’ for user engagement based on past interactions, and then using the reasoning-augmented data to SFT a Qwen 1.5B model. Comparable or better results with just 10% of the interaction data. open.substack.com/pub/januverma/…
February 12, 2025 at 9:58 PM
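A hedged sketch of the data-construction step described in Part III: a frontier model writes a short rationale for the observed engagement, which is then packed into a prompt/completion pair for SFT. `call_frontier_model` is a hypothetical placeholder, and the prompt wording is not taken from the post.

```python
# Sketch: building one reasoning-augmented SFT example from interaction data.
def build_reasoning_augmented_example(history, target_item, label, call_frontier_model):
    # Ask the (hypothetical) frontier model for a short rationale for the observed engagement.
    rationale_prompt = (
        "Given the user's interaction history below, explain in 2-3 sentences "
        f"why they would or would not engage with '{target_item}'.\n"
        f"History: {', '.join(history)}"
    )
    rationale = call_frontier_model(rationale_prompt)  # synthetic 'reasoning'

    # Pack history, rationale, and label into a prompt/completion pair for SFT.
    prompt = (
        f"User history: {', '.join(history)}\n"
        f"Will the user engage with '{target_item}'? Think step by step, then answer yes or no."
    )
    completion = f"{rationale}\nAnswer: {'yes' if label else 'no'}"
    return {"prompt": prompt, "completion": completion}
```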
Part II of my explorations with LLMs for Recommendation tasks involves experimenting with base models of varying sizes from 0.5B to 14B params (Qwen 2.5 Series) and incorporating user attributes.
januverma.substack.com/p/large-language-models-for-recommender-35c
Large Language Models for Recommender Systems II - Scaling
Do scaling laws extend to recommendation?
januverma.substack.com
February 4, 2025 at 2:31 PM
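A small sketch of how user attributes might be folded into the SFT prompt alongside the interaction history, in the spirit of the Part II experiments. The field names and wording are illustrative assumptions, not the post's actual template.

```python
# Sketch: attribute-aware prompt construction for an LLM recommender.
def format_prompt_with_attributes(user_attrs, history, candidate_item):
    attrs = ", ".join(f"{k}: {v}" for k, v in user_attrs.items())
    watched = "; ".join(history)
    return (
        f"User profile: {attrs}\n"
        f"Recently watched: {watched}\n"
        f"Candidate item: {candidate_item}\n"
        "Based on the profile and history, will the user like the candidate item? Answer yes or no."
    )

# Example usage with made-up attributes and titles.
example_prompt = format_prompt_with_attributes(
    {"age_group": "25-34", "favourite_genres": "sci-fi, thriller"},
    ["Dune", "Blade Runner 2049", "Arrival"],
    "Interstellar",
)
```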
The first experiment builds a proof-of-concept LLM recommender by supervised fine-tuning (SFT) a small-scale LLM (Llama 1B). januverma.substack.com/p/large-lang...
Large Language Models for Recommender Systems
Can LLMs reason over user behaviour data to decipher preferences?
januverma.substack.com
February 4, 2025 at 2:27 PM
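For flavour, a minimal sketch of one SFT step for a ~1B-parameter causal LM on such prompt/completion pairs, using Hugging Face Transformers. The checkpoint name, hyperparameters, and the choice to train on the full sequence (no prompt masking) are assumptions; the post's exact setup may differ.

```python
# Sketch: a single SFT step for a small causal LM on a recommendation example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"   # assumed 1B-scale checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def sft_step(prompt, completion):
    # Standard causal-LM SFT: concatenate prompt and target, use the input_ids as labels.
    text = prompt + "\n" + completion + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```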
Or they are too narcissistic to even notice the work and lives of others. I feel it could be a coping mechanism that makes their view so myopic - ignorance is bliss, I guess.
December 3, 2024 at 7:18 AM