Mainly interested in Language Model Interpretability and Model Diffing.
MATS 7.0 Winter 2025 Scholar w/ Neel Nanda
jkminder.ch
Trained on 15T tokens in 1,000+ languages, it’s built for transparency, responsibility & the public good.
Read more: actu.epfl.ch/news/apertus...
Paper: arxiv.org/abs/2507.08802
Rather than trying to reverse-engineer the full fine-tuned model, model diffing focuses on understanding what makes it different from its base model internally.
Post Guidance lets moderators prevent rule-breaking by triggering interventions as users write posts!
We implemented PG on Reddit and tested it in a massive field experiment (n=97k). It became a feature!
arxiv.org/abs/2411.16814
arxiv.org/abs/2411.07404
Co-led with @kevdududu.bsky.social - @niklasstoehr.bsky.social, Giovanni Monea, @wendlerc.bsky.social, Robert West & Ryan Cotterell.
go.bsky.app/LisK3CP
bsky.app/starter-pack...