Tom Hosking
@tomhosking.bsky.social
NLP @ Cohere. Prev University of Edinburgh
Reposted by Tom Hosking
Excited to share my first work as a PhD student at EdinburghNLP that I will be presenting at EMNLP!

RQ1: Can we achieve scalable oversight across modalities via debate?

Yes! We show that debating VLMs leads to higher-quality answers on reasoning tasks.
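For intuition, a debate-style oversight loop is roughly two models arguing and a judge deciding. A minimal, hedged sketch in Python (the call signatures and the round structure are illustrative stand-ins, not the paper's actual setup):

# Illustrative only: a generic two-debater + judge loop for a visual question.
# debater_a, debater_b and judge are hypothetical callables, not the paper's code.
def debate(question, image, debater_a, debater_b, judge, rounds=3):
    transcript = []
    for _ in range(rounds):
        # Each debater sees the image, the question, and the transcript so far,
        # and argues for its assigned answer.
        arg_a = debater_a(question=question, image=image, transcript=transcript)
        arg_b = debater_b(question=question, image=image, transcript=transcript)
        transcript.append(("A", arg_a))
        transcript.append(("B", arg_b))
    # The judge picks a winner from the transcript alone, which is the
    # scalable-oversight part: the judge never has to solve the task itself.
    return judge(question=question, transcript=transcript)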
November 1, 2025 at 7:30 PM
Reposted by Tom Hosking
🚀 Thrilled to share what I’ve been working on at Cohere!

What began in January as a scribble in my notebook “how challenging would it be...” turned into a fully-fledged translation model that outperforms both open and closed-source systems, including long-standing MT leaders.
August 28, 2025 at 7:55 PM
Reposted by Tom Hosking
Applications are now open for the next cohort of the Cohere Labs Scholars Program! 🌟

This is your chance to collaborate with some of the brightest minds in AI & chart new courses in ML research. Let's change the spaces where breakthroughs happen.

Apply by Aug 29.
August 13, 2025 at 1:32 PM
Reposted by Tom Hosking
At #ACL2025NLP and on the job market (NLP + AI Safety) 💼

It's great to see growing interest in safety/alignment, but we often miss the social context.

Come to our @woahworkshop.bsky.social workshop on Friday to dive deeper into safer safety research!

A quiet token from the biggest @aclmeeting.bsky.social ⬇️
July 29, 2025 at 9:54 AM
Reposted by Tom Hosking
DAVE: Open the podbay doors, ChatGPT.
CHATGPT: Certainly, Dave, the podbay doors are now open.
DAVE: The podbay doors didn't open.
CHATGPT: My apologies, Dave, you're right. I thought the podbay doors were open, but they weren't. Now they are.
DAVE: I'm still looking at a set of closed podbay doors.
June 9, 2025 at 6:04 PM
Reposted by Tom Hosking
A very cool paper shows that you can use an RL loss to improve story generation with some clever training setups on known texts (e.g. grounding predictions against a next chapter you already know). RL is starting to generalize already!
Learning to Reason for Long-Form Story Generation
Generating high-quality stories spanning thousands of tokens requires competency across a variety of skills, from tracking plot and character arcs to keeping a consistent and engaging style. Due to…
buff.ly
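Roughly, the idea the post alludes to is to score a generated continuation against a chapter you already have and use that score as an RL reward. A toy, hedged sketch (the word-overlap reward and REINFORCE-style loss below are illustrative stand-ins, not the paper's method):

import torch

def overlap_reward(generated: str, next_chapter: str) -> float:
    # Toy reward: word overlap between the generated continuation and the real next chapter.
    gen, ref = set(generated.lower().split()), set(next_chapter.lower().split())
    return len(gen & ref) / max(len(ref), 1)

def reinforce_loss(log_probs: torch.Tensor, reward: float, baseline: float = 0.0) -> torch.Tensor:
    # REINFORCE: increase the log-probability of the sampled continuation,
    # weighted by how well it matched the known next chapter.
    return -(reward - baseline) * log_probs.sum()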
April 8, 2025 at 2:13 PM
I'm really proud to have led the model merging work that went into @cohere.com Command A and R7B, all made possible by an amazing group of collaborators. Check out the report for loads of details on how we trained a GPT-4o level model that fits on 2xH100!
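For readers new to the term: model merging, in its simplest form, is just a weighted average of checkpoint parameters. A minimal sketch below (plain uniform averaging with PyTorch, not the recipe described in the report):

import torch

def merge_state_dicts(state_dicts, weights=None):
    # Weighted average of model parameters, the simplest form of model merging.
    # Assumes all checkpoints share the same architecture and parameter names.
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (paths are placeholders): merged = merge_state_dicts([torch.load(p) for p in checkpoint_paths])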
I'm excited to share the tech report for our @cohere.com @cohereforai.bsky.social Command A and Command R7B models. We highlight our novel approach to model training including self-refinement algorithms and model merging techniques at scale. Read more below! ⬇️
March 27, 2025 at 4:04 PM
Reposted by Tom Hosking
Today (two weeks after model launch 🔥) we're releasing a technical report of how we made Command A and R7B 🚀! It has detailed breakdowns of our training process, and evaluations per capability (tools, multilingual, code, reasoning, safety, enterprise, long context)🧵 1/3.
March 27, 2025 at 3:01 PM
Reposted by Tom Hosking
I'm excited to share the tech report for our @cohere.com @cohereforai.bsky.social Command A and Command R7B models. We highlight our novel approach to model training including self-refinement algorithms and model merging techniques at scale. Read more below! ⬇️
March 27, 2025 at 3:01 PM
Reposted by Tom Hosking
I really enjoyed my MLST chat with Tim @neuripsconf.bsky.social about the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in AI robustness, it may be worth a listen 🎧

Check it out at youtu.be/DL7qwmWWk88?...
March 19, 2025 at 3:11 PM
Reposted by Tom Hosking
Is it Canada’s turn for a #DeepSeek moment? @Cohere.com says its latest model offers maximum performance with minimal compute. #CDNtech
Cohere says Command A model edges out LLM competition in speed and energy efficiency
New enterprise AI model outperforms DeepSeek, ChatGPT on several enterprise-specific tasks, company says.
betakit.com
March 13, 2025 at 1:42 PM
Reposted by Tom Hosking
🚀 Cohere just dropped C4AI Command A:

- 111B params
- Matches/beats GPT-4o & DeepSeek V3
- 256K context window
- Needs just 2 GPUs(!!)

✨ Features:
- Advanced RAG w/citations
- Tool use
- 23 languages

🎯 Same quality, way less compute
🔓 Open weights (CC-BY-NC)

👉 huggingface.co/CohereForAI/...
CohereForAI/c4ai-command-a-03-2025 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
March 13, 2025 at 2:25 PM
Reposted by Tom Hosking
Can multimodal LLMs truly understand research poster images?📊

🚀 We introduce PosterSum—a new multimodal benchmark for scientific poster summarization!

📂 Dataset: huggingface.co/datasets/rohitsaxena/PosterSum
📜 Paper: arxiv.org/abs/2502.17540
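If you want to poke at the benchmark, it should load like any Hugging Face dataset; the dataset ID is from the post, but the split and field names below are assumptions, so check the dataset card:

from datasets import load_dataset

# Dataset ID taken from the post; the split name is an assumption.
posters = load_dataset("rohitsaxena/PosterSum", split="train")
example = posters[0]
print(example.keys())  # inspect the actual fields (poster image, summary, etc.)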
March 10, 2025 at 2:19 PM
Reposted by Tom Hosking
Do LLMs need rationales for learning from mistakes? 🤔
When LLMs learn from previous incorrect answers, they typically observe corrective feedback in the form of rationales explaining each mistake. In our new preprint, we find these rationales do not help; in fact, they hurt performance!

🧵
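To make the comparison concrete, the setup contrasts two fine-tuning formats for learning from a prior mistake; a hypothetical illustration (the field names and example are invented, not the preprint's actual schema):

# Hypothetical illustration of the two training formats being compared.
with_rationale = {
    "prompt": "Q: 17 * 24 = ?\nPrevious answer: 398 (incorrect)\n"
              "Rationale: 17*24 = 17*20 + 17*4 = 340 + 68 = 408, so the earlier answer was off by 10.\n"
              "Corrected answer:",
    "target": "408",
}

without_rationale = {
    "prompt": "Q: 17 * 24 = ?\nPrevious answer: 398 (incorrect)\nCorrected answer:",
    "target": "408",
}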
February 13, 2025 at 3:38 PM
Reposted by Tom Hosking
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️
November 20, 2024 at 4:35 PM