Sadiq Jaffer
sadiq.toao.com
Researcher @ Cambridge CL, OCaml hacker, fmr CEO at Opsian
There's relatively little LLM training data for niche languages and this causes poorer coding agent performance. I think this is an existential threat for smaller language communities like OCaml.

My talk at the OCaml workshop gave some actionable steps to mitigate that: toao.com/blog/ai-exis...
Three Steps for OCaml to Crest the AI Humps - Sadiq Jaffer
toao.com
October 25, 2025 at 12:39 PM
Reposted by Sadiq Jaffer
Every OCaml talk needs a pun, and @sadiq.toao.com is no exception #icfpsplash25
October 17, 2025 at 8:32 AM
Reposted by Sadiq Jaffer
Our lightning talks session opens with @sadiq.toao.com demonstrating TESSERA, their new geospatial foundation model that is FAIR and global #icfpsplash25
October 13, 2025 at 8:20 AM
Reposted by Sadiq Jaffer
Not how I expected to make my @arstechnica.com debut but I'll take it arstechnica.com/ai/2025/09/c...
Can AI detect hedgehogs from space? Maybe if you find brambles first.
Cambridge researchers use satellite-based bramble detection as a proxy for mapping hedgehog habitats.
arstechnica.com
September 27, 2025 at 2:49 PM
Fun field trip today trying to validate a colleague's bramble detecting model: toao.com/blog/can-we-... with @anil.recoil.org
Can a model trained on satellite data really find brambles on the ground? - Sadiq Jaffer
toao.com
September 24, 2025 at 8:29 PM
Reposted by Sadiq Jaffer
Some fun OCaml GC projects here with @sadiq.toao.com and @kcsrk.info if any students are looking for projects involving programming languages toao.com/blog/ocaml-0...
Last three months in OCaml (July 2025) - Sadiq Jaffer
toao.com
July 15, 2025 at 10:04 AM
Reposted by Sadiq Jaffer
The most incredibly fun part of this Nature comment on evidence synthesis we published today is that the cartoonist (David Parkins) also did Beano and Dennis the Menace (!) A true legend. www.nature.com/articles/d41...
July 8, 2025 at 11:55 AM
Reposted by Sadiq Jaffer
The rapid rise in AI-generated fraudulent academic papers is "poisoning" scientific literature, say Cambridge researchers in Nature magazine today. But though AI is the problem, it could also help in ensuring the integrity of scientific discovery... buff.ly/AuSNcGd
@anil.recoil.org @sadiq.toao.com
July 8, 2025 at 11:25 AM
Reposted by Sadiq Jaffer
I'm pleased to announce OxCaml!

OxCaml is Jane Street's branch of OCaml. We've given it a new name and a snazzy logo, and done a bunch of work to make it easy for people to try.
June 13, 2025 at 2:14 PM
Reposted by Sadiq Jaffer
New paper out today on how the careful design of LLMs is crucial for expert-level evidence retrieval in conservation (but with implications for any evidence synthesis pipeline across other fields) 🌍 doi.org/10.1371/jour... and anil.recoil.org/news/2024-ce... for a summary
Careful design of Large Language Model pipelines enables expert-level retrieval of evidence-based information from syntheses and databases
Wise use of evidence to support efficient conservation action is key to tackling biodiversity loss with limited time and resources. Evidence syntheses provide key recommendations for conservation deci...
doi.org
May 16, 2025 at 4:47 PM
Just how good are locally hostable code models on Cambridge first-year OCaml assignments? @anil.recoil.org, @jon.recoil.org and I wanted to find out, so we ran some tests. TL;DR: Qwen3 means we might need new assignments. toao.com/blog/ocaml-l...
Qwen3 Leads the Pack: Evaluating how Local LLMs tackle First Year CS OCaml exercises - Sadiq Jaffer
toao.com
May 7, 2025 at 2:23 PM
If you are using llama.cpp, here's a workaround using grammars for getting JSON structured output from Deepseek R1 and distills: toao.com/blog/json-ou...
JSON output from Deepseek R1 and distills with llama.cpp - Sadiq Jaffer
toao.com
January 30, 2025 at 6:39 PM
Reposted by Sadiq Jaffer
Working to surface challenges faced by folks at the coal face.

Data in research contributions from @orbenamy.bsky.social @sadiq.toao.com @scotthosking.bsky.social Stefan Scholtes, Vasco Carvalho, Mireia Crispin and a foreword with Jess Montgomery @dianecoyle1859.bsky.social @ginasue.bsky.social
🚨 New report now live! 🚨

In partnership with @mctd.bsky.social & @bennettinstitute.bsky.social, our new report presents six case studies which show innovative uses of data for research in areas that are critically important to #science and #society.

⬇️ Read more
ai.cam.ac.uk/assets/uploa...
November 29, 2024 at 5:12 PM
Reposted by Sadiq Jaffer
New preprint from our work on using LLMs to accelerate conservation evidence synthesis across millions of papers. We crosscheck 3 retrieval strategies against 10 LLMs and benchmark against human experts and find quite a bit of variance https://www.researchsquare.com/article/rs-5409185/v1
November 16, 2024 at 10:42 AM