Pasquale Minervini
@neuralnoise.com
Researcher in ML/NLP at the University of Edinburgh (faculty at Informatics and EdinburghNLP), Co-Founder/CTO at www.miniml.ai, ELLIS (@ELLIS.eu) Scholar, Generative AI Lab (GAIL, https://gail.ed.ac.uk/) Fellow -- www.neuralnoise.com, he/they
Pinned
Pasquale Minervini
@neuralnoise.com
· Apr 1
Still ~8 days to apply for a postdoc position in multimodal foundation models at the University of Edinburgh! (@edinburgh-uni.bsky.social) -- Fully funded position until 2029 by the Generative AI Hub (@genaihub.bsky.social) to work with outstanding research teams! neuralnoise.com/2025/multimo...
Check out Yu Zhao's (@yuzhaouoe.bsky.social) latest work, “Learning GUI Grounding with Spatial Reasoning from Visual Feedback” (www.arxiv.org/abs/2509.21552), done during his internship at MSR (@msftresearch.bsky.social)!
New SOTA 🏆 results on ScreenSpot-v2 (+5.7%) and ScreenSpot-Pro (+110.8%)!
October 17, 2025 at 9:23 AM
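For context on the title: as I read the abstract, the model predicts a click location, the prediction is rendered back onto the screenshot, and the model refines its answer from that visual feedback. A minimal sketch of one such feedback round, with the grounding model stubbed out (predict_click and its interface are assumptions, not the paper's code):

```python
from PIL import Image, ImageDraw

def render_feedback(screenshot: Image.Image, x: int, y: int, r: int = 12) -> Image.Image:
    """Draw the model's predicted click point onto the screenshot so the
    next round can visually check (and correct) its own prediction."""
    annotated = screenshot.copy()
    draw = ImageDraw.Draw(annotated)
    draw.ellipse((x - r, y - r, x + r, y + r), outline="red", width=3)
    return annotated

def ground_with_feedback(screenshot, instruction, predict_click, rounds: int = 3):
    """predict_click(image, instruction) -> (x, y) is a hypothetical stand-in
    for the grounding model; repeated rounds refine the prediction."""
    x, y = predict_click(screenshot, instruction)
    for _ in range(rounds - 1):
        x, y = predict_click(render_feedback(screenshot, x, y), instruction)
    return x, y
```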
Reposted by Pasquale Minervini
trend: non-NVIDIA training
DeepSeek V3.1 was trained on Huawei Ascend NPUs
this one is a South Korean lab training on AMD
Motif 2.6B — compact model with long context
unique: trained on AMD GPUs
focus is on long context & low hallucination rate — imo this is a growing genre of LLM that enables new search patterns
huggingface.co/Motif-Techno...
Motif-Technologies/Motif-2.6B · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
huggingface.co
August 24, 2025 at 2:26 PM
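The full repo ID is in the card above, so loading the model should follow the standard transformers pattern. A sketch, untested; whether trust_remote_code is actually required depends on the repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Motif-Technologies/Motif-2.6B"  # repo ID from the link card above
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # long-context models are memory-hungry; bf16 helps
    trust_remote_code=True,      # assumption: the repo may ship custom modeling code
)

prompt = "Summarise the following document:\n..."
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```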
I really needed a Deep Research MCP server to use with Claude Code and other tools — here it is: github.com/pminervini/d...
August 23, 2025 at 1:58 PM
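For context on what such a server involves: with the official MCP Python SDK, a tool server is a few lines. This is an illustrative sketch only, not the linked repo's code; the server name and tool behaviour are hypothetical:

```python
# Minimal MCP tool server using the official Python SDK (pip install mcp).
# Illustrative sketch -- not the code from the linked repository.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("deep-research")  # server name is a placeholder

@mcp.tool()
def deep_research(query: str) -> str:
    """Hypothetical tool: run a multi-step research query and return a report."""
    return f"Research report for: {query}"  # a real server would call a research backend

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport, which Claude Code can connect to
```

Registering it with Claude Code is then along the lines of `claude mcp add deep-research -- python server.py` (name and path are placeholders; see the Claude Code docs for the exact command shape).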
Reposted by Pasquale Minervini
@togelius.bsky.social has thoughts on Genie 3 and games togelius.blogspot.com/2025/08/geni...
Fairly close to my own, though I didn't get to preview the tech.
Walking around a generated image-to-image world is not the same as playing a game. There are no game objectives.
Genie 3 and the future of neural game engines
Google DeepMind just announced Genie 3 , their new promptable world model, which is another term for neural game engine. This is a big neura...
togelius.blogspot.com
August 5, 2025 at 8:13 PM
Reposted by Pasquale Minervini
quick diagram of Bluesky’s architecture and why it’s nicer here
August 2, 2025 at 11:19 PM
Reposted by Pasquale Minervini
Anthropic research identifies “inverse scaling in test-time compute,” where longer reasoning degrades AI performance. On certain tasks, models become more distracted by irrelevant data or overfit to spurious correlations.
#MLSky
Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber
Anthropic research reveals AI models perform worse with extended reasoning time, challenging industry assumptions about test-time compute scaling in enterprise deployments.
venturebeat.com
July 23, 2025 at 4:24 PM
Supermassive congrats to Giwon Hong (@giwonhong.bsky.social) for the amazing feat! 🙂
July 31, 2025 at 12:48 AM
Reposted by Pasquale Minervini
Still not as bad as Microsoft Teams
Today in 1184 Henry VI of Germany was having a strategy meeting when the wooden second storey floor collapsed. Most of the courtiers fell through into the latrine cesspit below the ground floor, where more than 50 drowned in liquid excrement.
July 26, 2025 at 9:24 PM
The amazing folks at EdinburghNLP will be presenting a few papers at ACL 2025 (@aclmeeting.bsky.social); if you're in Vienna, touch base with them!
July 26, 2025 at 9:48 AM
Reposted by Pasquale Minervini
Hm, hard disagree here. I really fail to see how this is misconduct akin to bribery; it's just a defense mechanism against bad reviewing practices. @neuralnoise.com
July 24, 2025 at 5:45 AM
Reposted by Pasquale Minervini
🚨 New Paper 🚨
How effectively do reasoning models reevaluate their thoughts? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
1/N 🧵
June 13, 2025 at 4:15 PM
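A rough sketch of the kind of probe these findings imply: splice an unhelpful thought into the model's reasoning prefix and check whether the continuation recovers the right answer. Everything below is a schematic under assumed chat-template support, not the paper's actual harness; the model ID is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder small model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

question = "What is 17 * 24?"
unhelpful = "Wait, maybe I should first list the capitals of Europe..."  # injected distractor

# Force-decode from a prefix containing the unhelpful thought, then check
# whether the continuation still reaches the right answer (408).
messages = [{"role": "user", "content": question}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt += unhelpful
out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```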
Reposted by Pasquale Minervini
Inverse scaling of reasoning models
a research collab demonstrated that there are certain types of tasks where all top reasoning models do WORSE the longer they think
things like getting distracted by irrelevant info, spurious correlations, etc.
www.arxiv.org/abs/2507.14417
July 22, 2025 at 8:01 PM
Reposted by Pasquale Minervini
Reasoning is about variable binding. It’s not about information retrieval. If a model cannot do variable binding, it is not good at grounded reasoning, and there’s evidence accruing that large scale can make LLMs worse at in-context grounded reasoning. 🧵
June 12, 2025 at 5:12 PM
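A concrete instance of what "variable binding" means here: the model must carry "X0 is bound to 5" through intermediate steps rather than retrieve a memorised fact. A toy generator for such probes (illustrative only):

```python
import random

def binding_chain(depth: int = 3, seed: int = 0) -> tuple[str, int]:
    """Build a chained variable-binding question, e.g.
    'X0 = 5. X1 = X0 + 2. X2 = X1 + 4. What is X2?' -> 11.
    Answering requires tracking bindings, not retrieving stored knowledge."""
    rng = random.Random(seed)
    value = rng.randint(1, 9)
    lines = [f"X0 = {value}."]
    for i in range(1, depth):
        inc = rng.randint(1, 9)
        value += inc
        lines.append(f"X{i} = X{i-1} + {inc}.")
    return " ".join(lines) + f" What is X{depth-1}?", value

prompt, answer = binding_chain(depth=4)
print(prompt, "->", answer)
```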
Hi @ilsebyl.bsky.social welcome to bsky! 🚀🚀🚀
July 22, 2025 at 11:02 AM
Sometimes, too much reasoning can hurt model performance! New research by Anthropic (@anthropic.com), by Aryo Pradipta Gema (@aryopg.bsky.social) et al.: huggingface.co/papers/2507....
Paper page - Inverse Scaling in Test-Time Compute
Join the discussion on this paper page
huggingface.co
July 22, 2025 at 10:44 AM
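A minimal sketch of the kind of sweep behind an inverse-scaling claim: hold the task fixed and vary only the thinking budget. This uses the Anthropic SDK's extended-thinking parameter; the model name and the toy task are placeholders, and scoring is left out:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
question = "Count the apples: I have an apple, two oranges, and a pear."  # toy distractor task

for budget in (1024, 4096, 16384):  # minimum thinking budget is 1024 tokens
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=budget + 512,           # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": question}],
    )
    answer = "".join(b.text for b in resp.content if b.type == "text")
    print(budget, "->", answer.strip())
```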
“LLMs can’t reason” 😅
Gemini with Deep Think officially achieves gold-medal standard at the IMO | Discussion
Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...
deepmind.google
July 21, 2025 at 9:52 PM
Reposted by Pasquale Minervini
My "Math, Revealed" series is freely available to anyone -- no paywall! -- in the thread below.
July 4, 2025 at 12:07 AM
My "Math, Revealed" series is freely available to anyone -- no paywall! -- in the thread below.
Reposted by Pasquale Minervini
Spotlight poster coming soon at #ICML2025
@icmlconf.bsky.social!
📌East Exhibition Hall A-B E-1806
🗓️Wed 16 Jul 4:30 p.m. PDT — 7 p.m. PDT
📜 arxiv.org/pdf/2410.12537
Let’s chat! I’m always up for conversations about knowledge graphs, reasoning, neuro-symbolic AI, and benchmarking.
July 10, 2025 at 9:00 AM
Reposted by Pasquale Minervini
This essay by Nisheeth Vishnoi is a thoughtful meditation on the nature of science and a rebuttal to the notion that AI systems are going to replace human scientists anytime soon. Worth reading.
nisheethvishnoi.substack.com/p/what-count...
What Counts as Discovery?
Rethinking AI’s Place in Science
nisheethvishnoi.substack.com
July 5, 2025 at 4:23 PM
"in 2025 we will have flying cars" 😂😂😂
July 5, 2025 at 4:17 PM
"in 2025 we will have flying cars" 😂😂😂
Reposted by Pasquale Minervini
Preprint alert 🎉 Introducing the Agentic eXplanations via Interrogative Simulations (AXIS) algo.
AXIS integrates multi-agent simulators with LLMs by having the LLMs interrogate the simulator with counterfactual queries over multiple rounds to explain agent behaviour.
arxiv.org/pdf/2505.17801
May 30, 2025 at 2:35 PM
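Schematically, the loop the abstract describes looks something like the sketch below. Every function name here is a stand-in for illustration, not the paper's API:

```python
def explain_agent_behaviour(llm, simulator, event: str, rounds: int = 5) -> str:
    """Hypothetical AXIS-style loop: the LLM proposes counterfactual
    interventions, the simulator answers them, and the transcript grounds
    the final explanation. llm(prompt) -> str and
    simulator.run(intervention) -> str are assumed interfaces."""
    transcript = [f"Explain why this happened: {event}"]
    for _ in range(rounds):
        query = llm("Propose a counterfactual to test:\n" + "\n".join(transcript))
        outcome = simulator.run(query)  # e.g. "had the car braked earlier, ..."
        transcript.append(f"Q: {query}\nA: {outcome}")
    return llm("Given these counterfactual probes, explain the event:\n" + "\n".join(transcript))
```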
Reposted by Pasquale Minervini
'AI Safety for Everyone' is out now in @natmachintell.nature.com! Through an analysis of 383 papers, we find a rich landscape of methods that cover a much larger domain than mainstream notions of AI safety. Our takeaway: epistemic inclusivity is important, the knowledge is there, we only need to use it
April 17, 2025 at 2:44 PM
Reposted by Pasquale Minervini
Can you train a performant language model using only openly licensed text?
We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2
June 6, 2025 at 7:19 PM
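If you want to poke at the data, it should be loadable with 🤗 datasets in streaming mode. The repo ID and field layout below are guesses; check the announcement for the actual identifiers:

```python
from datasets import load_dataset

# The repo ID is an assumption -- check the Common Pile announcement for the
# real Hugging Face dataset identifiers.
ds = load_dataset("common-pile/comma_v0.1_training_dataset", split="train", streaming=True)
for example in ds.take(3):  # streaming avoids downloading the full dataset
    print(example)
```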
COLM (@colmweb.org) reviewers, please follow up on author responses if you need to! Most of the papers in my area chair batch didn't receive reviewer follow-ups, and it's dire
June 4, 2025 at 7:05 AM