kawillis.bsky.social
@kawillis.bsky.social
and no matter what you find a typo 3 months later
December 20, 2025 at 2:29 PM
Yes! A big part of the reason our approach to predictions works as well as it does is that we evaluate each article on its own merits - ✨not✨based on its journal of publication.

Don’t judge a book by its cover.
December 19, 2025 at 2:04 PM
Actually knock wood I think we’ll be ok - because our method leverages the behavior of human experts at scale. We’re not dependent on LLMs at all. Those confabulations will lump together and be excluded from CCNs of serious scholarship - bit like oil and water.
December 18, 2025 at 11:38 PM
Huge thanks to all my co-authors, who constantly impress me with their brilliance and insight. I am truly humbled by the hard work that went into making this paper a reality. I look forward to the next part of the story! 15/15
December 18, 2025 at 8:49 PM
This work is only the beginning; lots of questions remain. For example, what happens before the appearance of a breakthrough signal? Do funding levels, or the number of practitioners in a field, influence the lag between signal and breakthrough? 14/15
December 18, 2025 at 8:49 PM
The precise nature of the key insight(s) that leads to a breakthrough, as well as the circumstances surrounding it, are inherently difficult and maybe impossible to predict. 13/15
December 18, 2025 at 8:48 PM
The predictability of breakthroughs may seem at odds with the commonly accepted role of serendipity in discovery; but it’s not. The behavior of scientists, who flock to areas they believe are likely to produce a breakthrough, generates a detectable predictive signal. 12/15
December 18, 2025 at 8:48 PM
We found an additional 19 signals when we analyzed the data for 2014-2017. To learn more about those predicted breakthroughs, one of which has already won a Lasker award, read the preprint! 11/15

www.biorxiv.org/content/10.6...
Prediction of transformative breakthroughs in biomedical research
The ability to predict scientific breakthroughs at scale would accelerate the pace of discovery and improve the efficiency of research investments. Recent advances in artificial intelligence, graph th...
December 18, 2025 at 8:47 PM
Another signal from this period (1994-1997) correlates with the development of tools that improve clinical care for HIV patients, which is notable because behavioral and social science advances are less likely to be recognized with major prizes 10/15
December 18, 2025 at 8:46 PM
Of the 18 signals that appeared between 1994 and 1997, 14 have since won major recognition (Nobel, Lasker, or similar); another 2 correspond to therapies now in clinical trials for neurovascular disorders or type 2 neurofibromatosis 9/15
December 18, 2025 at 8:45 PM
Importantly, all the components of our breakthrough signal measure the properties of individual articles, not the journals in which those articles are published. We don’t want to skew our results by pre-judging where a breakthrough might appear; every paper has a chance! 8/15
December 18, 2025 at 8:44 PM
We then applied machine learning to identify features common to all our gold standard trajectories: a burst of papers in a new, rapidly evolving (aka low cohesion) topic, many of which quickly become highly influential 7/15
December 18, 2025 at 8:44 PM
Each historical CCN is like a time capsule: it contains only data that would have existed if time stopped in that year. We strung together different years, following the trajectory of our prize-winning gold standards as the topic evolved and gained recognition. 6/15
December 18, 2025 at 8:43 PM
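For readers who want the "time capsule" idea in concrete terms, here is a minimal sketch, not the pipeline from the preprint. It assumes a toy record format (a dict with hypothetical `year` and `refs` fields) and simply drops every citing paper published after the cutoff year, so that nothing from the future can leak into the snapshot.

```python
def historical_snapshot(papers, cutoff_year):
    """Return reference lists only for papers published on or before cutoff_year.

    `papers` maps a paper ID to a dict with a publication `year` and a list of
    cited paper IDs under `refs`. Both field names are assumptions made for
    this illustration.
    """
    return {
        paper_id: record["refs"]
        for paper_id, record in papers.items()
        if record["year"] <= cutoff_year
    }


# Toy data: three citing papers from different years.
toy_papers = {
    "p1": {"year": 1994, "refs": ["A", "B"]},
    "p2": {"year": 1996, "refs": ["A", "C"]},
    "p3": {"year": 2001, "refs": ["B", "C"]},
}

# The 1997 snapshot must not see p3, which did not exist yet.
snapshot_1997 = historical_snapshot(toy_papers, 1997)
```

Building one such snapshot per year, and a co-citation network from each, is one way to follow a topic forward in time without using later information.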
Then we zeroed in on recognized breakthroughs: what did prize-winning topics look like before they won prizes? We needed to find out without leaking information from the future, so we built another 36 historical CCNs 5/15
December 18, 2025 at 8:41 PM
The first step in answering our ambitious question was to agnostically and systematically define every possible research topic by applying regularized Markov clustering to the co-citation network (CCN) of all of PubMed – that’s more than 18 million papers! 4/15
December 18, 2025 at 8:40 PM
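A quick illustration of what a co-citation network is, as a sketch under stated assumptions rather than the code behind the paper: two papers are linked whenever a later paper cites both, and the link gets stronger each time that happens. The function name and toy data below are hypothetical, and the clustering step (regularized Markov clustering) is not shown.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references_by_paper):
    """Count how often each pair of papers appears in the same reference list.

    `references_by_paper` maps a citing paper ID to the list of paper IDs it
    cites. The result maps a sorted pair (a, b) to its co-citation count,
    which can serve as an edge weight in the network.
    """
    counts = Counter()
    for refs in references_by_paper.values():
        # Deduplicate and sort so each pair has a single canonical key.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Toy data: three citing papers and their reference lists.
toy_refs = {
    "citing1": ["A", "B", "C"],
    "citing2": ["A", "B"],
    "citing3": ["B", "C"],
}

edges = cocitation_counts(toy_refs)
# A and B are cited together twice, A and C once, B and C twice.
```

Because the edges come from the citing behavior of many authors, the network reflects how researchers themselves group papers, which is the input a clustering algorithm would then partition into topics.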
Scientists toiling away, struggling to secure funding or convince their colleagues, then decades later, a Nobel prize. Warren and Marshall, proving that bacteria, not stress, cause ulcers, or Kariko and Weissman, demonstrating that RNA-based vaccines are feasible. 3/15
December 18, 2025 at 8:40 PM
This paper started with an ambitious question: how can scientists deliver improvements in human #health more quickly and efficiently? What came to mind immediately was a familiar pain point: delayed recognition. Most people have heard the extreme version of this story 2/15
December 18, 2025 at 8:39 PM
Reposted
I'm reminded of that recent post from a grant admin who had a PI who used LLMs to generate a checklist and the checklist had the wrong due date.
December 18, 2025 at 4:42 PM
I really like this frame, with one teeny tiny nitpick: I’m not a huge fan of using AI as a synonym for generative LLMs, which I think is what you mean here (apologies if I misunderstand). A different kind of AI might be perceived differently.
December 17, 2025 at 11:16 PM