Behind Every Discovery Lies a Question: How UnScientify Maps the Invisible Web of Scientific Uncertainty
_The text explains that uncertainty is a fundamental part of scientific writing, yet difficult to detect automatically. The new system “UnScientify”, developed by researchers from France and Germany, uses transparent, rule-based methods to identify and categorize different forms of scientific uncertainty. It even distinguishes whether the uncertainty is expressed by the authors themselves or by others. In benchmark tests, UnScientify outperformed advanced AI models like GPT-4 in both accuracy and explainability. The tool aims to help science and society better identify uncertainty within research texts, promoting greater transparency and more effective knowledge transfer._
_The text describes how uncertainty is an integral part of scientific texts and has so far been difficult to capture automatically. The new system “UnScientify”, developed by researchers in France and Germany, detects and categorizes different forms of scientific uncertainty using rule-based, transparent methods. It even distinguishes who expresses the uncertainty: the authors themselves or other researchers. In comparative tests, UnScientify beats advanced AI models such as GPT-4 in the reliability and traceability of its results. The system is intended to help science and society better recognize uncertainty in research texts, leading to greater transparency and better knowledge transfer._
DOI: 10.34879/gesisblog.2025.106
* * *
### The Hidden Patterns of Doubt – And Why They Drive Real Progress
Open an academic journal, and you’ll find more than facts and figures. There are also subtle pauses, careful “maybes,” even blunt admissions of ignorance—woven seamlessly into the text. In science, uncertainty is not a weakness; it’s a driving force. Yet until recently, the art of tracing and understanding these signals of doubt remained impossible for machines—and tricky even for humans. Now, researchers at the Université Marie et Louis Pasteur in France and GESIS – Leibniz Institute for the Social Sciences in Germany are changing that, putting uncertainty itself under the microscope.
### Setting the Scene: Reading Between the Lines of Scientific Uncertainty
Picture a university café on a rainy afternoon. Groups of scholars argue animatedly over results and methods, forever balancing certainty and speculation. Their articles reflect this same reality. Scientific texts are not just collections of data and conclusions—they are rich landscapes of cautious optimism, measured skepticism, and, crucially, hundreds of ways to admit, “We don’t fully know.”
In this world, words like “possible,” “remains unclear,” “we hypothesize,” or “the evidence is inconclusive” become as significant as any hard number or equation. Reading between the lines is not a luxury, but a necessity: only by identifying the contours of certainty and uncertainty can science progress authentically.
### Why Does Mapping Uncertainty Matter?
Uncertainty marks the edges of knowledge. When researchers hedge their statements or highlight the limits of a finding, they are—often unwittingly—placing flags for the next expedition into the unknown. These signals are crucial for other scientists, for policymakers sifting through complex reports, and for anyone seeking to distinguish established facts from open questions. But language is slippery, and context is everything. A “may” can express a minor doubt or a substantial knowledge gap, depending on where and how it’s used.
Traditional computational tools have often failed here. Even the most advanced AI struggles to consistently recognize the nuanced signals of scientific doubt, and the reasons behind AI decisions often remain opaque—a “black box” that few are comfortable trusting in critical contexts.
### UnScientify: Letting Machines Read Between the Lines
Enter UnScientify. Instead of treating language as a bag of words or pouring everything into a deep-learning engine, this system adopts a rule-based, transparent approach. Drawing on 12 distinct patterns of uncertainty, ranging from explicit statements (“remains unresolved”) and modal verbs (“might affect”) to conditional reasoning (“if x, then possibly y”), indirect questions, and even scholarly disagreement, UnScientify annotates scientific texts sentence by sentence. Crucially, it doesn’t stop there: it also discerns who is expressing the uncertainty. Is this the author’s personal doubt, or a recitation of others’ skepticism?
This matters enormously. Imagine two sentences: “We believe further research is needed,” and “Some studies suggest the results are unreliable.” They both communicate limits, but the source (and thus the weight) of uncertainty is different. UnScientify uses linguistic analysis (with tools such as spaCy and custom pattern-matching) to make these distinctions clear, providing not just a label but a rationale for each annotation.
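To make the mechanics more concrete, here is a minimal sketch of what such rule-based detection can look like with spaCy’s Matcher. The pattern names, cue words, and the crude author-versus-other heuristic below are illustrative assumptions made for this post, not UnScientify’s actual rulebook.

```python
# Minimal sketch of rule-based uncertainty detection with spaCy's Matcher.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Two toy patterns echoing the cues mentioned above (illustrative only).
matcher.add("MODAL_HEDGE", [[
    {"LEMMA": {"IN": ["may", "might", "could"]}},
    {"POS": "VERB"},
]])
matcher.add("EXPLICIT_UNCERTAINTY", [[
    {"LEMMA": "remain"},
    {"LOWER": {"IN": ["unclear", "unresolved", "unknown"]}},
]])

def annotate(sentence: str):
    """Return matched uncertainty cues plus a rough author/other attribution."""
    doc = nlp(sentence)
    cues = [(nlp.vocab.strings[match_id], doc[start:end].text)
            for match_id, start, end in matcher(doc)]
    # Toy heuristic: first-person pronouns signal the authors' own uncertainty.
    source = "author" if any(tok.lower_ in {"we", "our", "i"} for tok in doc) else "other"
    return cues, source

print(annotate("We believe the intervention might affect long-term outcomes."))
print(annotate("Some studies suggest that the mechanism remains unclear."))
```

The appeal of this style of analysis is that every output points back to the specific rule and the exact words that triggered it, which is precisely the kind of transparency discussed in the next section.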
### When Human-Designed Rules Beat Super AIs
What good is all this pattern-hunting? To test UnScientify, the team pitted it against the most sophisticated AI models of our time: GPT-4, RoBERTa, SciBERT and more, using a carefully constructed, multidisciplinary dataset of almost 1,000 annotated sentences. UnScientify came out ahead, scoring an impressive 80.8% accuracy—surpassing all tested machine-learning and large language models.
But the story doesn’t end at raw scores. UnScientify’s real triumph is its explainability. Where “black-box” AIs often flip-flop or leave humans guessing about their decisions, UnScientify shows its work: every detection is justified by its language rulebook. If a new kind of uncertainty emerges, domain experts can update the patterns—no retraining on enormous datasets required.
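As a rough illustration of that extensibility, the sketch below shows how a newly observed cue could be covered by registering one more pattern in a matcher, with no model retraining. Again, the rule names and cue words are hypothetical examples, not part of UnScientify itself.

```python
# Self-contained sketch of "update the rules, not the model": a new uncertainty
# cue is handled by adding one extra pattern. Names and cues are invented here.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Existing rule for explicit uncertainty statements.
matcher.add("EXPLICIT_UNCERTAINTY", [[{"LEMMA": "remain"}, {"LOWER": "unresolved"}]])

# Newly observed cue, added by a domain expert in one line, without retraining.
matcher.add("CALLS_FOR_STUDY", [[{"LEMMA": "warrant"}, {"LOWER": "further"}, {"LEMMA": "study"}]])

doc = nlp("The dosage effect warrants further study and remains unresolved.")
for match_id, start, end in matcher(doc):
    # Each hit is explained by naming the rule and the exact words it matched.
    print(nlp.vocab.strings[match_id], "->", doc[start:end].text)
```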
### Unlocking Better Communication in Science and Society
Imagine a future where, with a click, every sentence carrying academic doubt is highlighted in a research paper; where policymakers can instantly see which findings are robust and which rest on shaky ground; where journalists and educators can model scientific caution with ease. By charting uncertainty, UnScientify offers a tool not just for text mining, but for transparency, trust, and better conversation between science, decision-makers, and the public.
The implications ripple outwards. Science does not advance in leaps of absolute certainty, but in careful steps—each one marked by open questions. UnScientify stands to help make these questions visible, opening new paths for interdisciplinary research and genuine dialogue.
### Looking Further: Charting New Territories of Doubt
While UnScientify already covers vital ground, the journey has only begun. The team aims to broaden the system’s coverage, extend it to new research fields, and perhaps even combine the transparency of its linguistic rules with the deep pattern-recognition capacities of modern AI. The project’s publicly available dataset is already fueling further research, encouraging scientists everywhere to refine the art of charting uncertainty.
As we look to a future brimming with data—and questions—the real heroes may not be those who declare the answers, but those who help reveal the borders between what is known and what is still waiting to be explored. UnScientify is helping science learn to read its own hesitations—and in those hesitations, to find the seeds of its next revolutions.
Original scientific publication:
**Panggih Kusuma Ningrum, Philipp Mayr, Nina Smirnova, Iana Atanassova: Annotating scientific uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches. Journal of Informetrics, Volume 19, Issue 2, 2025.**
https://doi.org/10.1016/j.joi.2025.101661
This article was written by Christian Kolle with the support of ChatGPT 4.1, based on the original scientific publication, and reviewed by one of the researchers involved, Dr. Philipp Mayr.