Ananya (ಅನನ್ಯ)
@punarpuli.bsky.social
Science & tech journalist, translator. Interested in all things algorithms, oceans, urban & the people involved.
https://storiesbyananya.wordpress.com
There are scattered copies of papers all over the internet, and the training data of AI models aren't up to date.

Until companies improve how they filter retracted papers, users of AI tools should take steps to verify the tool's outputs.

www.technologyreview.com/2025/09/23/1...
AI models are using material from retracted scientific papers
Some companies are working to remedy the issue.
www.technologyreview.com
October 8, 2025 at 10:52 AM
Groups like @retractionwatch.com maintain a database of retracted papers, and some companies are now starting to use these databases to filter papers, to an extent. But such measures are not foolproof: retraction databases are not comprehensive...
October 8, 2025 at 10:52 AM
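To make the idea concrete, here is a minimal sketch (my own illustration, not any company's actual pipeline) of what screening citations against a retraction database could look like: it loads a locally saved CSV export of retraction records and flags any cited DOI that appears there. The file name and column header are assumptions.

```python
# Minimal sketch: flag cited DOIs that appear in a retraction database export.
# "retraction_watch_export.csv" and the "OriginalPaperDOI" column are assumptions
# about the export format, used here only for illustration.
import csv


def load_retracted_dois(path: str) -> set[str]:
    """Read a CSV export of retraction records and collect the retracted DOIs."""
    retracted = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = row.get("OriginalPaperDOI", "").strip().lower()
            if doi:
                retracted.add(doi)
    return retracted


def flag_retracted(cited_dois: list[str], retracted: set[str]) -> list[str]:
    """Return the cited DOIs that appear in the retraction set."""
    return [doi for doi in cited_dois if doi.strip().lower() in retracted]


if __name__ == "__main__":
    retracted = load_retracted_dois("retraction_watch_export.csv")  # hypothetical file
    citations = ["10.1000/example.123", "10.1000/example.456"]      # hypothetical DOIs
    for doi in flag_retracted(citations, retracted):
        print(f"Warning: {doi} appears in the retraction database")
```

Even a check like this inherits the database's gaps, which is part of why such filtering isn't foolproof.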
Especially when people use AI tools to seek medical or health information, relying on retracted papers can carry grave consequences.

The rate of retractions has been rising over the years, with more than 10,000 retractions in 2023 alone, according to a Nature analysis. bsky.app/profile/rich...
Milestone: 2023 is the first year with more than 10,000 research paper retractions -- smashing previous records. More than 8,000 of these came from Hindawi (mostly from 'special issues'). Total retractions now >50,000. My analysis for Nature.
www.nature.com/articles/d41...
October 8, 2025 at 10:52 AM
There’s “kind of an agreement that retracted papers have been struck off the record of science and the people who are outside of science—they should be warned that these are retracted papers," Yuanxi Fu told me.

Yet, AI tools continue to use these papers to answer user questions.
October 8, 2025 at 10:52 AM
Thanks, Andrew! I will take a look at the paper
April 4, 2024 at 5:21 AM
...catch everything.
The models, and what goes into them, are often hidden, deemed proprietary knowledge by the companies making them.
But without access to this information, it's hard to say what problems might exist, where exactly the biases arise and how best to squash them.
March 26, 2024 at 11:51 AM
Broadly, these models learn to make associations between images and their captions. But the captions themselves can contain incorrect, incomplete, biased, and, as
@abeba.bsky.social and team found, harmful content (which increased as the dataset size increased). And automated filtering doesn't...
March 26, 2024 at 11:50 AM
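As a toy illustration of why automated caption filtering falls short (my own sketch, not any dataset's actual filtering pipeline): a simple keyword blocklist catches exact matches but misses obfuscated spellings and harmful content that contains no listed word at all.

```python
# Toy illustration: a keyword blocklist as an automated caption filter.
# The blocklist terms and captions are placeholders, not real data.
BLOCKLIST = {"slur_a", "slur_b"}  # placeholder terms


def passes_filter(caption: str) -> bool:
    """Return True if no blocklisted word appears in the caption."""
    words = set(caption.lower().split())
    return words.isdisjoint(BLOCKLIST)


captions = [
    "a photo of slur_a person",                  # caught: exact blocklisted word
    "a photo of a sl*r_a person",                # missed: obfuscated spelling
    "a stereotyped but slur-free description",   # missed: harmful without any listed word
]
print([passes_filter(c) for c in captions])  # [False, True, True]
```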
The models often rely on racist and sexist stereotypes to generate images, sometimes even amplifying bias. Stereotypes related to gender, skin colour, occupations, nationalities, geographies and more.
March 26, 2024 at 11:49 AM
...we've sort of decided as a society that that's not what one should do and putting them in immediately forces a decision about stereotypes, she says. Yet, that's what we can ask these models to do.
March 26, 2024 at 11:49 AM
One thing that stuck with me from my conversation with Ria is that concepts like poor or kind - words that can't be imaged from a societal perspective - can be fed in as much as any other word, like red or car. It's really harmful to put them into these models because...
March 26, 2024 at 11:49 AM
Most solutions - writing better prompts, adding sample images - are band-aid fixes and don't address the underlying problems in how these systems are built. There are many ways for these models to be biased that we haven't figured out yet. And there's certainly no way to automate safety.
March 26, 2024 at 11:48 AM
Find the collection here: archive.org/details/Serv...
October 28, 2023 at 10:26 AM