Ananya (ಅನನ್ಯ)
@punarpuli.bsky.social
Science & tech journalist, translator. Interested in all things algorithms, oceans, urban & the people involved.
https://storiesbyananya.wordpress.com
There are scattered copies of papers all over the internet, and the training data of AI models aren't up to date.

Until companies improve how they filter retracted papers, users of AI tools should take steps to verify the tool's outputs.

www.technologyreview.com/2025/09/23/1...
AI models are using material from retracted scientific papers
Some companies are working to remedy the issue.
www.technologyreview.com
October 8, 2025 at 10:52 AM
Groups like @retractionwatch.com maintain a database of retracted papers, and some companies are now starting to use these databases to filter papers, to an extent. But such measures are not foolproof: retraction databases are not comprehensive...
October 8, 2025 at 10:52 AM
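To make the idea concrete, here is a minimal sketch (my own illustration, not any company's actual pipeline) of what screening citations against a retraction database could look like: it loads a locally saved CSV export of retraction records and flags any cited DOI that appears there. The file name and column header are assumptions.

```python
# Minimal sketch: flag cited DOIs that appear in a retraction database export.
# "retraction_watch_export.csv" and the "OriginalPaperDOI" column are assumptions
# about the export format, used here only for illustration.
import csv


def load_retracted_dois(path: str) -> set[str]:
    """Read a CSV export of retraction records and collect the retracted DOIs."""
    retracted = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doi = row.get("OriginalPaperDOI", "").strip().lower()
            if doi:
                retracted.add(doi)
    return retracted


def flag_retracted(cited_dois: list[str], retracted: set[str]) -> list[str]:
    """Return the cited DOIs that appear in the retraction set."""
    return [doi for doi in cited_dois if doi.strip().lower() in retracted]


if __name__ == "__main__":
    retracted = load_retracted_dois("retraction_watch_export.csv")  # hypothetical file
    citations = ["10.1000/example.123", "10.1000/example.456"]      # hypothetical DOIs
    for doi in flag_retracted(citations, retracted):
        print(f"Warning: {doi} appears in the retraction database")
```

Even a check like this inherits the database's gaps, which is part of why such filtering isn't foolproof.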
Especially when people use AI tools to seek medical or health information, relying on retracted papers can carry grave consequences.

The rate of retractions has been rising over the years, with more than 10,000 retractions in 2023 alone, according to a Nature analysis. bsky.app/profile/rich...
Milestone: 2023 is the first year with more than 10,000 research paper retractions -- smashing previous records. More than 8,000 of these came from Hindawi (mostly from 'special issues'). Total retractions now >50,000. My analysis for Nature.
www.nature.com/articles/d41...
October 8, 2025 at 10:52 AM
There’s “kind of an agreement that retracted papers have been struck off the record of science and the people who are outside of science—they should be warned that these are retracted papers," Yuanxi Fu told me.

Yet, AI tools continue to use these papers to answer user questions.
October 8, 2025 at 10:52 AM
Thanks, Andrew! I will take a look at the paper
April 4, 2024 at 5:21 AM
...catch everything.
The models, and what goes into them, are often hidden, deemed proprietary knowledge by the companies making them.
But without access to this information, it's hard to say what problems might exist, where exactly the biases arise and how best to squash them.
March 26, 2024 at 11:51 AM
Broadly, these models learn to make associations between images and their captions. But the captions themselves can contain incorrect, incomplete, biased, and, as
@abeba.bsky.social and team found, harmful content (which increased as the dataset size increased). And automated filtering doesn't...
March 26, 2024 at 11:50 AM
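As a toy illustration of why automated caption filtering falls short (my own sketch, not any dataset's actual filtering pipeline): a simple keyword blocklist catches exact matches but misses obfuscated spellings and harmful content that contains no listed word at all.

```python
# Toy illustration: a keyword blocklist as an automated caption filter.
# The blocklist terms and captions are placeholders, not real data.
BLOCKLIST = {"slur_a", "slur_b"}  # placeholder terms


def passes_filter(caption: str) -> bool:
    """Return True if no blocklisted word appears in the caption."""
    words = set(caption.lower().split())
    return words.isdisjoint(BLOCKLIST)


captions = [
    "a photo of slur_a person",                  # caught: exact blocklisted word
    "a photo of a sl*r_a person",                # missed: obfuscated spelling
    "a stereotyped but slur-free description",   # missed: harmful without any listed word
]
print([passes_filter(c) for c in captions])  # [False, True, True]
```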
The models often rely on racist and sexist stereotypes to generate images, sometimes even amplifying bias. Stereotypes related to gender, skin colour, occupations, nationalities, geographies and more.
March 26, 2024 at 11:49 AM
...we've sort of decided as a society that that's not what one should do and putting them in immediately forces a decision about stereotypes, she says. Yet, that's what we can ask these models to do.
March 26, 2024 at 11:49 AM
One thing that stuck with me from my conversation with Ria is that concepts like poor or kind - words that can't be imaged from a societal perspective - can be fed in as much as any other word, like red or car. It's really harmful to put them into these models because...
March 26, 2024 at 11:49 AM
Most solutions - writing better prompts, adding sample images - are band-aid fixes and don't address the underlying problems in how these systems are built. There are many ways for these models to be biased that we haven't figured out yet. And there's certainly no way to automate safety.
March 26, 2024 at 11:48 AM
Find the collection here: archive.org/details/Serv...
October 28, 2023 at 10:26 AM