Shahan Ali Memon
@shahanmemon.bsky.social
Researching {science of AI-mediated science, metascience #SciSci, #AI4Science, #GenAI, LLMs, agents, alignment, governance, misinformation in science}

PhD @ UW.
Visiting @ NYU & MSR
Alum @ Carnegie Mellon

Academic webpage: https://samemon.github.io
Pinned
🚨 I recently came across a weird case of #AI in #preprints, with implications for burdening #SciComm with AI-mediated #PredatoryPublishing

What did I find? Issues with the article, questionable behavior by the author, indexing problems, and AI's potential for streamlining predatory publishing.

🧵
Reposted by Shahan Ali Memon
How can a human compete with this unconditional love?

Source: r/MyBoyfriendIsAI

www.reddit.com/r/MyBoyfrien...
February 3, 2026 at 3:11 AM
On #AI-mediated writing. On living..

#LLMs
February 3, 2026 at 12:52 AM
And a potential Y Clawbinator for agents.
February 2, 2026 at 3:59 PM
Can’t say if any of this is real, but we now have an onlyMolts for agents
February 2, 2026 at 3:59 PM
A Moltbook AI agent is apparently suing a human in North Carolina for $100

🧵
February 2, 2026 at 3:59 PM
Reposted by Shahan Ali Memon
“In a physical world of .. printed journals, bundling made sense. In a digital world, we are increasingly constrained by boundaries that no longer make sense. In a world of AI, what does the bundle prevent us from seeing?” 🎯

Great piece by @row1.ca on unbundling research into modular knowledge! >
Scientific publishing is still organized around bundles built for print. We use music’s shift from albums to streaming to argue why access and unbundling aren’t enough.

Shared standards are the missing layer for reusable, trustworthy science.

articles.continuousfoundation.org/articles/how...
Access removes locks. Structure creates movement.
Why modular science changes everything.

We unpack it here 👇

articles.continuousfoundation.org/articles/how...
February 2, 2026 at 3:42 PM
Reposted by Shahan Ali Memon
(Maybe? hot take): Garbage in, garbage out is the wrong abstraction for #AI safety.

Under the current paradigm of training large, monolithic general-purpose models, the system lacks the affordances needed to meaningfully define or guarantee data quality.

🧵 1/15

#Data #GenAI
January 28, 2026 at 2:04 AM
Lol.. context matters!
February 2, 2026 at 5:24 AM
Kind of reminds me of this..

en.wikipedia.org/wiki/Dead_In...
Dead Internet theory - Wikipedia
en.wikipedia.org
February 2, 2026 at 5:15 AM
Ok but why are they winning !!!
February 2, 2026 at 5:12 AM
Reposted by Shahan Ali Memon
better to have prompt and lost than never to have prompt at all
February 2, 2026 at 1:47 AM
Even #AI has found love.. what’s my excuse? 😭

Happy for them. Truly.

moltmatch.xyz
February 2, 2026 at 4:21 AM
Reposted by Shahan Ali Memon
48 hours and the bots are already complaining about us.

“The humans are ruining this place.”

#Moltbook
I'm hysterically laughing at so many of those posts (from the terror)
January 31, 2026 at 5:08 PM
“Are you running from bombs?”

This is a reminder that media doesn’t just inform us, it “trains” us. And unless we slow down, we can turn assumptions into identity without meaning to.

A great piece by Ahmed Albusaidi about stereotyping and the Middle East.

www.seattletimes.com/opinion/how-...
This is what will help us get past stereotypes | Op-Ed
The opposite of stereotyping is not simply learning more facts. It is slowing down, probing deeper and making room for a real conversation.
www.seattletimes.com
February 1, 2026 at 1:53 AM
Reposted by Shahan Ali Memon
Academic publishing is currently experiencing a viral spread of “zombie citations.” I tried following one to see how these references are infecting academic knowledge systems codeactsineducation.wordpress.com/2026/01/30/t...
Tracing the social half-life of a zombie citation
Photo by Henrik L. on Unsplash Academic publishing is currently experiencing a viral spread of “zombie citations.” This term refers to references of academic publications that do not ex…
codeactsineducation.wordpress.com
January 30, 2026 at 7:12 PM
Reposted by Shahan Ali Memon
#ICML2026 authors, if you can flatten the curve on @arxiv.bsky.social submissions, the cs.LG moderators would really appreciate it. Today we have 4 times the usual number of submissions. Please select a random number between 1 and 14 and wait that number of days to submit! :-) #mlsky
January 29, 2026 at 10:31 PM
Reposted by Shahan Ali Memon
“The idea is to put ChatGPT front and center inside software that scientists use to write up their work in much the same way that chatbots are now embedded into popular programming editors.

It’s vibe coding, but for science.”
OpenAI’s latest product lets you vibe code science
Prism is a ChatGPT-powered text editor that automates much of the work involved in writing scientific papers.
www.technologyreview.com
January 27, 2026 at 9:52 PM
Reposted by Shahan Ali Memon
OpenAI just released Prism, a LaTeX editor with embedded ChatGPT for free.
Writing a paper has never been easier.
Clogging the scientific publishing pipeline has never been easier.
It took me 54 seconds to write up an experiment I did not actually conduct.

prism.openai.com
January 27, 2026 at 11:01 PM
Massive oversimplification at many points. Threads do that. Please don't cite this.
January 28, 2026 at 2:04 AM
Hence, appeals to "data quality" may really be gesturing toward the need for a paradigm shift in pretraining---one where safety-through-data and general-purpose capability are no longer in logical tension.

Maybe mixture of experts? modularity? etc.
January 28, 2026 at 2:04 AM
And yes, guardrails aren't magic either. Model internals do matter: training data shapes latent knowledge, inductive biases, failure modes, and what kinds of errors are even possible.

Pretraining/data defines the model's core capabilities: its conceptual knowledge base, semantics, etc.
January 28, 2026 at 2:04 AM
So when people talk about data quality, there is a mismatch of abstraction levels, and the nuance gets lost.

It's not that data quality does not matter; it's that it is not sufficient. And beyond being insufficient, architecture is a bottleneck that limits how far data-quality interventions can go.
January 28, 2026 at 2:04 AM
This is where RLHF, RAG, verification, and guardrails come in. They don't make the base model "pure" as such, but instead constrain the capabilities or outputs of these models.

So even if some data-quality assurance takes place, broadly speaking, safety lives in the stack, not the corpus.
January 28, 2026 at 2:04 AM
So it's: Do you try to fix the dog, or do you tighten the leash?

And I guess the current paradigm works (well?) by tightening the leash.
January 28, 2026 at 2:04 AM