Aaron Tay
@aarontay.bsky.social
I'm a librarian + blogger from Singapore Management University. Social media, bibliometrics, analytics, academic discovery tech.
Pinned
I'm an academic librarian who has blogged at Musings about librarianship since 2009.
To get a taste of what I blog about, see "Best of..." Musings about librarianship - posts on discovery, open access, bibliometrics, social media & more.
musingsaboutlibrarianship.blogspot.com/p/best-of.html
[Read] 2025 AI roundups open.substack.com/pub/simonw/p... vs open.substack.com/pub/sebastia... - mostly the same points
2025: The year in LLMs
Plus introducing gisthost.github.io
open.substack.com
January 3, 2026 at 4:26 AM
I first wrote about Perplexity in late 2022, but my use fell off as general LLMs like ChatGPT, Claude, and Gemini became great at adding search as a tool (o3 in particular was a huge jump in capabilities). Trying it out again with Perplexity Pro... it's not bad, but somehow the hallucination rate seems high
January 1, 2026 at 11:27 AM
This year on Bluesky I wrote 569 posts and 1,578 replies. I received 4,440 likes, 68 of which came from my most popular post, and apparently I love saying "search" and ✅!

www.madebyolof.com/bluesky-wrap...
aarontay.bsky.social's Bluesky Wrapped 2025
Check out aarontay.bsky.social's year on Bluesky!
www.madebyolof.com
December 31, 2025 at 4:34 PM
One hour to 2026 here. This year I shifted my blog to Substack and got a second wind, hitting 32 posts! My highest output in years, exceeded only by the first few years of the blog (2009, 2010, etc.)! Hope to keep this up!

For readers of my blog, thanks for the support and I will see you in 2026.
December 31, 2025 at 3:06 PM
[Last blog post of 2025] "AI-powered search" hides at least 4 different things—post-retrieval features, semantic search, LLM ranking, and synthesis. Your concerns about one may not apply to another. New post unpacking what's actually under the hood. aarontay.substack.com/p/what-do-we...
What Do We Actually Mean by "AI-Powered Search"?
When we say "AI-powered search engine," we're conflating at least four different things—and your concerns about one may not apply to another.
aarontay.substack.com
December 31, 2025 at 1:47 PM
Belatedly learning about the great split in the Wikidata Query Service
The great split in the Wikidata graph has happened, separating scholarly articles from the rest of Wikidata. Hence apps querying articles in Wikidata need a rewrite. My little Wikidata browser "ALEC" has been updated to handle the graph split. So far things seem OK #sparql #wikidata #wikicite
December 31, 2025 at 1:14 PM
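The rewrite apps like ALEC needed mostly comes down to routing queries to the right endpoint after the split. A minimal Python sketch, assuming the scholarly subgraph is served at query-scholarly.wikidata.org and that the presence of scholarly-article identifiers is a reasonable routing signal (both are assumptions; check the current Wikidata Query Service docs):

```python
# Route a SPARQL query to the main or scholarly Wikidata endpoint
# after the graph split. The scholarly endpoint URL is an assumption.

MAIN_ENDPOINT = "https://query.wikidata.org/sparql"
SCHOLARLY_ENDPOINT = "https://query-scholarly.wikidata.org/sparql"  # assumed

# Q13442814 = "scholarly article", P356 = DOI: queries touching these
# are likely to need the scholarly subgraph after the split.
SCHOLARLY_MARKERS = ("Q13442814", "P356")

def pick_endpoint(sparql: str) -> str:
    """Return the endpoint a query should be sent to after the split."""
    if any(marker in sparql for marker in SCHOLARLY_MARKERS):
        return SCHOLARLY_ENDPOINT
    return MAIN_ENDPOINT

query = """
SELECT ?article WHERE {
  ?article wdt:P31 wd:Q13442814 .   # instance of: scholarly article
} LIMIT 5
"""
```

A real client would still need to handle queries that join scholarly and non-scholarly entities, which is exactly where the split bites.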
How do you know you have acquired a bit of understanding of a technical area? When you write something and can immediately see that what you wrote is a bit of a simplification, but addressing the technical point would take a lot more effort and usually isn't worth it.
December 30, 2025 at 2:35 AM
I must say I'm very impressed with Claude Opus 4.5. It really seems smarter than Gemini 3.5 Pro and ChatGPT 5.2 + thinking (the 20-buck tier). Part of it is the memory; ChatGPT has that too, but Claude's is more specific. Beyond that, Claude hedges a lot less than the other two when asked to critique
December 29, 2025 at 1:13 PM
So much performative BS on LinkedIn. E.g. someone not actually reading the paper and linking to it with a point that doesn't make sense.

Typical performative "engagement" with preprints based on title keywords rather than substance.
December 29, 2025 at 4:45 AM
I find it funny that the anti-AI camp often sound like stochastic parrots themselves, e.g. they will link to papers that are totally irrelevant (or whose relevance is tenuous at best) to the point they are making, which you do see happen sometimes with RAG systems.
December 29, 2025 at 4:06 AM
Years ago a librarian (who was asked this question by a prof) asked me what the difference is between a database and a search engine, and even then I knew it was a tricky question. Part of it is that what CS calls a "database" or "search engine/index" is different from the LIS definition. (1)
December 24, 2025 at 2:27 PM
[blogged] Why Ghost References Still Haunt Us in 2025—And Why It's Not Just About LLMs aarontay.substack.com/p/why-ghost-...
Why Ghost References Still Haunt Us in 2025—And Why It's Not Just About LLMs
Ghost references existed long before LLMs. This post examines how Google Scholar's [CITATION] mechanism and web pollution may undermine RAG verification.
aarontay.substack.com
December 23, 2025 at 3:48 PM
I have a confession. Ever since the middle of this year, when I realised frontier LLMs (e.g. Claude Sonnet 4.5) became good enough to expand my ideas (e.g. from a social media thread) into a full-blown post, I've been extraordinarily productive: I'm currently sitting on 3-5 almost-completed posts... (1)
December 23, 2025 at 3:23 PM
[Read] Performance of AI tools in citing retracted literature preprints.jmir.org/preprint/88766 interesting paper, though similar to others I have read, and I'm willing to bet this issue will likely disappear or be reduced to the point it's not worth worrying about in 2 years max. Why? (1)
Performance of AI tools in citing retracted literature, a pragmatic evaluation trial
Background: Artificial intelligence is increasingly used in scientific research to generate, refine, and summarize literature. Its ability to process large datasets promises greater efficiency...
preprints.jmir.org
December 23, 2025 at 2:30 PM
Reposted by Aaron Tay
Interesting… That affiliation data could be rebuilt with ORCID if that data is still getting through to Crossref.

@aarontay.bsky.social wrote about the disappearance of Elsevier abstracts from OpenAlex earlier this year aarontay.substack.com/p/the-petrol...
December 17, 2025 at 4:18 AM
A fallacy in my thinking. I assumed RAG done properly will reduce ghost references to near zero. But this is not true if the sources you retrieve over can get contaminated, which is easy if you just search the web! Elicit and Consensus, which search indexes like Semantic Scholar, should be less vulnerable
E.g. ChatGPT (instant) saying a [CITATION] Google Scholar entry exists because it found a reference to the paper chatgpt.com/share/69466f... Here's a stronger model saying it's likely a ghost reference chatgpt.com/share/694670...
ChatGPT - Education governance paper
Shared via ChatGPT
chatgpt.com
December 20, 2025 at 10:45 AM
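The fix the thread implies is a verification gate: before a RAG pipeline cites something, check that the reference resolves in a trusted bibliographic index rather than just appearing somewhere on the open web. A minimal sketch with a stubbed index (in practice you would resolve the DOI against Crossref or Semantic Scholar; all DOIs below are invented):

```python
# Split candidate citations into verified vs suspect (possible ghost
# references) by checking a trusted index. The index is a stub here;
# a real check would resolve the DOI via Crossref or Semantic Scholar.

TRUSTED_INDEX = {
    "10.1000/real-paper",      # invented DOI, stands in for a resolvable one
    "10.1000/another-paper",
}

def verify_citations(candidates: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (verified, suspect) lists of candidate references."""
    verified, suspect = [], []
    for ref in candidates:
        doi = ref.get("doi")
        (verified if doi in TRUSTED_INDEX else suspect).append(ref)
    return verified, suspect

refs = [
    {"title": "Real paper", "doi": "10.1000/real-paper"},
    {"title": "Possible ghost", "doi": "10.9999/ghost"},
]
ok, ghosts = verify_citations(refs)
```

The point of the post stands even with this gate: if the index itself gets polluted (as with [CITATION]-style Google Scholar entries), verification against it is weakened.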
Really confused by all these reports from librarians & bloggers about getting a ton of fake citations (in the sense that the paper doesn't exist at all). While I buy this being common in 2023, by 2025 this shouldn't be that common, given even free versions can search the web.
December 20, 2025 at 8:51 AM
More thoughts about using LLMs to generate Boolean, either manually or automated in a system (e.g. Primo RA, EBSCO natural language search), and why some librarians find it useful & some don't. Spoiler: it has to do with use case & expectations (1)
December 17, 2025 at 7:33 PM
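For readers unfamiliar with what these systems do under the hood: roughly, concepts get extracted from the natural-language question, synonyms within a concept are joined with OR, and concepts are joined with AND. A deterministic toy version of that last step (the real systems use an LLM and curated vocabularies; the synonym table here is invented):

```python
# Toy Boolean builder: synonyms within a concept joined with OR,
# concepts joined with AND. Real NL-to-Boolean systems do the concept
# extraction and synonym expansion with an LLM; this table is made up.

SYNONYMS = {
    "teenagers": ["teenagers", "adolescents", "youth"],
    "smoking": ["smoking", "tobacco use"],
}

def to_boolean(concepts: list[str]) -> str:
    """Build a Boolean search string from a list of extracted concepts."""
    groups = []
    for concept in concepts:
        terms = SYNONYMS.get(concept, [concept])
        groups.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(groups)
```

Whether output like this is "useful" depends entirely on whether you expect a quick scoping search or a systematic-review-grade strategy, which is the use-case point above.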
Preparing for perhaps my last talk of 2025
December 17, 2025 at 11:31 AM
[Blogged] Deep Research, Shallow Agency: What Academic Deep Research Can and Can't Do

open.substack.com/pub/aarontay...
Deep Research, Shallow Agency: What Academic Deep Research Can and Can't Do
The Agentic Illusion: most Academic Deep Research tools run fixed workflows and stumble when given unfamiliar literature review tasks that do not fit them.
open.substack.com
December 16, 2025 at 4:27 PM
I really don't understand why, whenever I see someone post a big fail using LLMs online, I can't reproduce it when I try. Sometimes they are using a free, weaker model, but that happens often enough that it can't just be that. Or maybe I'm just lucky?
December 16, 2025 at 12:55 PM
Main point here is that Boolean + minimal relevance ranking with BM25 typically gives you mediocre precision@5/10, and there's not much you can do to improve it. "AI search" matching that, or coming in even slightly worse, is actually not a good result. Modern retrieval methods can not only give better precision@10 (1)
December 16, 2025 at 12:19 PM
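Precision@k, the measure referenced above, is just the fraction of the top k results judged relevant. A quick sketch (the doc IDs and relevance judgments below are invented for illustration):

```python
# precision@k: of the top k ranked results, what fraction are relevant?
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked results judged relevant (0.0 if k <= 0)."""
    if k <= 0:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / k
```

Comparing systems on precision@5 or precision@10 over a shared set of judged queries is the standard way to back up claims like "matching BM25 is not a good result".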
Insight: I've been more negative than many librarians about those simple "push natural language input to an LLM to generate Boolean & run it" systems, mostly because I have higher expectations than they do... (1)
December 16, 2025 at 11:31 AM
Fake DOI-like string providers? www.doi.org/more-info. I initially expected this to be about LLMs making up DOIs.
December 15, 2025 at 11:36 AM