Justin Sheehy
Justin Sheehy
@justinsheehy.hachyderm.io.ap.brid.gy
🌉 bridged from ⁂ https://hachyderm.io/@justinsheehy, follow @ap.brid.gy to interact
Reposted by Justin Sheehy
"I asked chatgpt and..." without wishing to be rude, I cannot now take anything else you say remotely seriously
November 8, 2025 at 12:52 PM
https://mastodon.social/@grimalkina/115374730266829595

I appreciate Cat's work, especially when it contradicts the intuitive feelings of those of us in the business of making and operating software.

One @grimalkina doing actual research is worth a thousand thought leaders confidently […]
Original post on hachyderm.io
hachyderm.io
October 28, 2025 at 8:34 PM
Reposted by Justin Sheehy
Disruptive technologies ... disrupt by changing the way people do things.

And there are a lot of people who desperately want AI to be very disruptive.

But, I've never seen so much top-down pushing to try to make this happen.

Forcing people to use it at work. Adding it to applications.

If […]
Original post on sauropods.win
sauropods.win
October 2, 2025 at 12:17 PM
I just returned from a couple of weeks in Okinawa. I could go on and on about what an amazing and impactful time it was, and maybe some time I will. For now, I will just share this photo I took of Naminoue shrine.
September 27, 2025 at 5:26 PM
This is a nearly perfect example demonstrating how putting "a human in the loop" is in no way a solution to the problem that all LLMs produce incorrect outputs and cannot be prevented from doing so. Their objective is to produce "plausible text" and therefore preventing the user from noticing […]
Original post on hachyderm.io
hachyderm.io
August 15, 2025 at 7:49 PM
A plausible, scalable and slightly wrong black box: why large language models are a fascist technology that cannot be redeemed
When large language models (LLMs) get something factually wrong or make something ridiculous up, everyone makes fun of it online. Gemini told everyone to put glue on their pizza! (Hilarious!) A corporate chatbot invented a company policy that doesn’t exist! (Uh oh!) There’s gotta be about a million examples of LLMs spouting out nonsense that makes them look silly. Detractors use these as a “gotcha” for why LLMs aren’t ready for real-world use, and boosters defend them by saying that LLMs will only get better.

LLM “hallucinations,” more appropriately known by the technical philosophical term “bullshit” (text intended to persuade without regard for truth), are a well-known problem. LLMs bullshit or “hallucinate” because they do not actually produce an internal model of the problem being solved and they cannot reason toward a solution. LLMs are just statistical models that predict the next set of words to follow a prompt. They have been (fairly accurately) described as “spicy autocomplete” or “a fuzzy jpeg of the internet.” They work the same way that your phone, for years, has been able to know that if you type “How is it,” the next word might be “going?”—just more so.

Because of this basic underlying architecture, LLMs are optimized for _plausibility_, the same as the autocorrect on your phone. They are not optimized for truth, or to connect to _anything_ in reality. They are only trained to produce the words most likely to follow from the previous ones, based on texts scraped from the internet. This is why we get bullshit/hallucinations, and there is no way to ever stop them from doing so without completely scrapping the LLM project and rebuilding an artificial intelligence chatbot on a fundamentally different foundation.

This foundation also makes LLMs a “black box”—the statistical model that produces the text is so complicated, containing so many variables, that there is no way to explain definitively how it generated any answer to any prompt. If you wrote a regular expression to pull all the numbers out of a text, you could look at the regular expression and see that it missed an instance of “three” because it was only looking for numerals (i.e. “3”) and not words. If you asked an LLM to pull all the numbers out of a text and it missed one, there is no way to ever know why, and even the “explanations” that newer generations of the models give are not real explanations; they are just more plausible-but-wrong text generated by the model.

LLM boosters promise, of course, that the bullshit will be fixed in the future somehow. To date, sometimes feeding more data into the model makes the problem better, sometimes it makes it worse. There has not been a clear trend of subsequent iterations of LLMs improving on this front. Boosters will always tell you, “This is the worst that LLMs will ever be, the technology will only get better in the future.” What they don’t specify is: better by what metric, better at what, and better for whom? As Google Search usability over the last decade has taught us, technology sometimes gets worse because that is more profitable for those who control it.

In what follows, I will argue that _being plausible but slightly wrong and un-auditable—at scale—is the killer feature of LLMs_, not a bug that will ever be meaningfully addressed, and that this combination of properties makes them an essentially fascist technology.
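To make the regular-expression contrast above concrete, here is a minimal Python sketch (the sample sentence and both patterns are invented for illustration and are not from the original post): a too-narrow regex fails in a way you can read directly off the pattern and then fix, whereas an equivalent miss by an LLM has no inspectable rule behind it.

```python
import re

text = "We enrolled three patients in 2019 and 12 more in 2020."

# Deliberately narrow rule: numerals only, no number words.
numerals_only = re.compile(r"\d+")
print(numerals_only.findall(text))  # prints ['2019', '12', '2020']; "three" is silently missed

# The failure is fully auditable: reading the pattern tells you exactly why
# "three" was skipped, and the fix is just as inspectable.
with_words = re.compile(
    r"\d+|\b(?:one|two|three|four|five|six|seven|eight|nine|ten)\b",
    re.IGNORECASE,
)
print(with_words.findall(text))  # prints ['three', '2019', '12', '2020']
```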
By “fascist” in this context, I mean that it is well suited to centralizing authority, eliminating checks on that authority, and advancing an anti-science agenda. I will use the example case of medical systematic reviews to illustrate how it will be used to advance a fascist agenda, and gesture toward a few other likely areas of fascist application. I will conclude by arguing that LLMs can’t be “used for good,” accepted or even regulated, but must be resisted and rejected wholesale.

What LLM boosters and detractors both mostly miss is that a black box that returns a _slightly_ wrong but very plausible answer is a much better offering than perfect accuracy for certain use cases. This is because there is only one way to be perfectly accurate (providing the correct answer) but there are a million ways to be slightly off (providing an answer that misses the mark, but is still mostly defensible). To paraphrase Tolstoy, “Accurate data retrieval is all alike; every LLM response is inaccurate in its own way.” And because LLM prompts can be repeated at industrial scales, an unscrupulous user can cherry-pick the plausible-but-slightly-wrong answers they return to favour their own agenda.

It is the scaling up of LLMs that makes their plausible, black-boxed incorrectness so useful. If the LLM returns different and slightly incorrect answers depending on how a prompt is fine-tuned, then you can decide beforehand what answer you want from the aggregate analysis of a large corpus of data, and then have the LLM analyze it over and over until it gives you the answers you want. Because the model is a black box, no one can be expected to explain where the answer came from exactly, and because it can be applied at scale, there is no possibility that it can be externally audited.

To illustrate this, I will use the example of a systematic review in the medical literature (my area of expertise), although there are many other areas where this strategy can be used. In the area of insurance reimbursement, for example, an insurance company could decide the exact dollar amount it wants to pay out, then reverse engineer prompts to generate responses to thousands of applications, fine-tuning until the justifications produced by the LLM in the aggregate match the amount of money it wishes to pay.

## LLMs are the perfect technology for manipulating the medical literature to say nearly anything you want via systematic review methods

Systematic reviews are an important part of the medical evidence hierarchy, sitting even above randomized clinical trials in their level of authority. For many medical questions that have been studied, there are sometimes multiple published clinical trials or other forms of evidence that can provide slightly different, or even conflicting, answers. This is not necessarily because the methods used were flawed, but because human biology is complicated, and the answers to questions like “does drug A work better than drug B in population Y for condition Z?” are probabilistic ones like “it works 50% better on this metric, 75% of the time,” not categorical answers like “yes” or “no.” Systematic review methodology is meant to provide a broad overview of the medical literature on a specific subject, excluding low-quality evidence and statistically aggregating the more trustworthy evidence into an even more accurate and trustworthy estimate.
They are “systematic” in the sense that they are meant to include all the evidence that has been produced to date on the question at hand. This is typically done by first performing a literature search of several medical databases to identify potential evidence sources, followed by human screening against inclusion criteria applied first to the title and abstract, and then to the full text. This can be a work-intensive process, as selecting evidence has, prior to the advent of LLMs, required human judgement at this step.

LLMs can be deployed here to automate the screening of medical journal articles for inclusion in a systematic review, drastically reducing the human work required. This is a bad thing. Because this step can be automated, and because any LLM output is always slightly inaccurate and un-auditable but scalable, it can also be easily manipulated to return an answer of the reviewer’s choosing, and this intentionally introduced bias can be difficult or impossible to discern from the end result. The fact that this process can be automated allows an unscrupulous reviewer to try an arbitrary number of LLM prompts for screening criteria, repeating the screening process until the set of articles to be included contains only the articles that the reviewer wants. This can be fine-tuned to the point where the bias is subtle, even when presented with the original LLM prompts. Similarly, LLMs can be deployed to extract data from medical journal articles, and because LLMs produce plausible answers (you could even measure and “validate” how well they perform against a “gold standard” of human data extractors) that are slightly wrong, they can be gamed to produce nearly any outcome in the aggregate, in a manner that is very difficult or impossible to catch after the fact.

## Couldn’t this be happening to the systematic review literature already, even without LLMs?

To be certain, an unscrupulous researcher can place their thumb on the scale at quite a number of points in the process of a systematic review, even without the use of LLMs. This happens, deliberately or accidentally, all the time, and as someone who has published several systematic reviews and is often asked to peer-review this type of research, I am very cognizant of the ways that researchers might be tempted to compromise their research integrity in order to get the answer they like.

That said, LLMs present a new challenge because of the ability they provide to perform many different fine-tuned iterations of a systematic review in a manner that cannot possibly be audited externally, both because the LLM is a black box and because it can be scaled to the point where double-checking is impractical, and this can be done by a single person _without any scrutiny from other researchers_. Without an LLM, if a researcher wanted to redo data extraction while making fine adjustments to the inclusion criteria or the data extraction protocol, and the set of evidence being considered was large enough, it would take a team of researchers a considerable amount of time to accomplish the task even once. Being asked to repeat it over and over with minor variations to the codebook would raise suspicions and likely even some push-back from a team of humans asked to do so. The cooperation required to accomplish large data extraction tasks without an LLM implied some level of accountability.
It meant that even if a researcher was willing to commit this kind of research fraud and had the resources to do so, someone else involved was likely to put on the brakes somehow.

This brings us to why this technology isn’t just potential research fraud waiting to happen (although it is that too, and who are we kidding, it has definitely been used for research fraud already), but is also an essentially fascist tool. From the example of systematic review manipulation, it is easy to see how it centralizes control over medical evidence synthesis by eliminating a large proportion of the people involved, and thus their ability to check the agenda of an unscrupulous central authority. This technology lends itself especially well to anti-science projects like the anti-vaccine movement, which could use it to inaccurately synthesize evidence from the medical literature to legitimize the movement. I will not be surprised when it is used to legitimize scientific racism and anti-queer hate.

While I have focused on the dangers to medical information synthesis, I can think of several other ways this technique can be applied in other industries. An insurance company, for example, can decide what level of payouts it wishes to have, and then adjust its justifications for decisions regarding claims at scale until it reaches them, regardless of the underlying validity of the claims themselves. Let the police or the army use this technology, and you can use your imagination on where they would go with it.

## What about “responsible” LLM use?

“Using AI responsibly” certainly has the aesthetics of being a “reasonable middle ground,” away from “extreme” positions like banning, boycotting or abstaining from use. However, where fascism is concerned, being moderate toward it is not a virtue. I’m not going to say that every person who has used an LLM for any reason is a fascist, of course. There are many ways that a reviewer can try to safeguard their own LLM use against the kind of abuses I have described above. A researcher might attempt to thoroughly test the accuracy of an LLM at a data extraction task before employing it (good luck, though; the black-box nature of AIs tends to make this a somewhat fraught enterprise). A researcher attempting to use LLMs in good faith might also pre-register their study so that they can’t alter their prompts later and cherry-pick the result. Good for them!

Unfortunately, even if _you_ as a researcher do everything you can to use AI “responsibly,” there is no way for _anyone else_ to distinguish your work from the irresponsible uses of AI. If you pre-registered a very detailed protocol for your systematic review before you did the work, there is no way for anyone else to know whether you already did the study before the pre-registration, except your own good word as a researcher. That’s the thing about fascist technologies—they are designed to remove accountability and centralize authority. This vitiates the whole point of doing the study in the first place. If it all comes down to “I didn’t cheat, trust me,” and there is literally no way for anyone else to double-check, then I don’t know what this is, but it sure isn’t science anymore.

## What won’t help

1. First off, if you wish to do science in good faith, you absolutely cannot embrace LLMs for use in your own research. “But LLMs are here to stay, we better get used to them!” says the person who’s not on OpenAI’s payroll but inexplicably wants to do their PR work for them.
Technologies are discarded, rejected or superseded all the time, even after they are touted as being “inevitable” or “here to stay so you better get used to it.” (Remember how cloning was “inevitable”? Remember how we all had to “just get used to NFTs because they’re not going anywhere”? Remember how the Metaverse was “here to stay”?) If you do embrace LLMs, congrats: your work is now indistinguishable from that of all the grifters and fascists.

2. Expecting bad-faith, mass-produced and then cherry-picked systematic reviews to be debunked after they are published is a losing proposition. The naive response, that the answer to bad speech is good speech, doesn’t fly here, because we’re not just answering some instances of bad speech; we’re answering a machine that produces new bad speech on an industrial scale. Not just that, but we have to take into account Brandolini’s Law, the “bullshit asymmetry principle”: the amount of energy needed to refute bullshit is an order of magnitude greater than the energy needed to produce it. Further, as we learned from Wakefield et al. (1998), even if an incorrect medical idea is completely discredited, the paper is retracted, and the author is struck off the medical register for misconduct, the damage may already be permanently done.

3. A requirement from academic journals for pre-registration of research done with LLMs would be an ineffectual half-measure, neither adhered to by researchers nor enforced by journals, if the trends from clinical trial pre-registration continue. It is just so easy to “cheat,” and journal editors have a tendency to bend rules like these if there is any wiggle room at all, especially if the journal article has an exciting story to tell.

4. There is absolutely no way that we can expect peer review to catch this sort of fraud. I have peer-reviewed so many systematic reviews, and it is like pulling teeth to get anyone to pay attention to matters of basic research integrity. Ask a journal editor to insist that the data and analysis code for a study be made available, and watch the study get accepted without them.

## What will help

1. Stop using LLMs in your own research completely. Using them makes your work fundamentally untrustworthy, for the reasons I have outlined above.

2. Whenever you hear a colleague tout some brand-new study of the type I have described above, accomplished using an LLM, ask them about the kind of research fraud that is possible, and in fact very easy, as I have outlined here. Ask if they can provide any reason why anyone should believe that they didn’t do exactly that kind of fraud. If this seems too adversarial, keep in mind that this is the point of your job as an academic, and actual fraudsters, racists and anti-queer activists will and sometimes do hijack science for their own ends when no one asks the tough questions.

3. Recommend rejection for research accomplished with an LLM if you are asked to peer-review it, or, if this is too work-intensive, decline to review any research accomplished with an LLM for ethical reasons.

4. Under no circumstances should you include money for LLM use in your grant budgets.

5. If you are in a position of authority, such as being a journal editor, you need to use all the authority you have to draw a hard line on LLM use. There is no moderate or responsible way to use LLMs. They need to be rejected wholesale.
## I still think LLMs are cool and want to use them in my research

If you are unconvinced by the above argument, there are many other reasons why you might still want to reject LLM use entirely. I won’t go into these in detail in this article, but:

1. LLM use makes you complicit in the de facto racialized torture of the Kenyan workers who prepare the texts that are used as training data.

2. From hardware manufacturing to hyperscale data centre construction to the training process for LLMs, there is a massive negative environmental impact to LLM use.

3. LLMs and other forms of generative AI depend on training data that has, in many cases, been taken without consent or compensation from artists and other workers, in an act that has been described as enclosure of the digital commons.

4. LLM use deskills you as an academic.

5. You will be left holding the bag when the LLM economic bubble bursts. The costs of producing and maintaining these models are not sustainable, and eventually the speculative funding will run out. When the bubble bursts, you will have built your career on methods that no longer exist, and will have put into the literature results that are completely non-reproducible.
blog.bgcarlisle.com
July 5, 2025 at 9:57 PM
Reposted by Justin Sheehy
I kinda hated writing this but I needed to do it.

Maybe now, finally, I can stop writing it in little fragments here and there, and just let it go and do something else.

https://blog.glyph.im/2025/06/i-think-im-done-thinking-about-genai-for-now.html
Deciphering Glyph :: I Think I’m Done Thinking About genAI For Now
Deciphering Glyph, the blog of Glyph Lefkowitz.
blog.glyph.im
June 5, 2025 at 5:29 AM
Good morning.
June 2, 2025 at 7:26 AM
Reposted by Justin Sheehy
Sometimes, people write to me to tell me that something I've written was helpful.

Lemme tell you, when shit is bringing me down over here, those emails make a world of difference.

Write to people. Connect. Let someone know it mattered.
May 4, 2025 at 8:21 PM
Sufficiently advanced autocomplete is indistinguishable from sentience.
May 2, 2025 at 4:43 PM
Reposted by Justin Sheehy
Trump's official denouncement of former CISA director Chris Krebs (in the form of a "Presidential Memorandum") is chilling in substance and utterly Stalinesque in tone. By threatening anyone who hires him, it aims to render Krebs effectively unemployable.

I said it then, and I will repeat it […]
Original post on federate.social
federate.social
April 11, 2025 at 9:32 PM
Reposted by Justin Sheehy
This is great news, not least for our American friends, whose weather service is being sabotaged. Weather models are oddly enough always global - you can't predict the weather in Berlin a week ahead without also predicting the weather in Austin, Texas. ECMWF has excellent hurricane […]
Original post on fosstodon.org
fosstodon.org
March 19, 2025 at 9:56 AM
Reposted by Justin Sheehy
I see generative AI on the web the same way I see forever chemicals in the real world. The effects may not be immediately apparent but they are poisoning mankind. Those who acquired knowledge and skills in the before era will be fine, but newer generations won't have the same reference points […]
Original post on mastodon.social
mastodon.social
March 5, 2025 at 10:15 PM
“I still believe that, five years from now, civilisation will be structured entirely around OpenBean beans,” said bean industry titan Sam Altbean. “I can’t offer any supporting evidence for that, but I do believe it.” […]
Original post on hachyderm.io
hachyderm.io
February 4, 2025 at 1:47 PM
Perhaps OpenAI should ask the NYT for advice on what to do about this sort of thing.
January 29, 2025 at 1:08 PM
I love this thread. The future that @ifixcoinops is talking about could exist now if people just choose to make it so.

Anyone who simply thinks “social media is bad” for individuals or the world has been tricked about what social media can be. Some companies have made things that are bad for […]
Original post on hachyderm.io
hachyderm.io
January 27, 2025 at 4:40 PM
“AI means futuristic and better. And this better future not only has extra fingers but is also illiterate.”

So much truth delivered in perfect style. McSweeney’s still has it.

https://www.mcsweeneys.net/articles/our-customers-demand-terrible-ai-systems
Our Customers Demand Terrible AI Systems
We’ve been banging our heads against the wall, trying to think of the new “it” thing our customers want. At one point, somebody suggested improving our product, but then we thought of something better—something totally groundbreaking, something absolutely huge. We should implement a terrible AI system.

Customers only care about one thing: barely functioning AI crammed into every facet of their lives. We all know this. That’s why anybody who’s anybody has slapped AI onto their product. AI phone, AI refrigerator, AI stick. If it doesn’t have AI, what the fuck are we even doing?

Up until recently, the industry was pretty stagnant. We had been relying on outdated ideas like “ready for market,” “finished product,” and “works reasonably well.” We were all stuck selling products that broke only after a few years. Now, we’re finally innovating by cutting to the chase and selling products that are already broken.

I know I’m personally tired of products pretending like they work for a bit, then getting hit with a surprise “This device is not compatible with the newest update,” or “You need a subscription service to continue using this product,” or “Nobody said it was waterproof.” Just give me my broken garbage and quit playing games.

Before anybody states the obvious, I understand that terrible AI systems are technically terrible, but I’ll be damned if they’re not confident. Like yeah, it may be wrong when I ask it a question, but it _sounds_ right. What naysayers don’t seem to realize is that people respond better to a confident liar shouting out answers than a polite expert. Women can back me up on this.

Plus, isn’t it kind of exciting? For instance, maybe you looked up how to change your oil, and now your brakes don’t work. Doesn’t that sound fun? Just living in a constant state of fear that all of the information you have access to is anywhere from a little to very wrong?

The only people who don’t like terrible AI don’t understand it. So, let me break it down for you. The “A” stands for “artificial,” and the “I” also stands for something, but I can never remember what it is. I want to say the internet? I just checked with AI, and it confirmed it stands for “artificial internet.” Anyway, AI means futuristic and better. And this better future not only has extra fingers but is also illiterate.

If you’re still not convinced, it’s probably because you think AI is a feature that adds value. That’s all wrong. It’s not a feature; it’s more of an idea. Obviously, everyone can tell that AI isn’t working correctly, but you can see what it’s going for, and wouldn’t it be pretty amazing if it _did_ work? Man, that would be pretty impressive. The value added is a feeling that you’re on the cutting edge of technology even though you’re using a product that is now actively worse.

And if we’re not ready to take the plunge into AI, we at least need to call one of our preexisting systems “AI.” Lots of companies do this. Just rename things to AI. Chatbot: AI. Automated voice messaging system: AI. Outsourced customer service rep in Myanmar: AI. If companies aren’t even pretending to have AI, they will lose their customers’ trust.

At the end of the day, we have to add AI to our product. We don’t really have a choice. It’s best to think of terrible AI as if it’s your deadbeat brother-in-law. Sure, nobody asked Michael to stay on your couch for six months, but there he is, making bad music, hallucinating, and teaching the kids how to make bombs. He’s forced his way into your home in the same way AI has forced itself into our products. There’s really no use in fighting it, because, like it or not, artificial internet isn’t going anywhere anytime soon.
www.mcsweeneys.net
January 26, 2025 at 7:24 PM
Reposted by Justin Sheehy
https://www.theatlantic.com/magazine/archive/2025/01/poet-seamus-heaney-berkeley/680757/

"walk on air
against your better judgement"

a very beautiful article, especially if listening to the telling as you read

thanks to @catherinecronin for the link
Walk on Air Against Your Better Judgment
What Seamus Heaney gave me
www.theatlantic.com
December 16, 2024 at 2:12 AM
This blog post by @ludicity is so damn good. Every paragraph is bright, sharp, and pointy. It was great when I read it six months ago, and it was still great, perhaps even more so, when I came across it again today […]
Original post on hachyderm.io
hachyderm.io
January 21, 2025 at 9:34 PM
The world got a bit darker with the loss of @rit, who was a good man. Each time our paths crossed I was better for it. Brendan will be deeply missed. Today I cry for him.

https://mastodon.social/@armurray/113827613111941167
Adam [ 🎹 🥘 🧑‍💻 🏒 ♟️ 💞 ] (@armurray@mastodon.social)
Attached: 1 image With profound sadness, I share that we've lost our good friend and colleague Brendan McAdams (@rit@hachyderm.io ). A brilliant programmer and cherished member of our community, Brendan was a core contributor to Perl and someone who stood firmly by his principles throughout his career. Please boost so others may know his name and legacy. (1/8)
mastodon.social
January 16, 2025 at 1:59 PM
Reposted by Justin Sheehy
For those of you who weren't at #monktoberfest last month, here's my talk about how people with no stake in the game taught Open Source users to be assholes and maintainers to not know how or who to ask for help, with almost predictably disastrous consequences […]
Original post on social.coop
social.coop
November 26, 2024 at 6:05 PM
Thanks to some nice work by @renato and other excellent folks at @infoq my keynote talk "Being a Responsible Developer in the Age of AI Hype" from DevSummit Boston (https://www.infoq.com/presentations/responsible-development-ai-hype/) is now also available in article form […]
Original post on hachyderm.io
hachyderm.io
November 10, 2024 at 2:24 AM
Microsoft is certainly not the only place where I've seen "Growth Mindset" terminology weaponized, but that isn't even close to the worst part of https://www.wheresyoured.at/the-cult-of-microsoft/
November 5, 2024 at 1:25 PM
Melanie Mitchell has the best round up and summary of the current state of the “LLM Reasoning Debate” that I’ve seen so far, so you should read this post and the linked papers if you care to have a position on that topic.

https://aiguide.substack.com/p/the-llm-reasoning-debate-heats-up
October 24, 2024 at 2:44 PM