Sebastian Karcher
adam42smith.bsky.social

Director, Qualitative Data Repository (personal account).
Data, Zotero, Social Science Methods
https://sebastiankarcher.com

A truly gonzo Executive Order from the Trump Administration removes US from not just the UN Framework Convention on Climate change, but also the IPCC (!), the IUCN, the IPBES (the IPCC of biodiversity), and all sorts of other organizations. www.whitehouse.gov/presidential...
Withdrawing the United States from International Organizations, Conventions, and Treaties that Are Contrary to the Interests of the United States
MEMORANDUM FOR THE HEADS OF EXECUTIVE DEPARTMENTS AND AGENCIES By the authority vested in me as President by the Constitution and the laws of the United
www.whitehouse.gov

ChatGPT is obv a bad tool for this.
Here's the thing, though: It's pretty hard to get good results for this type of q from traditional databases. Try it in Pubmed!
Otoh, using the right LLM-based tool, e.g. Elicit, produces good results with a natural language query (tho I'd worry about the summaries):

That "Why?" somehow hits different for "Let's draw some causal graphs"...

Characteristically excellent from Robin. Some of these (like 2&5) are things FLOSS/open infra folks have been saying for a while; might be worthwhile considering reasons for slow progress.
Others (esp 4) are new to me & likely more controversial
I'd like to see some coordination in the open source community to respond to this. The outcome could shape significant funding and strategic moves.

Off the top of my head, there's a few things that I'd like to see in a new European open source strategy. 🧵
The @ec.europa.eu will soon present “a combination of funding and policy measures” to support European open source projects in becoming commercially viable alternatives to proprietary US tech.

Read my article at @euractiv.com:

www.euractiv.com/news/commiss...

Reposted by Sebastian Karcher

When this guy wrote his survey of European polls in 1963, he had no idea that he would be thrilling an archivist in 2026. But these charts warm the cockles of my heart.

Reposted by Sebastian Karcher

This is traditional!

This is a delightful story as well as impressive science & I love @carlzimmer.com for including this bit (gift link):

www.nytimes.com/2026/01/01/s...

I do know people who take their car to the race track, so this isn't completely out of the question, but probably something that could be addressed.

The guys really, really love their calipers...
"The application...asks users for their height, ancestral background, SAT score and their feelings about entrepreneurs....The system can assess their cheekbone prominence, jaw strength or body-fat percentage from a scanned photo, and analyze their application to estimate an IQ score."

Same here. I was surprised that's the direction in which they went (didn't make it to the actual ideas, survey link timed out).
Is this different in other fields, perhaps? Like medical researchers don't know what to try next?

Yeah, it's very easy to add something to (potential) training data in the chat interfaces. How/how easy to opt out varies by tool (not sure it's even possible for all free ones). Enterprise versions (i.e. a tool your institution provides access to) are typically opted out by default, as are APIs.

It does say "custom AI models" which strongly implies that, but it's one thing they should, certainly in retrospect, have made clearer -- writing emails like this is super tricky bc they can't be too long/complicated.

The boring answer is stand mixer & espresso machine.
The two w the most unexpectedly large impact are pasta roller attachment for KitchenAid & digital kitchen scale
Everybody has two kitchen appliances that will change their life. The problem is, they're different for everybody.
This place needs some Innocuous Discourse pronto. Quote this with a take that’s not political or aggressive

Since several people have complained to the IRB, I'd assume that's the angle this gets resolved under. FWIW, I'm quite convinced that there, too, the researchers are easily in the green legally.
Beyond legality, I find the ethical case against this work weak & problematic, but worth discussing.

I think that's right for fair use - as I say, the boundaries are blurry. But I don't think what the researchers here did even requires fair use.
Analyzing a paper with a computer model doesn't qualify as "reproduce", "prepare derivative work of," or "distribute" (the uses protected by copyright)

I _love_ that you're doing this

#datalibs smoke signal!!!
Please share - @pewresearch.org wants to hire a data archivist who will be an advocate for data users, helping to ensure that our datasets are easy to discover and reuse by researchers, journalists, and the public.
pewtrusts.wd5.myworkdayjobs.com/CenterExtern...

I mean -- copyright licenses are, in fact, useless to ban things that copyright law either doesn't cover (like algorithmic analysis) or exempts (like fair use). And again, that applies equally to preprints and gated papers.

My disagreement is that it's not their right. As a copyright holder, you don't have the right to prohibit algorithmic analysis of your work, which is all that happened here.

They likely meant to be extra safe & exclude works that weren't explicitly licensed for re-use & screwed that part up.

Fair use generally applies to gated publishing, too, as long as you obtain the papers legally (that's why Anthropic readily paid libgen authors). The case for fair use is marginally weaker, but not really.

I don't know what you think that shows? I understand some people didn't have permissive (e.g. CC) licenses on their preprints. It might have been better had those papers been skipped & the mention of a license in the email is confusing.
Fair use still applies and you don't need to 'claim' it.

That doesn't make sense. Fair use is an exemption to copyright (i.e., allows the use of copyrighted work w/o prior permission) and thus by definition cannot be prohibited by or violate a copyright license.
(And analyzing a paper algorithmically doesn't touch upon copyright at all).

3/3 What bothers me about this whole thing is that folks hate LLMs so much that they readily throw overboard all concerns about regulatory overreach into research via IRBs or hampering research via copyright. There are long, long literatures on both of these things!

2/3 If they trained an LLM on it (it doesn't sound like they did), they're on much weaker ground ethically, but legally, given it's non-profit research and uses papers shared at no cost, the Fair Use case would be strong in the US and TDM exceptions likely apply elsewhere.

IANAL, but depending on the details, I don't even think the license matters. If all they did was use an existing LLM to generate "research ideas" based on a legally obtained paper, that's just not affected by copyright at all. Whether you use Acrobat to read it or an LLM makes no difference 1/3

a) not by federal standards: if I publicly disseminate my analysis criticizing someone's work, that almost certainly meets standards for "research" (just not for human subjects).
b) lots of papers re-analyze data for other purposes than audit, so if you don't like the Gino example pick a diff one