Johannes Wachs
@johanneswachs.bsky.social
Researching social computing, crowds, and networks at Corvinus University of Budapest and HUN-REN CERS.
More at: https://johanneswachs.com/
Thanks to my coauthors for an interesting collaboration. Here's the preprint again:
arxiv.org/abs/2506.08945
Comments welcome!
Who is using AI to code? Global diffusion and impact of generative AI
Generative coding tools promise big productivity gains, but uneven uptake could widen skill and income gaps. We train a neural classifier to spot AI-generated Python functions in 80 million GitHub com...
arxiv.org
June 11, 2025 at 8:23 PM
Our conservative model finds that going from a 0 to 30% AI share (US, 2020-24) predicts a 2.4% increase in commits. Using task & wage data on occupations, this implies genAI creates $9-14 billion per year in value in US software alone. Larger effect estimates from RCTs imply upwards of $100 billion in value per year.
June 11, 2025 at 8:23 PM
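A rough consistency check on the figures in the post above, as a sketch only. It assumes the dollar value is simply the 2.4% gain multiplied by some base of annual labor value; that proportional form is a simplification introduced here, not the paper's method, and the base it backs out is implied by the post's own numbers rather than reported anywhere in the post.

```python
# Figures quoted in the post.
gain = 0.024                        # 2.4% predicted increase in commits
value_low, value_high = 9e9, 14e9   # $9-14 billion per year

# Assumption (not from the paper): value = gain * base, so base = value / gain.
base_low = value_low / gain
base_high = value_high / gain

print(f"Implied base: ${base_low / 1e9:.0f}-{base_high / 1e9:.0f} billion per year")
```

Under that assumption, the quoted range corresponds to a base in the hundreds of billions of dollars per year.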
Besides the adoption results, we find newer devs take up AI fastest. We see no gender gap. In fixed-effects models, higher user AI share predicts more commits, and the use of novel code libraries and library pairs. AI extends capabilities and supports exploration.
June 11, 2025 at 8:23 PM
The resulting classifier scores an out-of-sample AUC of 0.96. We applied it to 80 million commit snapshots from 2019-24, spanning tens of thousands of public repos and developers, to track how the share of AI-authored code evolves over time and across countries.
June 11, 2025 at 8:23 PM
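For readers unfamiliar with the metric: AUC can be read as the probability that a randomly chosen AI-written function receives a higher classifier score than a randomly chosen human-written one. A minimal sketch of that definition follows; the `auc` helper and the scores are made up for illustration and have nothing to do with the paper's data.

```python
from itertools import product


def auc(scores_pos, scores_neg):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores, for illustration only.
ai_scores = [0.9, 0.8, 0.7, 0.4]
human_scores = [0.6, 0.3, 0.2, 0.1]

print(auc(ai_scores, human_scores))  # 15 of 16 pairs ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation.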
First we built an AI-code detector & gathered data to train it. Human code came from 2018 Python functions & HumanEval 21/23. To create AI-written code examples, we had one LLM describe each human example in English, then a second LLM coded that description.
June 11, 2025 at 8:23 PM
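The describe-then-regenerate step in the post above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' pipeline: `make_pairs`, `fake_describe`, and `fake_generate` are hypothetical names, and the two stand-in functions replace what would be real LLM calls.

```python
def make_pairs(human_functions, describe, generate):
    """Build (code, label) training examples: 0 = human-written, 1 = AI-written."""
    examples = []
    for src in human_functions:
        description = describe(src)      # first LLM: code -> English description
        ai_src = generate(description)   # second LLM: English description -> code
        examples.append((src, 0))
        examples.append((ai_src, 1))
    return examples


# Hypothetical stand-ins for the two LLM calls.
def fake_describe(src):
    return "Describe: " + src.splitlines()[0]


def fake_generate(description):
    return "def generated():\n    pass  # from: " + description


human = ["def add(a, b):\n    return a + b"]
pairs = make_pairs(human, fake_describe, fake_generate)
print([label for _, label in pairs])
```

Each human function yields one matched AI-written counterpart, so the two classes stay balanced by construction.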
As for other platforms, this paper by @gburtch.bsky.social and colleagues looks at Reddit and SO. The SO results are similar to ours, but they find that activity on Reddit didn't change much.
www.nature.com/articles/s41...
The consequences of generative AI for online knowledge communities - Scientific Reports
www.nature.com
February 14, 2025 at 8:36 PM
The published version of that preprint has a slightly longer descriptive time series in the discussion, see below. We can't extend the counterfactual (comparing SO vs Russian and Chinese platforms) because other LLMs came out.
academic.oup.com/pnasnexus/ar...
February 14, 2025 at 8:36 PM
Reposted by Johannes Wachs
23/250 is Large language models reduce public knowledge sharing on online Q&A platforms
This makes me wonder if people are answering Stack Overflow questions with ChatGPT answers . . .
Large language models reduce public knowledge sharing on online Q&A platforms
Abstract. Large language models (LLMs) are a potential substitute for human-generated data and knowledge resources. This substitution, however, can present
doi.org
January 31, 2025 at 10:09 PM
We also find that most novel library imports and combinations are made by less-experienced users, suggesting how important new blood is for long-run ecosystem health.
Feedback warmly welcome!
November 25, 2024 at 12:20 PM
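One simple way to operationalize "novel library imports and combinations" is to walk through import events in time order and flag the first appearance of each library and each co-used pair. The sketch below assumes that definition; `find_novelties`, the event format, and the sample data are hypothetical and may differ from how the paper measures novelty.

```python
from itertools import combinations


def find_novelties(events):
    """events: chronological list of (user, set_of_imported_libraries).

    Returns (user, kind, item) for each first-ever library or library pair.
    """
    seen_libs, seen_pairs = set(), set()
    novelties = []
    for user, libs in events:
        ordered = sorted(libs)  # sort so each pair has one canonical form
        for lib in ordered:
            if lib not in seen_libs:
                seen_libs.add(lib)
                novelties.append((user, "library", lib))
        for pair in combinations(ordered, 2):
            if pair not in seen_pairs:
                seen_pairs.add(pair)
                novelties.append((user, "pair", pair))
    return novelties


# Made-up events, for illustration only.
events = [
    ("alice", {"numpy", "pandas"}),
    ("bob", {"numpy", "requests"}),
    ("carol", {"pandas", "requests"}),
]

for row in find_novelties(events):
    print(row)
```

Note that carol introduces no new library here but still contributes a new pair, which is why single imports and combinations are counted separately.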
[Mirrors results on over 200 years of novelties in US patents by Youn et al: royalsocietypublishing.org/doi/full/10.... ].
Two implications for maintenance:
- single libraries will be widely used as ecosystems grow (see plot)
- the many co-used libraries need to stay compatible with each other
November 25, 2024 at 12:20 PM