mussar
@mussar.io
i break computers, and they make sounds.
he/they
@themountaingoats.bsky.social your solo venture (in vermont)
September 27, 2025 at 5:25 PM
i reimplemented physics.umd.edu/hep/drew/mat... in golang so i could make it a terminal application instead of a web UI. also added some more variance under the hood which will eventually be exposed to the user. check it out here: github.com/0x4D5352/yar...
September 19, 2025 at 1:57 PM
i made the mistake of trying to build a heatmap of co-occurrence data for pairs of subjects in the retractionwatch dataset and forgot that there's 130 or so distinct subjects.
i saved the high res version on github here, if you wanna zoom in a bunch: github.com/0x4D5352/dat...
September 14, 2025 at 3:03 PM
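A rough sketch of the pair counting behind a heatmap like this, in Python rather than whatever the original used. The row layout (one list of subject names per retraction) and the sample rows are assumptions for illustration, not taken from the dataset. It also shows why 130 subjects gets unwieldy: that is 8,385 distinct unordered pairs.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(rows):
    """Count how often each unordered pair of subjects shares a row."""
    counts = Counter()
    for subjects in rows:
        # sort and dedupe so (a, b) and (b, a) land on the same key
        for pair in combinations(sorted(set(subjects)), 2):
            counts[pair] += 1
    return counts


# made-up rows, not taken from the dataset
sample_rows = [
    ["Biology", "Medicine"],
    ["Biology", "Medicine", "Chemistry"],
    ["Economics"],
]
pair_counts = cooccurrence_counts(sample_rows)

# number of distinct unordered pairs if there are exactly 130 subjects
n_pairs = math.comb(130, 2)
```

Rows with a single subject contribute nothing, since there is no pair to count.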
(HUM): Humanities
(PHY): Physical Sciences
(SOC): Social Sciences
check alt text for specific numbers
2/2
September 14, 2025 at 2:57 PM
distributional data on subjects, grouped by area of study. as per retractionwatch.com/retraction-w...:
(B/T): Business and Technology
(BLS): Basic Life Sciences
(ENV): Environmental Sciences
(HSC): Health Sciences
check alt text for specific numbers
1/2
September 14, 2025 at 2:57 PM
box and whiskers plot for days to publications - it's cool to see that even with the max time available decreasing linearly as we approach the present, the mean time to retraction is also getting shorter. the median drifts lower, the upper quartile gets further from the box... this is a _good_ sign
September 13, 2025 at 11:37 PM
Looking at durations on a log plot based on total number of days but divided by 7 to give a rough count of weeks - so log shows under a week, between a week and a quarter, and between a quarter and about two years.
lines up with the 1-2 years as the most common retraction duration.
September 13, 2025 at 6:10 PM
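The bucketing described here can be sketched as follows, assuming the log is base 10 (the post does not say which base). The function name is made up for illustration.

```python
import math


def log_week_bucket(days):
    """Return floor(log10(days / 7)) as a bucket index for a duration in days."""
    weeks = days / 7
    if weeks <= 0:
        raise ValueError("log scale needs a positive duration")
    return math.floor(math.log10(weeks))


# bucket -1: 0.1 to 1 weeks, i.e. under a week
# bucket  0: 1 to 10 weeks, roughly a week to a quarter
# bucket  1: 10 to 100 weeks, roughly a quarter to just under two years
```

Zero-length durations (retracted on or before publication) have no logarithm, so they would need to be dropped or handled separately before bucketing.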
basic x-y scatterplot of publication date and retraction date, with hue assigned to duration as a way to emphasize the vertical height since scatter not bar.
it's cool to see how retraction durations diffuse more and more since the turn of the millennium, and that horizontal line around 2020
September 13, 2025 at 3:54 PM
a quick sampling of retraction durations for the largest publishers! 0sec comes from the paper being retracted before/when it was published. max spread is so wide as to be somewhat meaningless, but medians seem to show a trend of 1-2 years on average between publication and retraction, less outliers
September 12, 2025 at 8:58 PM
doing a bit deeper of a dive into the outliers, it looks like the field of psychology has a habit of retracting papers multiple decades after the fact... everything in this screenshot is 45-80 years between publication and retraction!
September 12, 2025 at 12:00 AM
trying to summarize flattens the results back to microseconds which is... less than useful for this purpose, but easy to fix with nushell commands now that i'm not calculating durations over multiple years in μs.
seems like the vast majority of retractions happen within the first to third year.
September 11, 2025 at 11:42 PM
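The post does this step in nushell. As a stand-in, here is a minimal Python sketch of the same idea, assuming the durations arrive as integer microseconds. The helper names and the 365.25-day year are assumptions for illustration.

```python
from collections import Counter

US_PER_DAY = 24 * 60 * 60 * 1_000_000
DAYS_PER_YEAR = 365.25  # approximation; ignores calendar details


def year_bucket(duration_us):
    """Map a duration in microseconds to a whole number of elapsed years."""
    days = duration_us / US_PER_DAY
    return int(days // DAYS_PER_YEAR)


def bucket_counts(durations_us):
    """Count how many durations fall into each whole-year bucket."""
    return Counter(year_bucket(d) for d in durations_us)
```

Calling `bucket_counts(...).most_common()` on the real durations would list the busiest year buckets first.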
going back to the main dataset for a second to close out tonight: the reasons! Investigations are a big chunk of the results but it's kinda tautological, cause yeah they probably would investigate. accounting for those, looks like we're dealing with bad data, fabricated data, and bad peer reviews
September 11, 2025 at 1:45 AM
to get a clearer picture, i sorted journals by publisher and can say pretty definitively... WTF is up with the IEEE? Elsevier and Springer look bad when you see the raw numbers, but really, they just have a fuckton of journals. IEEE has a lot, but their retraction ratio is the worst of the big ones
September 11, 2025 at 1:25 AM
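The post does not define "retraction ratio". One possible reading is retractions per distinct journal for each publisher, sketched below. The `(publisher, journal)` record shape and the function name are assumptions. Note that this only sees journals with at least one retraction, so it is not a ratio against a publisher's full catalog.

```python
from collections import defaultdict


def retractions_per_journal(records):
    """records: iterable of (publisher, journal) pairs, one per retraction."""
    retractions = defaultdict(int)
    journals = defaultdict(set)
    for publisher, journal in records:
        retractions[publisher] += 1
        journals[publisher].add(journal)
    # average number of retractions per journal that appears in the data
    return {p: retractions[p] / len(journals[p]) for p in retractions}


# placeholder records, not real publishers or journals
sample = [("A", "J1"), ("A", "J2"), ("B", "J3"), ("B", "J3"), ("B", "J3")]
ratios = retractions_per_journal(sample)
```

A publisher with many journals and one retraction each scores low here, while a publisher with few journals and many retractions each scores high, which is the distinction the post is drawing between raw counts and ratios.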
on the publisher front, Hindawi was no surprise. an open-access journal publisher that turned out to be a paper mill once Wiley publishing bought them and started looking into the results. funnily enough, this means Wiley owns almost 25% of all retractions.
WTF is up with the IEEE, though???
September 11, 2025 at 1:11 AM
spread is a lot more even across areas of study, but Medicine, Biology, Technology, and Business are a few points shy of being 50% of the retractions overall. of the subfields, cell bio, cancer bio, and molecular bio have the worst records, followed by economics and oncology (aka cancer).
September 11, 2025 at 12:56 AM
the subjects were actually way easier to parse, but contained an internal encoding schema on top of the rows being semicolon-delimited strings. pulled them out with some cursed regex parsing and pattern-matching to get a dedicated dataset that would be easier to draw information out of
September 11, 2025 at 12:44 AM
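A minimal sketch of that kind of parsing, assuming each cell looks like `(BLS) Biology - Cellular;(HSC) Medicine - Oncology;` with an area code in parentheses in front of each subject. The exact cell layout, the regex, and the function name are assumptions, not the author's actual code.

```python
import re

# area code in parentheses, then the subject name
SUBJECT_RE = re.compile(r"^\((?P<area>[^)]+)\)\s*(?P<subject>.+)$")


def parse_subjects(cell):
    """Split one semicolon-delimited cell into (area_code, subject) tuples."""
    parsed = []
    for chunk in cell.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece left by a trailing semicolon
        match = SUBJECT_RE.match(chunk)
        if match:
            parsed.append((match["area"], match["subject"].strip()))
    return parsed
```

Flattening every row through `parse_subjects` gives one `(area, subject)` record per mention, which is easier to group and count than the raw strings.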
this paper was such a thorn in my side when i was starting to look at the durations between when papers were published and when they were retracted 🫠 the only one with meridiem in the times, which messed up my date parsing
still, cool that they have the first ever english lang retraction on record!
September 11, 2025 at 12:35 AM
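One way to cope with a stray AM/PM timestamp is to try more than one format in turn. This is a sketch only: both format strings are guesses at what the columns might look like, and the function name is made up.

```python
from datetime import datetime

# hypothetical formats: 24-hour first, then a 12-hour variant with AM/PM
DATE_FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%Y %I:%M:%S %p")


def parse_timestamp(text):
    """Try each known format in turn and return the first successful parse."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")
```

Raising on anything unrecognized, rather than silently returning nothing, makes the next oddball row show up immediately instead of skewing the durations.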
retraction times are more spread out, but there are certainly clusters. 2023 and 2011 seem to have been very busy years for retractions... and the 2020s overall have seen a huge uptick in retractions, with over 56% of the retractions on record.
but what's up with that retraction from 1756?
September 11, 2025 at 12:31 AM
this is where i noticed that the DPRK both 1. has published research and 2. _had research retracted_, which is not as easy as you might think! i particularly like the bottom entry here, where they tried out that red light therapy people use for hair loss on rice seeds. i guess it doesn't work...
September 11, 2025 at 12:16 AM
the biggest names skew the distribution so heavily, over 93% of the retractions across all 184 countries are contained within the first 25 countries. the average number of retractions is about 444, the median is all the way down at 20, and the standard deviation is about one russia retraction count
September 10, 2025 at 10:56 PM
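To illustrate how a few large counts pull the mean far above the median, here is a small sketch using Python's `statistics` module. The toy counts are made up and are not the real per-country numbers. The function name and the choice of population standard deviation are assumptions.

```python
from statistics import mean, median, pstdev


def summarize(counts, top_n=25):
    """Summary stats plus the share of the total held by the top_n entries."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "stdev": pstdev(ordered),
        "top_share": sum(ordered[:top_n]) / total,
    }


# made-up counts with one dominant entry, not the real per-country numbers
toy = [1000, 50, 20, 10, 5, 3, 2]
stats = summarize(toy, top_n=2)
```

With one dominant entry the mean lands well above the median, which is the same shape the post describes for the country counts.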
first off - country of origin counts. there's 66,255 entries in the version of the dataset i have, updated on 2025/08/28. top contenders seem obvious. china takes the top spot by a huge margin (43%!), but they are also publishing the most, so more data is needed to get per-capita numbers.
September 10, 2025 at 10:51 PM
i've been spending a good chunk of my spare time lately messing with crossref's retraction watch dataset (www.crossref.org/documentatio...) just to see what kind of information i could find. this will be a thread of findings/oddities/data trends until i get around to writing a blog post about it.
September 10, 2025 at 10:40 PM