Shinobi
shinobi42.bsky.social
Anti-AI Data Scientist and all-around contrarian.
There is a place near me that makes sourdough donuts and they are the best thing I have ever had.

But also sourdough helps a lot of people who have sensitivity to modern yeast, so I may not like it all the time but I support it.
December 7, 2025 at 5:18 PM
You beat me to it!
December 7, 2025 at 12:44 AM
I read that food you cook is less appealing because you have been smelling it while you cook and so it's not as interesting to your taste buds.

This could be bullshit but it also makes sense.
December 6, 2025 at 4:12 PM
They are getting more views on their propaganda by stealing and upsetting artists and their fans. That's the strategy.
December 6, 2025 at 2:11 PM
GenAI is just continuing what iPads, short-form video, and teaching to the test already started.
December 2, 2025 at 3:42 AM
I recommend this post series on Balloon Juice for a more nuanced and informative non-rant.

balloon-juice.com/2025/10/29/g...
Balloon Juice - Part 1: What is AI, And How Did We Get Here?
Guest post series from *Carlo Graziani. (Editor’s note:  It’s Carlo, no S.) On Artificial Intelligence Hello, Jackals. Welcome, and thank you for this opportunity. What follows is the first part of a ...
balloon-juice.com
December 1, 2025 at 10:54 PM
Other major issues I have are
- it encourages skill atrophy
- its outputs will always necessarily be a little bit wrong because it is a model
- there is no clear watermarking or way to ensure generated content can be separated from real content
December 1, 2025 at 10:54 PM
Substack is a great example and probably one of the main sources used to train these algorithms to generate code.

Now the website has limited traffic or engagement.
December 1, 2025 at 10:54 PM
Supporters argue that this data is fair game, but when this data was shared, nothing like generative AI existed, and few of us could have imagined that our words would permanently become part of something that might replace us.

The owners of the data used to create these models are us.
December 1, 2025 at 10:54 PM
So even if we did ask them to compensate contributors to responses, they can't.

These companies are now building what will hopefully be profitable businesses off of decades of writing, images, and other data that's been shared on the internet for free, without anyone's informed consent.
December 1, 2025 at 10:54 PM
When the algorithm has been trained using a set of copyrighted works, those works don't even need to be stored as part of the data set. There's also no way for us to tell which works are being used.

We can tell sometimes because a style is very distinct (Ghibli, for example).
December 1, 2025 at 10:54 PM
GenAI requires a huge amount of data, not only for the tasks that allow it to understand requests but also for the tasks that allow it to generate responses.

In order to get that data, the companies that built GenAI scraped the internet for text and images, as well as whole books and copyrighted works.
December 1, 2025 at 10:54 PM
Hi, I am a Data Scientist who works with it daily.

The issue that I have with AI is not that it exists or that it is, in and of itself, immoral, and it is not even related to the large economic and environmental impacts, though those are a concern.

My issue is the stolen labor of humankind.
December 1, 2025 at 10:54 PM
I think the problem is when the mission or vision is too broad.

As I have moved up in responsibility at work, I find that having a clear and specific vision from leadership helps with decision making and resolving disagreements.
November 30, 2025 at 5:10 PM
Also, the data is more targeted, so the training would be significantly less than for generalized LLMs or multimodal models.

Training a model to do one task is on a different scale from training it to do every task.
November 29, 2025 at 8:10 PM
Nothing in that paper indicated to me large scale data scraping. Please cite.
November 29, 2025 at 8:08 PM