toverbeke.bsky.social
@toverbeke.bsky.social
To be more precise, the OSAID asks for information on the data ('metadata'), not for the data itself, and that is what generated the massive rejection by the open source community.
February 14, 2025 at 7:01 PM
If you don't believe me, you can ask the person who wrote the OSD ;-) postopen.org/documents/re...
Real Open Source AI Definition
postopen.org
February 14, 2025 at 6:58 PM
There is indeed no need to ask for other things (documentation etc.), nor for everything to be done in the open. If you have the data and the code, you have enough.
February 14, 2025 at 6:55 PM
You can apply the OSD to machine learning; the only thing you need to be aware of is that the data are part of the symbolic instructions, part of the source.
February 14, 2025 at 6:54 PM
In summary, the freedoms to run and to use are there with open weights models, but not the freedoms to understand and to modify. QED.
February 14, 2025 at 6:52 PM
Also, to change the model you can tweak your data (add, remove, preprocess in a different way), change the fitting algorithms, the criteria you want to optimize for, etc. With just the weights, you are handcuffed and have very limited freedom.
February 14, 2025 at 6:52 PM
Based on the weights alone I have a very limited understanding of the model. What are the biases of the model and where do they come from? The answer is typically to be found in the data.
February 14, 2025 at 6:52 PM
Clearly not the weights themselves: the source to get to the end result cannot be the end result itself. So if you talk about open source, you need access to the source, and the source is (if I simplify) the combination of the data and the code that, in the fitting procedure, results in the weights.
February 14, 2025 at 6:52 PM
At the end of the model fitting process you obtain a set of parameters that - with some support code - you can use to make your predictions. These parameters are also known as weights and are the very end of the process. Now imagine I give you the weights - what will be the source of the weights?
February 14, 2025 at 6:52 PM
... and then recreate the software with your modifications and share it. In the context of machine learning (e.g. predictive machine learning) you will start from data, preprocess the data, engineer and select your features, fit models, assess, go back to the data etc.
February 14, 2025 at 6:52 PM
That's special. In software in general, for it to be open source you need to be provided with the source from which you can recreate the software. You can also inspect this source in order to understand what is happening, and at any stage and in any corner of the code you can make modifications...
February 14, 2025 at 6:52 PM
Agreed. Before the US Copyright Act helped launch proprietary software, software was - in practice - open source, but the principles and explicit thinking about it started a bit later.
February 14, 2025 at 12:37 PM
Imagine what would have happened if free software and open source had lowered the bar 40 years ago "because virtually no software matches our definition"...
February 14, 2025 at 11:06 AM
LLMs are a tiny subset of machine learning models and outside of LLMs you can find hundreds in the scientific community that provide the data and are truly open source. If you restrict your vision to LLMs, then e.g. Olmo 2 models, Pleias models fit the definition and many more are being trained.
February 14, 2025 at 11:06 AM
You focus on the use and that is fine, but open source is more than that and needs to respect four essential freedoms: the freedom to use, to understand, to modify and to share (with or without modifications).
February 14, 2025 at 9:58 AM
Software runs on top of an OS, yes, but the crux is that I can recompile it or port it to another OS and it will behave in the same way. AI models are a function of code and data. Change the data and the model changes. In other words, the source for an AI model includes both code and data.
February 14, 2025 at 9:54 AM
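The claim that a model is a function of code and data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function `fit_line`, the toy datasets `data_a` and `data_b`, and the choice of ordinary least squares as the fitting procedure. The fitting code is identical in both runs; only one data point differs, and the resulting weights differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns the fitted parameters (the 'weights') as (slope, intercept).
    Hypothetical helper, written only for this illustration.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Same inputs, same code; only the last target value differs.
xs = [0.0, 1.0, 2.0, 3.0]
data_a = [1.0, 3.0, 5.0, 7.0]   # toy data lying exactly on y = 2x + 1
data_b = [1.0, 3.0, 5.0, 13.0]  # same data with one point changed

weights_a = fit_line(xs, data_a)
weights_b = fit_line(xs, data_b)

print("weights from data_a:", weights_a)
print("weights from data_b:", weights_b)
```

Someone who receives only `weights_b` can still make predictions with it, but cannot tell from the two numbers alone which data point pulled the slope up; that information lives in the data.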
Well... this is coming from a misunderstanding of how open source works. It is a concept with a history of 40 years. For your information, AI is a bit broader than the LLMs you know from the newspapers. Without training data all you can do is fine-tune or 'reverse engineer'. Voodoo is not enough!
February 14, 2025 at 7:50 AM
With the current OSAID you give away the freedom to understand and to modify, because for that you need the training data (and not some vague description of the data). With open weight models you equally give up on those freedoms. Not saying such models cannot be useful, but they're not open source.
February 13, 2025 at 10:17 PM
The OSI already diluted the meaning of open source in the context of AI. What you suggest would dilute it even more. Open source means respecting the freedom to run, to understand, to modify, to share.
February 13, 2025 at 10:17 PM
The devil's comedy?
February 9, 2025 at 10:56 AM