Open Source AI early and often
@sparkycollier on Twitter and elsewhere
Links: markcollier.me
openinfrafoundation.formstack.com/forms/2025_o...
In parallel, research data-efficient models that can be trained on roughly 100M tokens. This is comparable both to the amount of language input humans need to develop speech and to the size of Wikipedia, showing that it is possible for a community to curate this amount of data.