Alessio Placitelli
@dexterp37.bsky.social
Data & MLOps tinkerer @ Mozilla.
Reposted by Alessio Placitelli
Most #generativeAI models were trained on Common Crawl, a massive archive of web crawl data. Yet most people have never heard of it. My new research studies Common Crawl in depth and highlights its influence on LLM research and development: foundation.mozilla.org/en/research/... (1/10)
Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI
Mozilla research finds that Common Crawl's outsized role in the generative AI boom has improved transparency and competition, but is also contributing to biased and opaque generative AI models.
foundation.mozilla.org
February 6, 2024 at 4:01 PM