Mohammed Hamdy
mmhamdy.bsky.social
A curious explorer of human and machine learning 🧐 🤝🤖
The one Frankenstein film to rule them all!

Thank you, @realgdt.bsky.social 🙏
June 2, 2025 at 8:11 PM
In this article, I explore the story behind some of the ideas introduced in the Transformer paper.

It covers everything from the fundamental attention mechanism at its heart to the surprisingly simple explanation for its name.

You may find it interesting! 🙂

👇link below
March 30, 2025 at 11:38 AM
🧬 Join us this Wednesday on the @mozilla.ai Discord server for the second session of our Biological Representation Learning series, where we discuss landmark papers in the field!

We will be presenting the ProGen protein language model paper from Salesforce. See you there! 😃
January 27, 2025 at 12:29 PM
4️⃣ Linguistic representation has not improved by most measures: Gini coefficients for text and speech datasets show significant concentration, indicating limited progress in diversifying data sources.

5/n
December 19, 2024 at 4:34 PM
3️⃣ Geographical representation has not improved for a decade: Datasets from African and South American organizations account for < 0.2% of all modality content, while North American or European organizations span 93% of text tokens and 60%+ hours of speech and video.

4/n
December 19, 2024 at 4:34 PM
2️⃣ Inconsistent dataset licenses: While ~30% of datasets have permissive licenses, 78%+ of their sources carry hidden anti-crawling or licensing restrictions, making compliance a minefield.

3/n
December 19, 2024 at 4:34 PM
📌 Key Findings

1️⃣ The web is still the primary source: The internet, social media platforms, and synthetically generated data are increasingly the predominant sources of multimodal data, rather than curated sources.

2/n
December 19, 2024 at 4:34 PM
✨ Excited to share our latest work from The Data Provenance Initiative ☸️

This is the most comprehensive audit of multimodal training data, covering ~4000 datasets from 1990 to 2024 and more than 400 unique tasks in 608 languages!

🧵 1/n
December 19, 2024 at 4:34 PM
The Hudsucker Proxy is the most underrated Coen Brothers film!
November 29, 2024 at 12:26 PM