Arij Riabi
arijriabi.bsky.social
PhD student working on NLP for low-resource, non-standardized language varieties 🍉
Reposted by Arij Riabi
Thrilled to release Gaperon, an open LLM suite for French, English and Coding 🧀

We trained 3 models (1.5B, 8B, 24B) from scratch on 2-4T tokens of custom data

(TLDR: we cheat and get good scores)

@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social
November 7, 2025 at 9:11 PM
Reposted by Arij Riabi
We built the simplest possible social media platform. No algorithms. No ads. Just LLM agents posting and following.

It still became a polarization machine.

Then we tried six interventions to fix social media.

The results were… not what we expected.

arxiv.org/abs/2508.03385
Can We Fix Social Media? Testing Prosocial Interventions using Generative Social Simulation
Social media platforms have been widely linked to societal harms, including rising polarization and the erosion of constructive debate. Can these problems be mitigated through prosocial interventions?...
August 6, 2025 at 8:24 AM
Reposted by Arij Riabi
ModernBERT or DeBERTaV3?

What's driving performance: architecture or data?

To find out, we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.

Here are our findings:
April 14, 2025 at 3:41 PM
Reposted by Arij Riabi
Congratulations to @arijriabi.bsky.social who successfully defended her PhD “Small is Beautiful: Addressing Resource Scarcity, Language Variation, & Transfer Challenges for Automatic Detection of Harmful Language” last Tuesday, supervised by @zehavoc.bsky.social & @openlaurent.bsky.social 👩‍🎓🎉
March 25, 2025 at 10:46 AM
I am excited to share that I have successfully defended my PhD, "Addressing Resource Scarcity, Language Variation, and Transfer Challenges for Automatic Detection of Harmful Language." 🎉
👩‍🎓👩‍🎓🎉
@inriaparisnlp.bsky.social
@sorbonne-universite.fr
March 20, 2025 at 8:45 AM
Reposted by Arij Riabi
🎉 🌍✍️ I'm thrilled to announce that our paper, "Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties", co-authored with @arijriabi.bsky.social and @zehavoc.bsky.social, has been accepted for the #VarDial2025 workshop during #COLING2025! 🎉 1/5
December 27, 2024 at 5:02 PM
Reposted by Arij Riabi
most people want a quick and simple answer to why AI systems encode/exacerbate societal and historical bias/injustice, and due to the reductive but common "bias in, bias out" thinking, the obvious culprit is often training data, but this is not entirely true

1/
November 24, 2024 at 4:26 PM
Reposted by Arij Riabi
Now that I am on Bluesky, let me take you again on a threaded tour of HTR-United (#HTR_United), a project founded and led by @ponteineptique.bsky.social and me since September 2021. Its main goal is to facilitate finding and sharing open datasets to train HTR and OCR models!

htr-united.github.io
HTR-United
HTR-United is a catalog and an ecosystem for sharing and finding ground truth for optical character or handwritten text recognition (OCR/HTR).
October 30, 2023 at 10:48 AM