Head of Cohere Labs
@Cohere_Labs @Cohere
PhD from @UvA_Amsterdam
https://marziehf.github.io/
This helped the model gradually learn structure, language, and fine-grained control.
NeoBabel generates well in every language. And it’s only 2B params—beating much larger models on benchmarks.
Captions are often noisy, shallow, or poorly formatted.
This limits who can use them and whose imagination they reflect.
We asked: can we build a small, efficient model that understands prompts in multiple languages natively?
- Prohibit post-submission score retraction
- Limit private variants per provider
- Deprecate models equitably
- Ensure fair sampling across providers
- Publicly log all model removals
While using Arena-style data in training boosts win rates by 112%, this improvement doesn't transfer to tasks like MMLU, indicating overfitting to Arena's quirks rather than general performance gains.
Google & OpenAI received ~40% of all Arena battle data. In contrast, 83 open-weight models collectively got <30%. This open and free benchmark disproportionately benefits private providers.
205 models were silently removed, many of them open. This breaks the assumptions of Arena’s Bradley-Terry scoring algorithm when prompt types change over time, making the leaderboard fragile and biased.
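For context: Bradley-Terry fits one latent strength per model from pairwise win/loss records, assuming those strengths are fixed and all battles are comparable. Here's a minimal toy fit via gradient ascent on the BT log-likelihood (my own sketch, not Arena's implementation):

```python
import numpy as np

def fit_bradley_terry(battles, n_models, lr=1.0, steps=500):
    """battles: iterable of (winner, loser) index pairs."""
    theta = np.zeros(n_models)  # log-strength per model
    for _ in range(steps):
        grad = np.zeros(n_models)
        for w, l in battles:
            # P(winner beats loser) under current ratings
            p = 1.0 / (1.0 + np.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p   # push winner up by the "surprise"
            grad[l] -= 1.0 - p   # push loser down by the same amount
        theta += lr * grad / len(battles)
        theta -= theta.mean()    # only rating differences matter
    return theta

# Toy example: model 0 beats model 1 in 7 of 10 battles
battles = [(0, 1)] * 7 + [(1, 0)] * 3
print(fit_bradley_terry(battles, n_models=2))
```

The trouble: a silently removed model's rating is frozen on yesterday's prompt mix while surviving models keep accumulating battles on today's, so the fitted strengths are no longer measured on the same distribution.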
Our simulations show that a weaker model family can outrank a stronger one by testing more variants and publishing the top performer.
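To see why best-of-N publication inflates rankings, here's a toy Monte Carlo with made-up numbers (an illustration of the selection effect, not the paper's actual simulation): a family that is truly 30 points weaker privately tests 10 variants and publishes only the best observed score.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_STRONG = 1230.0   # hypothetical true score of the stronger family
TRUE_WEAK   = 1200.0   # hypothetical true score of the weaker family
NOISE       = 20.0     # assumed noise in an observed leaderboard score
N_VARIANTS  = 10       # private variants the weaker family tests
TRIALS      = 100_000

# The stronger family submits once; the weaker family keeps its best of N.
strong = rng.normal(TRUE_STRONG, NOISE, size=TRIALS)
weak_best = rng.normal(TRUE_WEAK, NOISE, size=(TRIALS, N_VARIANTS)).max(axis=1)

print(f"weaker family ranks higher in {(weak_best > strong).mean():.0%} of trials")
```

With these assumed numbers the weaker family lands on top roughly half the time: taking the max over noisy measurements converts variance into apparent skill.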
- Undisclosed private model testing warps results
- Silent model deprecation undermines rank stability
- Data access disparities between providers enable overfitting
So how fair—and scientifically rigorous—is today’s most widely used evaluation benchmark?
We took a deep dive into Chatbot Arena to find out. 🧵
✨INCLUDE (spotlight) — models fail to grasp regional nuances across languages
💎To Code or Not to Code (poster) — code is key for generalizing beyond coding tasks
Time to move beyond English-centric evaluation. 🔥
🌍 18 languages (high-, mid-, and low-resource)
📚 21k questions (55% require image understanding)
🧪 STEM, social science, reasoning, and practical skills
A multilingual, multimodal model designed to understand across languages and modalities (text, images, etc.) to bridge the language gap and empower global users!
On a separate note, as much as I love Amsterdam, I'm mountain-deprived and only have eyes for this glorious view this week.
and the research team at EPFL.
Check out the paper here: arxiv.org/abs/2411.19799
and the benchmarks here:
hf.co/datasets/Coh...
hf.co/datasets/Coh...