Lightnews — Scholar-powered news

Reposted by Shayne Longpre

Raphaël Merx

@rapha.dev

This is some legit really impressive work!!

Shayne Longpre @shaynelongpre.bsky.social · 24d

📢Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer:

🌍 Is scaling diff by lang?

🧙‍♂️ Can we model the curse of multilinguality?

⚖️ Pretrain vs finetune from checkpoint?

🔀 X-lingual transfer scores across langs?

1/🧵

October 30, 2025 at 12:24 PM

Shayne Longpre

@shaynelongpre.bsky.social

📢Thrilled to introduce ATLAS 🗺️: the largest multilingual scaling study to-date—we ran 774 exps (10M-8B params, 400+ languages) to answer:

🌍 Is scaling diff by lang?

🧙‍♂️ Can we model the curse of multilinguality?

⚖️ Pretrain vs finetune from checkpoint?

🔀 X-lingual transfer scores across langs?

1/🧵

October 28, 2025 at 2:03 PM

Reposted by Shayne Longpre

Dustin Wright

@dustinbwright.com

Which, whose, and how much knowledge do LLMs represent?

I'm excited to share our preprint answering these questions:

"Epistemic Diversity and Knowledge Collapse in Large Language Models"

📄Paper: arxiv.org/pdf/2510.04226
💻Code: github.com/dwright37/ll...

1/10

October 13, 2025 at 11:25 AM

Shayne Longpre

@shaynelongpre.bsky.social

Delighted to see BigGen Bench paper receive the 🏆best paper award 🏆at NAACL 2025!

BigGen Bench introduces fine-grained, scalable, & human-aligned evaluations:

📈 77 hard, diverse tasks
🛠️ 765 exs w/ ex-specific rubrics
📋 More human-aligned than previous rubrics
🌍 10 languages, by native speakers

1/

May 6, 2025 at 1:50 PM

Reposted by Shayne Longpre

Sara Hooker

@sarahooker.bsky.social

It is critical for scientific integrity that we trust our measure of progress.

The @lmarena.bsky.social has become the go-to evaluation for AI progress.

Our release today demonstrates the difficulty in maintaining fair evaluations on the Arena, despite best intentions.

April 30, 2025 at 2:55 PM

Reposted by Shayne Longpre

Knight First Amendment Institute

@knightcolumbia.org

How should regulatory proposals adapt to the prevalence of general-purpose AI when the global geopolitical order is being reconfigured? @atoosakz.bsky.social, Deirdre K. Mulligan, @randomwalker.bsky.social, @alondra.bsky.social, & @shaynelongpre.bsky.social weigh in:
youtu.be/cRsbjGFPJaM?...

Day 1 Opening Remarks and Panel 1: Regulating AI in Democratic Upheaval (AI & Democratic Freedoms)

YouTube video by Knight First Amendment Institute

youtu.be

April 30, 2025 at 2:05 PM

Shayne Longpre

@shaynelongpre.bsky.social

🛬 in Singapore for #ICLR2025!

DM me to catch up—but only if you have a local food/bar/event rec!

April 22, 2025 at 8:44 PM

Shayne Longpre

@shaynelongpre.bsky.social

Thrilled our global data ecosystem audit was accepted to #ICLR2025!

Empirically, it shows:

1️⃣ Soaring synthetic text data: ~10M tokens (pre-2018) to 100B+ (2024).

2️⃣ YouTube is now 70%+ of speech/video data but could block third-party collection.

3️⃣ <0.2% of data from Africa/South America.

1/

April 14, 2025 at 3:28 PM

Reposted by Shayne Longpre

Knight First Amendment Institute

@knightcolumbia.org

📍EVENT: Day 2 of our “AI and Democracy” symposium will be kicking off shortly. Programming will begin with welcome remarks from George Deodatis @columbiaseas.bsky.social at 9:30am ET. #AIDemocraticFreedoms

Watch the full event on our livestream here:
www.youtube.com/watch?v=X1gj...

Artificial Intelligence and Democratic Freedoms (Day 2)

YouTube video by Knight First Amendment Institute

www.youtube.com

April 11, 2025 at 12:57 PM

Reposted by Shayne Longpre

Marzieh Fadaee

@mziizm.bsky.social

Very excited to release Kaleidoscope—a multilingual, multimodal evaluation set for VLMs, built as part of our open-science initiative!

🌍 18 languages (high-, mid-, low-)
📚 21k questions (55% require image understanding)
🧪 STEM, social science, reasoning, and practical skills

April 10, 2025 at 7:52 PM

Reposted by Shayne Longpre

Knight First Amendment Institute

@knightcolumbia.org

Panel 1: Regulating AI in a Time of Democratic Upheaval starts in approximately 5 minutes.

Panelists: @atoosakz.bsky.social, @randomwalker.bsky.social, @alondra.bsky.social, and Deirdre K. Mulligan.
Moderator: @shaynelongpre.bsky.social.
#AIDemocraticFreedoms

April 10, 2025 at 1:44 PM

Shayne Longpre

@shaynelongpre.bsky.social

This week, @stanfordhai.bsky.social released the 2025 AI Index. It’s well worth reading to understand the evolving ecosystem of AI. Some highlights that stood out to me:

1/

April 9, 2025 at 3:25 PM

Shayne Longpre

@shaynelongpre.bsky.social

Excited to speak at the workshop on Technical AI Governance in Vancouver this summer!

#ICML2025

Technical AI Governance @ ICML 2025 @taig-icml.bsky.social · Apr 1

📣We’re thrilled to announce the first workshop on Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver! Join us (& this stellar list of speakers) in bringing together technical & policy experts to shape the future of AI governance! www.taig-icml.com

April 1, 2025 at 4:52 PM

Reposted by Shayne Longpre

Andy Tseng

@andytseng.bsky.social

#AI is evolving fast, and so are its flaws. A fresh approach to finding and reporting AI bugs is long overdue. Great initiative by @shaynelongpre.bsky.social and team, transparency and accountability in AI development are essential! #AISafety #ResponsibleAI #AIEthics #MIT

Researchers Propose a Better Way to Report Dangerous AI Flaws

After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.

www.wired.com

March 17, 2025 at 3:41 AM

Reposted by Shayne Longpre

WIRED

@wired.com

After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.

Researchers Propose a Better Way to Report Dangerous AI Flaws

After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.

wrd.cm

March 13, 2025 at 3:03 PM

Shayne Longpre

@shaynelongpre.bsky.social

Thank you @willknight.bsky.social for excellent coverage of our new proposal!

www.wired.com/story/ai-res...

March 13, 2025 at 3:59 PM

Shayne Longpre

@shaynelongpre.bsky.social

What are 3 concrete steps that can improve AI safety in 2025? 🤖⚠️

Our new paper, “In House Evaluation is Not Enough” has 3 calls to action to empower evaluators:

1️⃣ Standardized AI flaw reports
2️⃣ AI flaw disclosure programs + safe harbors.
3️⃣ A coordination center for transferable AI flaws.

1/🧵

March 13, 2025 at 3:59 PM

Reposted by Shayne Longpre

Andy Sellars

@sellars.bsky.social

Very glad to join this paper organized by @shaynelongpre.bsky.social. Here's the paper itself: crfm.stanford.edu/2025/03/13/t...

Researchers Propose a Better Way to Report Dangerous AI Flaws

After identifying major flaws in popular AI models, researchers are pushing for a new system to identify and report bugs.

www.wired.com

March 13, 2025 at 3:25 PM

Reposted by Shayne Longpre

Artificial Intelligence Institute

@ekaya.bsky.social

Bringing transparency to the data used to train artificial intelligence

mitsloan.mit.edu/ideas-made-t...

Bringing transparency to the data used to train artificial intelligence | MIT Sloan

Using the wrong datasets to train AI models can result in legal risks, bias, or lower-quality models. The Data Provenance Initiative’s tool can help.

mitsloan.mit.edu

March 4, 2025 at 2:16 AM

Shayne Longpre

@shaynelongpre.bsky.social

Thrilled to be at #AAAI2025 for our tutorial, “AI Data Transparency: The Past, Present, and Beyond.”

We’re presenting the state of transparency, tooling, and policy, from the Foundation Model Transparency Index, Factsheets, and the EU AI Act to new frameworks like @MLCommons’ Croissant.

1/

February 26, 2025 at 6:15 PM

Reposted by Shayne Longpre

klaudia jaźwińska

@klaudia.bsky.social

Really excellent explainer by @shaynelongpre.bsky.social that clearly lays out what's at stake in the "AI crawler wars"

How we stand to lose out

As this cat-and-mouse game accelerates, big players tend to outlast little ones. Large websites and publishers will defend their content in court or negotiate contracts. And massive tech companies can afford to license large data sets or create powerful crawlers to circumvent restrictions. But small creators, such as visual artists, YouTube educators, or bloggers, may feel they have only two options: hide their content behind logins and paywalls, or take it offline entirely. For real users, this is making it harder to access news articles, see content from their favorite creators, and navigate the web without hitting logins, subscription demands, and captchas each step of the way.

Perhaps more concerning is the way large, exclusive contracts with AI companies are subdividing the web. Each deal raises the website’s incentive to remain exclusive and block anyone else from accessing the data—competitor or not. This will likely lead to further concentration of power in the hands of fewer AI developers and data publishers. A future where only large companies can license or crawl critical web data would suppress competition and fail to serve real users or many of the copyright holders.

February 21, 2025 at 4:56 PM

Reposted by Shayne Longpre

Alex Abdo

@alexabdo.bsky.social

Re: the FTC and the platforms

We *should* be concerned about platform power over speech, but it isn’t censorship.

As the Supreme Court said last year, the companies’ editorial decisions to moderate content are protected by the First Amendment.

1/

February 20, 2025 at 10:50 PM

Reposted by Shayne Longpre

Renee DiResta

@noupside.bsky.social

Let’s see if the algorithm and data remain transparent.

Musk to Revise Community Notes Amid Bias Concerns
Last updated 26 minutes ago
Elon Musk has announced intentions to revise the Community Notes feature on X, previously praised as a tool for unbiased fact-checking, due to concerns over manipulation by governments and legacy media.
His critique focused particularly on a note regarding Ukrainian President Volodymyr Zelensky's approval ratings, sparking a debate on whether this move is an attempt to maintain the feature's integrity or to align it with Musk's personal views.
This story is a summary of posts on X and may evolve over time. Grok can make mistakes, verify its outputs.

February 20, 2025 at 9:10 PM

Shayne Longpre

@shaynelongpre.bsky.social

I compiled a list of resources for understanding AI copyright challenges (US-centric). 📚

➡️ why is copyright an issue for AI?
➡️ what is fair use?
➡️ why are memorization and generation important?
➡️ how does it impact the AI data supply / web crawling?

🧵

February 19, 2025 at 4:32 PM

Reposted by Shayne Longpre

Nick Diakopoulos

@ndiakopoulos.bsky.social

Great point by @shaynelongpre.bsky.social on the AI crawler wars: "Unless we can nurture an ecosystem with different rules for different data uses, we may end up with strict borders across the web, exacting a price on openness and transparency." www.technologyreview.com/2025/02/11/1...

AI crawler wars threaten to make the web more closed for everyone

There’s an accelerating cat-and-mouse game between web publishers and AI crawlers, and we all stand to lose.

www.technologyreview.com

February 13, 2025 at 7:44 PM
