Clara Na
clarana.bsky.social
PhD student @ CMU LTI. efficiency/data in NLP/ML
Pinned
Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good?
You can try out recipes👩‍🍳 iterate on ✨vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵
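The pitch above is that you can estimate the quality of every combination of data subsets without training a model per combination. As a toy illustration of that idea (a simplified reading of the paper's modular approach, with made-up perplexity numbers and a plain mean as the combination estimate, not the paper's actual procedure):

```python
from itertools import combinations
from statistics import mean

# Hypothetical eval perplexities for models trained on each individual
# data partition (illustrative numbers, not from the paper).
subset_ppl = {"web": 12.4, "code": 9.8, "books": 11.1, "wiki": 10.5}

def estimate_combo_ppl(parts):
    """Estimate the eval perplexity of a model trained on the union of
    `parts` as the mean of the constituent models' perplexities --
    a deliberately simplified stand-in for the paper's method."""
    return mean(subset_ppl[p] for p in parts)

# Score every non-empty combination of partitions without training
# a new model for any of them.
estimates = {
    parts: estimate_combo_ppl(parts)
    for r in range(1, len(subset_ppl) + 1)
    for parts in combinations(sorted(subset_ppl), r)
}
best = min(estimates, key=estimates.get)
```

With four partitions this scores all 15 non-empty combinations from only four trained models; the point of the paper is making that kind of exhaustive sweep affordable.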
Reposted by Clara Na
We’re excited about Oolong as a challenging benchmark for information aggregation! Let us know which models we should benchmark next 👀

Paper: arxiv.org/abs/2511.02817
Dataset: huggingface.co/oolongbench
Code: github.com/abertsch72/o...
Leaderboard: oolongbench.github.io
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently...
November 7, 2025 at 5:07 PM
Reposted by Clara Na
Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
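To make "simple-to-verify information aggregation" concrete, here is a toy question in that spirit (a hypothetical format for illustration, not an actual Oolong item): the answer requires reading the whole long input, but checking it is a trivial count.

```python
import random

# Build a long input of labeled lines; the gold answer is verifiable
# by a simple count over the same input.
random.seed(0)
labels = [random.choice(["positive", "negative", "neutral"]) for _ in range(5000)]
long_input = "\n".join(f"review {i}: {lab}" for i, lab in enumerate(labels))

question = "How many reviews are labeled 'negative'?"
gold = labels.count("negative")              # trivially verifiable
model_answer = long_input.count("negative")  # rule-based stand-in for a model

assert model_answer == gold
```

A rule-based counter aggregates perfectly here; the benchmark's finding is that LLMs reading the same long input do not.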
November 7, 2025 at 5:07 PM
Yes! tbh this method is probably much more immediately useful for helping one understand subtle differences between [models trained on] subtly different data subsets, vs a loftier goal of helping one find "the" best data mixture -- to anyone considering this method, please feel free to reach out :)
The method in this paper was designed to find an optimal data mixture. But researchers in the human sciences who are training models *in order to understand the effect of the data* might also consider this as a clever way of evaluating hundreds of subsets without training hundreds of models. #MLSky
May 6, 2025 at 4:16 AM
Come through! Poster #492 in Hall 2, 10am-12:30pm!
April 26, 2025 at 1:59 AM
Reposted by Clara Na
Our paper documenting the environmental impacts of creating OLMo language models is the most honest and comprehensive characterization I know of, including training, development (!) and inference costs. If you're at ICLR chat with @jacobcares.bsky.social & @clarana.bsky.social Sat morning 10-12:30!
April 25, 2025 at 1:14 PM
Reposted by Clara Na
I'm in Singapore for @iclr-conf.bsky.social ! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, email works too
April 23, 2025 at 3:22 PM
Reposted by Clara Na
We've received multiple notes that NOAA research services (Office of Oceanic and Atmospheric Research) may go offline at midnight. @safeguardingdata.bsky.social is working on web archiving, but if others want to nominate on this, that might be good: digital2.library.unt.edu/nomination/G...
Nomination Tool: Project URL Nomination
April 3, 2025 at 9:36 PM
Reposted by Clara Na
How can we better think and talk about human-like qualities attributed to language technologies like LLMs? In our #CHI2025 paper, we taxonomize how text outputs from cases of user interactions with language technologies can contribute to anthropomorphism. arxiv.org/abs/2502.09870 1/n
March 6, 2025 at 3:43 AM
Reposted by Clara Na
Did you know? Gestures used to express universal concepts, like wishing for luck, vary DRAMATICALLY across cultures!
🤞 means luck in the US but is deeply offensive in Vietnam 🚨

📣 We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal behavior!

📜: arxiv.org/abs/2502.17710
February 26, 2025 at 4:23 PM
Reposted by Clara Na
the science of LMs should be fully open✨

today @akshitab.bsky.social @natolambert.bsky.social and I are giving our #neurips2024 tutorial on language model development.

everything from data to training to adaptation. published or not, no secrets 🫡

tues, 12/10, 9:30am PT ☕️

neurips.cc/virtual/2024...
NeurIPS Tutorial Opening the Language Model Pipeline: A Tutorial on Data Preparation, Model Training, and AdaptationNeurIPS 2024
December 10, 2024 at 3:31 PM
Reposted by Clara Na
How open is “open” AI, really?
It isn’t just about making models reusable. If the origin of data is opaque, if labor is hidden & exploited, if frameworks are dominated by Big Tech, if computational power is concentrated in an oligopoly… ‘open’ is just a label.

Meredith Whittaker & friends in Nature.
December 3, 2024 at 5:49 PM
Reposted by Clara Na
I noticed a lot of starter packs skewed towards faculty/industry, so I made one of just NLP & ML students: go.bsky.app/vju2ux

Students do different research, go on the job market, and recruit other students. Ping me and I'll add you!
November 23, 2024 at 7:54 PM
Reposted by Clara Na
💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:
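For background on what "controlling for frequency and length" can look like, one standard control from prior acceptability work is SLOR (syntactic log-odds ratio), which subtracts a unigram baseline and normalizes by length. This is shown purely as context with toy numbers; it is not necessarily the control this paper proposes (the paper argues for rethinking such controls):

```python
def slor(logp_sentence, unigram_logps):
    """SLOR: (log P_model(s) - sum of unigram log-probs) / length.
    Subtracting the unigram term discounts word frequency; dividing by
    length discounts sentence length. Toy background example, not the
    paper's method."""
    n = len(unigram_logps)
    return (logp_sentence - sum(unigram_logps)) / n

# Same raw LM log-prob, but one sentence is built from rarer words.
common = slor(-12.0, [-2.0, -2.5, -2.0, -2.5])   # frequent words
rare   = slor(-12.0, [-6.0, -7.0, -6.5, -6.0])   # rare words
```

After the frequency control, the rare-word sentence scores higher than the common-word one despite identical raw log-probability; whether this over- or under-corrects is exactly the kind of question the thread raises.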
November 20, 2024 at 6:08 PM
Hi, I'm at #232 in the back of the Riverfront room!
November 14, 2024 at 3:28 PM
I'm at EMNLP! Presenting the poster for this paper on Thursday morning (10:30-12), Session F Riverfront Hall, come say hi :)
November 13, 2024 at 3:08 PM
Reposted by Clara Na
(Hehe first bsky post!) I'll be at #EMNLP2024 💃🌴! Happy to chat about (among other things):
✨linguistically+cognitively motivated evaluation
✨NLP for low-resource+endangered languages
✨figuring out what features of language data LMs are *actually* learning
I'll be presenting two posters 🧵:
November 8, 2024 at 6:39 PM
scrolling,,, minimal doom ?!
November 9, 2024 at 12:58 AM
Reposted by Clara Na
Understanding “Democratization” in NLP and ML Research - joint work that @arjunsubgraph.bsky.social and I co-led with Dietrich Klakow and @zeerak.bsky.social
aclanthology.org/2024.emnlp-m...
Understanding “Democratization” in NLP and ML Research
Arjun Subramonian, Vagrant Gautam, Dietrich Klakow, Zeerak Talat. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
November 8, 2024 at 11:23 PM
Reposted by Clara Na
A starter pack for #NLP #NLProc researchers! 🎉

go.bsky.app/SngwGeS
November 4, 2024 at 10:01 AM
Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good?
You can try out recipes👩‍🍳 iterate on ✨vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵
November 5, 2024 at 10:37 PM
Reposted by Clara Na
I think it’s fucked up that EMNLP 2023 emailed Findings authors on Nov 8 that they *might* have a chance to present at main conf, but also don’t forget to early register by Nov 12. Then only let authors know of *virtual* poster assignment 10 min before early registration closed.
November 13, 2023 at 8:18 AM
Reposted by Clara Na
Not at all surprised to see that junior people support the proposed anonymity changes to the ACL policies.

Speaking for myself and my "early career" goals, the anonymity deadlines are incredibly stressful and (as far as I can tell) not beneficial to me.
ACL anonymity working group
UKP-Cloud - The place for your files @ UKP Lab!
November 13, 2023 at 3:30 PM
Reposted by Clara Na
By learning our history, rather than exceptionalizing the current moment, it's easy to discover worthwhile directions for researchers interested in contributing to language model capabilities without access to industry-scale training. Enjoy your research!
November 10, 2023 at 3:15 PM