Luke Zettlemoyer
@lukezettlemoyer.bsky.social
Professor at UW; Researcher at Meta. LMs, NLP, ML. PNW life.
Reposted by Luke Zettlemoyer
NEW: Luke Zettlemoyer (@lukezettlemoyer.bsky.social) of the University of Washington and Meta AI walks through different approaches to building multimodal foundation models.

Watch the video: youtu.be/vTI4cziw84Q

#NeuroAI2025 #AI #ML #LLMs #NeuroAI
Mixed-modal Language Modeling with Luke Zettlemoyer
YouTube video by Kempner Institute at Harvard University
youtu.be
June 10, 2025 at 8:43 PM
Reposted by Luke Zettlemoyer
What if LLMs knew when to stop? 🚧

HALT finetuning teaches LLMs to only generate content they’re confident is correct.

🔍 Insight: Post-training must be adjusted to the model’s capabilities.
⚖️ Tunable trade-off: Higher correctness 🔒 vs. More completeness 📝

🧵
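(The core idea — generate only what clears a confidence bar, and abstain otherwise — can be sketched roughly as below. This is a toy illustration, not the HALT finetuning procedure; the scoring function and threshold are hypothetical stand-ins.)

```python
# Toy sketch of confidence-thresholded generation (NOT the HALT implementation).
# `score_fragment` and the threshold value are hypothetical placeholders.

def score_fragment(fragment: str) -> float:
    """Stand-in for a model-derived confidence score in [0, 1]."""
    return 0.9 if "known fact" in fragment else 0.4

def halt_style_response(fragments: list[str], threshold: float = 0.7) -> str:
    """Keep only fragments the model is confident about.

    Raising `threshold` favors correctness (fewer, safer claims);
    lowering it favors completeness (more claims, more risk of error).
    """
    kept = [f for f in fragments if score_fragment(f) >= threshold]
    return " ".join(kept) if kept else "I don't know."

print(halt_style_response(["This is a known fact.", "This is a shaky guess."]))
```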
June 6, 2025 at 8:22 AM
Reposted by Luke Zettlemoyer
Excited to continue my research adventure as a postdoc at @uwnlp.bsky.social and Meta! I’ve joined @lukezettlemoyer.bsky.social’s fantastic lab. Together, we plan to rethink how LLMs perceive data, extending their capabilities to uncharted languages and, further, beyond text!
March 31, 2025 at 2:23 PM
Reposted by Luke Zettlemoyer
Excited to announce the COLM 2025 keynote speakers: Shirley Ho, Nicholas Carlini, @lukezettlemoyer.bsky.social, and Tom Griffiths!

See you in October in Montreal!
March 10, 2025 at 2:34 PM
Reposted by Luke Zettlemoyer
Chatbots don't *want* anything and don't *recognize* anything.
Chatbots, Like the Rest of Us, Just Want to Be Loved
A study reveals that large language models recognize when they are being studied and change their behavior to seem more likable.
www.wired.com
March 6, 2025 at 9:28 PM
Reposted by Luke Zettlemoyer
LLMs can't take responsibility for their mistakes. When a human journalist puts their name on AI-written text, they take on that responsibility.

Increasingly I see inaccurate and badly written news stories authored by AI, many of which have actual humans listed as authors or editors.
February 23, 2025 at 11:56 PM
Reposted by Luke Zettlemoyer
#AI hallucinations are a problem; #UWAllen Ph.D. student @akariasai.bsky.social may have the answer. She was named a @techreviewjp.bsky.social Innovator Under 35 for her work to make #LLMs more transparent and useful—without making stuff up. #IU35 #AIforGood
news.cs.washington.edu/2025/01/07/w...
‘Working to solve global problems’: Allen School Ph.D. student Akari Asai named one of MIT Technology Review’s Innovators Under 35 Japan - Allen School News
Despite their growing potential and increasing popularity, large language models (LLMs) often produce responses that are factually inaccurate or nonsensical, also known as hallucinations. Allen School...
news.cs.washington.edu
January 8, 2025 at 10:58 PM
Reposted by Luke Zettlemoyer
My Keynote Talk entitled “Dungeons and DQNs: The Serious Quest for Open Ended Role Playing Game Playing Agents” is now online.

youtu.be/EiurL9eyUNc

In which I might or might not have said “I’m working to take the ‘ick’ out of ‘agentic’”
Dungeons and DQNs: The Serious Quest for Open Ended Role Playing Game Playing Agents
YouTube video by AIIDE Conference
youtu.be
January 9, 2025 at 1:15 AM
Reposted by Luke Zettlemoyer
kicking off 2025 with our OLMo 2 tech report while payin homage to the sequelest of sequels 🫡

🚗 2 OLMo 2 Furious 🔥 is everythin we learned since OLMo 1, with deep dives into:

🚖 stable pretrain recipe
🚔 lr anneal 🤝 data curricula 🤝 soups
🚘 tulu post-train recipe
🚜 compute infra setup

👇🧵
January 3, 2025 at 4:02 PM
Reposted by Luke Zettlemoyer
📚 How good are language models at utilising contexts in RAG scenarios?
We release 🧙🏽‍♀️DRUID to facilitate studies of context usage in real-world scenarios.
arxiv.org/abs/2412.17031

w/ @saravera.bsky.social, H.Yu, @rnv.bsky.social, C.Lioma, M.Maistro, @apepa.bsky.social and @iaugenstein.bsky.social ⭐️
A Reality Check on Context Utilisation for Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) helps address the limitations of the parametric knowledge embedded within a language model (LM). However, investigations of how LMs utilise retrieved information o...
arxiv.org
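(For context, a minimal RAG loop looks roughly like the sketch below — a generic illustration of context utilisation, not the DRUID setup; the toy overlap-based retriever is invented for the example.)

```python
# Generic RAG sketch (illustration only): retrieve a few passages,
# prepend them to the prompt, then let the LM generate an answer.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    return sorted(corpus,
                  key=lambda p: len(query_words & set(p.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Build a prompt whose answer should depend on the retrieved context."""
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# How faithfully the LM uses `context` (vs. its parametric knowledge)
# is exactly the behaviour DRUID is designed to measure.
print(build_prompt("What is DRUID for?",
                   ["DRUID is a dataset for studying context utilisation.",
                    "An unrelated passage about the weather."]))
```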
January 2, 2025 at 7:15 AM
Reposted by Luke Zettlemoyer
🚀 Introducing the Byte Latent Transformer (BLT) – an LLM architecture that scales better than Llama 3 using patches instead of tokens 🤯
Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Pat...
Code 🛠️ github.com/facebookrese...
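(The patching idea, in toy form: operate on raw bytes grouped into patches rather than on tokenizer tokens. Note that BLT forms patches dynamically rather than at a fixed size; the fixed-size split and function name below are only an invented illustration.)

```python
# Toy illustration of byte patching (simplified: BLT groups bytes into
# dynamic patches, not the fixed-size patches shown here).

def bytes_to_patches(text: str, patch_size: int = 4) -> list[bytes]:
    """Split a UTF-8 byte stream into fixed-size byte patches."""
    data = text.encode("utf-8")
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]

# Each patch, rather than each tokenizer token, becomes one unit
# processed by the latent transformer.
print(bytes_to_patches("Byte Latent Transformer"))
```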
December 13, 2024 at 4:53 PM
Reposted by Luke Zettlemoyer
Remember Molmo? The full recipe is finally out!

Training code, data, and everything you need to reproduce our models. Oh, and we have updated our tech report too!

Links in thread 👇
December 9, 2024 at 6:34 PM
Reposted by Luke Zettlemoyer
Check out my new piece on AI terms of use restrictions w/ Mark Lemley ( @marklemley.bsky.social ).

There's been a recent stir about terms of use restrictions on AI outputs & models. We dig into the legal analysis, questioning their enforceability.

Link: papers.ssrn.com/sol3/papers....
December 10, 2024 at 12:39 AM
Reposted by Luke Zettlemoyer
I’m on the academic job market this year! I’m completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations like hallucinations, by building new LMs.
My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems 🧵
December 4, 2024 at 1:26 PM
Reposted by Luke Zettlemoyer
I'm on the academic job market!
I develop autonomous systems for: programming, research-level question answering, finding sec vulnerabilities & other useful+challenging tasks.
I do this by building frontier-pushing benchmarks and agents that do well on them.
See you at NeurIPS!
December 4, 2024 at 4:52 PM
Reposted by Luke Zettlemoyer
I'm excited about scaling up robot learning! We’ve been scaling up data gen with RL in realistic sims generated from crowdsourced videos. Enables data collection far more cheaply than real world teleop. Importantly, data becomes *cheaper* with more environments and transfers to real robots! 🧵 (1/N)
December 5, 2024 at 2:13 AM
Reposted by Luke Zettlemoyer
We just updated the OLMo repo at github.com/allenai/OLMo!
There are now several training configs that together reproduce the training runs that led to the final OLMo 2 models.
In particular, all the training data is available, tokenized and shuffled exactly as we trained on it!
GitHub - allenai/OLMo: Modeling, training, eval, and inference code for OLMo
Modeling, training, eval, and inference code for OLMo - allenai/OLMo
github.com
December 2, 2024 at 8:13 PM
Reposted by Luke Zettlemoyer
Hi everyone, I am excited to share our large-scale survey study of 800+ researchers, which reveals how researchers use and perceive LLMs as research tools, and how that usage and those perceptions differ across demographics.

See results in comments!

🔗 Arxiv link: arxiv.org/abs/2411.05025
LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions
The rise of large language models (LLMs) has led many researchers to consider their usage for scientific work. Some have found benefits using LLMs to augment or automate aspects of their research pipe...
arxiv.org
December 2, 2024 at 7:45 PM
Reposted by Luke Zettlemoyer
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️
November 20, 2024 at 4:35 PM
Reposted by Luke Zettlemoyer
We should make sure that only really big companies can afford to pay really big copyright holders to access the data needed to do stuff with AI, and keep everyone else out.

Wouldn’t that be just super?
November 28, 2024 at 5:04 AM
Reposted by Luke Zettlemoyer
I am seeking multiple PhD students passionate about Generative Intelligence and its applications in empowering AI agents to interact with the physical world to join us at UPenn CIS for the 2024-2025 academic cycle. You can find more information at www.cis.upenn.edu/graduate/pro...
Doctoral Program
www.cis.upenn.edu
November 27, 2024 at 1:18 AM
Reposted by Luke Zettlemoyer
OLMo 2 is out 🥳 7B and 13B trained on 5T tokens, and meticulously instruction tuned using the Tulu 3 recipe.

Simply the best fully open models yet.

Really proud of the work & the amazing team at
@ai2.bsky.social
November 26, 2024 at 9:12 PM
Reposted by Luke Zettlemoyer
Excited to share OLMo 2!

🐟 7B and 13B weights, trained up to 4-5T tokens, fully open data, code, etc
🐠 better architecture and recipe for training stability
🐡 staged training, with new data mix Dolmino🍕 added during annealing
🦈 state-of-the-art OLMo 2 Instruct models

#nlp #mlsky

links below👇
November 26, 2024 at 8:59 PM
Reposted by Luke Zettlemoyer
Very interesting work, and it reminds me a bit of this paper. Tokenizers and RoPE must die. After samplers, I am on to those next ...
arxiv.org/abs/2407.036...
November 25, 2024 at 2:20 AM