Sergey Feldman
@sergeyf.bsky.social
Reposted by Sergey Feldman
Agent benchmarks don't measure true *AI* advances

We built one that's hard & trustworthy:
👉 AstaBench tests agents w/ *standardized tools* on 2400+ scientific research problems
👉 SOTA results across 22 agent *classes*
👉 AgentBaselines agents suite

🆕 arxiv.org/abs/2510.21652

🧵👇
November 6, 2025 at 5:01 PM
Reposted by Sergey Feldman
Meet Ai2 Paper Finder, an LLM-powered literature search system.

Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍
March 26, 2025 at 7:07 PM
Reposted by Sergey Feldman
Hope you’re enjoying Ai2 ScholarQA as your literature review helper 🥳 We’re excited to share some updates:

🗂️ You can now sign in via Google to save your query history across devices and browsers.
📚 We added 108M+ paper abstracts to our corpus - expect to get even better responses!

More below…
March 5, 2025 at 6:21 PM
Reposted by Sergey Feldman
Can AI really help with literature reviews? 🧐
Meet Ai2 ScholarQA, an experimental solution that allows you to ask questions that require multiple scientific papers to answer. It gives more in-depth and contextual answers with table comparisons and expandable sections 💡
Try it now: scholarqa.allen.ai
January 21, 2025 at 7:31 PM
Reposted by Sergey Feldman
Building off the story I shared yesterday about fighting potential Insurance Company AI with AI: Claimable uses AI to tackle insurance claim denials. With an 85% success rate, it generates tailored appeals via clinical research and policy analysis. 🩺 #HealthPolicy
The CEO using AI to fight insurance-claim denials says he wants to remove the 'fearfulness' around getting sick
Claimable has helped patients file hundreds of health-insurance appeals. Its CEO says its success rate of overturning denials is about 85%.
www.businessinsider.com
December 13, 2024 at 4:17 PM
Super awesome paper that directly addresses questions I've had for a while: arxiv.org/abs/2412.02674

Their experiments:

(1) They sample 128 responses from an LLM for a given prompt, with top-p = 0.9, temperature = 0.7, a max length of 512 tokens, and 4-shot in-context examples

1/n
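For concreteness, here's a rough stdlib-only sketch of that sampling setup (temperature-scaled softmax plus top-p/nucleus truncation, 128 draws). The function name and toy logits are illustrative, not from the paper:

```python
import math
import random

def sample_top_p(logits, temperature=0.7, top_p=0.9, rng=None):
    """One draw of nucleus (top-p) sampling with temperature."""
    rng = rng or random
    # temperature-scaled softmax (subtract max for numerical stability)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    # keep the smallest set of tokens whose cumulative mass >= top_p
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # renormalize over the kept set and draw
    r = rng.random() * cum
    acc = 0.0
    for i in kept:
        acc += probs[i]
        if r <= acc:
            return i
    return kept[-1]

# 128 samples for one prompt, mirroring the paper's setup
logits = [2.0, 1.0, 0.5, -1.0]
draws = [sample_top_p(logits, rng=random.Random(s)) for s in range(128)]
```

A real run would of course sample whole 512-token sequences from a model, not single tokens from fixed logits; this just shows what the p/t knobs do.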
December 13, 2024 at 3:35 AM
Reposted by Sergey Feldman
Check out our #NeurIPS2024 poster (presented by my collaborators Jacob Chen and Rohit Bhattacharya) about “Proximal Causal Inference With Text Data” at 5:30pm tomorrow (Weds)!

neurips.cc/virtual/2024...
December 11, 2024 at 1:10 AM
Reposted by Sergey Feldman
Can you imagine how good BESTERSHIRE sauce tastes?!?!
December 9, 2024 at 8:39 PM
Reposted by Sergey Feldman
Windows has an issue:

Person: fuck this I'm going to Linux

Narrator: and they quickly learned to hate two operating systems.
November 26, 2024 at 4:48 PM
Here are some research questions I'd like to get answers to. We are using LLMs to make training data for smaller, portable search or retrieval relevance models. (thread)
November 22, 2024 at 6:00 PM
Reposted by Sergey Feldman
I will never recover from this student email.
November 27, 2023 at 9:48 PM
www.semanticscholar.org/paper/Ground...

I really like this paper. They study whether LLMs do reasonable things like ask follow-up questions and acknowledge what the users are saying. The answer is "not really".
November 20, 2023 at 11:22 PM
Reposted by Sergey Feldman
An actually useful task for GPT-4: formatting my bibliography.
October 24, 2023 at 4:03 PM
Reposted by Sergey Feldman
Do crowdworkers use ChatGPT to write their responses? I'm still not sure, but when I asked on Reddit, I got a flood of fascinating responses from the workers themselves, including some practical tips for researchers looking to prevent this.
Do you use ChatGPT or similar tools to complete tasks? : r/ProlificAc
www.reddit.com
October 20, 2023 at 5:31 PM
Here's an interview with me about my work at alongside.care where I argue that everyone should take every August off. medium.com/authority-ma... #mlsky
October 19, 2023 at 3:59 PM
Reposted by Sergey Feldman
yayyy @ai2.bsky.social is on bsky 🥹
October 13, 2023 at 2:57 AM
"Why Do We Need Weight Decay in Modern Deep Learning?"

overparameterized deep networks -> WD enhances the implicit regularization of SGD

underparameterized models trained with nearly online SGD -> WD balances bias/variance and lowers training loss.

#mlsky

arxiv.org/abs/2310.04415
Why Do We Need Weight Decay in Modern Deep Learning?
Weight decay is a broadly used technique for training state-of-the-art deep networks, including large language models. Despite its widespread usage, its role remains poorly understood. In this...
arxiv.org
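As a rough illustration of the knob being studied: one SGD step with decoupled weight decay (shrink weights toward zero, then apply the gradient step). A minimal sketch, not the paper's code:

```python
def sgd_step_decoupled_wd(w, grad, lr=0.1, wd=0.01):
    # Decoupled weight decay: first shrink each weight by a factor
    # (1 - lr * wd), then take the plain gradient step.
    return [(1.0 - lr * wd) * wi - lr * gi for wi, gi in zip(w, grad)]

# with zero gradient, only the multiplicative shrinkage acts
w = sgd_step_decoupled_wd([1.0, -2.0], grad=[0.0, 0.0])
```

With lr = 0.1 and wd = 0.01 each weight shrinks by a factor of 0.999 per step, which is the pull toward smaller norms whose effect the paper dissects in the over- vs. underparameterized regimes.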
October 11, 2023 at 9:58 PM
Reposted by Sergey Feldman
manually curating a list of Twitter-Bluesky account mappings of NLP HCI folks, thinking it’ll be helpful for newcomers to Bluesky with migration friction.

sinking way too much time into defining what qualifies as NLP HCI accounts.

realizing I’m the sole annotator for my own annotation task 😵
October 11, 2023 at 2:08 AM
Reposted by Sergey Feldman
I think this is worth noting. Your language preferences on here may keep you from seeing posts containing only one word that is not in your chosen languages. This may keep you from seeing posts about feeds you’re following (e.g., pokisky, behavioursky).

Turning all languages off fixes the issue.
Yes if you turn them all off you see it all
September 24, 2023 at 9:14 AM
Reposted by Sergey Feldman
Don't forget to apply by *Oct 15* for AI2 research internships!

Interested in language models of science, evaluating AI-generated text, challenging retrieval settings, and human-AI collaborative reading/writing?

Come work with meeee! 😸

Learn more: kyleclo.github.io/mentorship
mentorship | Kyle Lo
Researcher at AI2 in Seattle. NLP + HCAI for scholars and scientists.
kyleclo.github.io
October 6, 2023 at 9:12 PM
Reposted by Sergey Feldman
In what scenarios / model classes has double descent not been observed?

(eg adding trees to an RF has marginal impact on accuracy, feature selection gets worse past the optimal number of features, &c)
October 6, 2023 at 10:17 PM
XGBoost @ v2.0. I love ensembles of GBDTs - it feels like a mature technology where new features are not so exciting. Still, it's nice that they're adding multi-label trees out of the box. LightGBM (my preferred package) still can't do it.

Do GBDTs work well in high dimensions or for text data?
October 5, 2023 at 5:21 PM
So hard!!!
September 27, 2023 at 5:29 AM
Reposted by Sergey Feldman
Made an MLSky feed for machine learning research:

bsky.app/profile/did:...

Tag your posts with #mlsky

I'll probably also add some ML jargon keywords later.
September 15, 2023 at 7:28 PM