Nur Lan
@nurikolan.bsky.social
Post-doc cogsci, linguistics, AI @ ENS Paris

https://0xnurl.github.io/
Reposted by Nur Lan
📄 New paper: “A Minimum Description Length Approach to Regularization in Neural Networks”
with Orr Well, Emmanuel Chemla, @rkatzir.bsky.social and @nurikolan.bsky.social

We explore why neural networks often struggle with simple, structured tasks.
Spoiler: our regularizers might be the problem.

🧵
May 24, 2025 at 4:01 PM
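The contrast between standard regularizers and an MDL-style objective can be sketched in a few lines. The function names and the fixed-precision encoding below are illustrative assumptions for exposition, not the paper's actual formulation:

```python
import math

def l2_penalty(weights, lam=0.01):
    # Standard L2 regularizer: penalizes large weights, but does not
    # reward genuinely simple (e.g. sparse, low-precision) solutions.
    return lam * sum(w * w for w in weights)

def mdl_model_cost(weights, precision_bits=8):
    # Toy MDL-style model cost: bits needed to encode the non-zero
    # weights at a fixed precision. Weights that are exactly zero cost
    # nothing, so the objective actively favors compressible networks.
    nonzero = [w for w in weights if w != 0]
    return len(nonzero) * precision_bits

def mdl_objective(data_nll_bits, weights):
    # Two-part MDL score: bits to encode the model, plus bits to encode
    # the data given the model (negative log-likelihood, in bits).
    return mdl_model_cost(weights) + data_nll_bits
```

Under such a score, a small exact solution can beat a large approximate one, which an L2 penalty alone does not guarantee.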
Reposted by Nur Lan
LLMs are bad theories of human linguistic cognition, but they can be useful proxies for less bad ones. Here's how, and why it might be useful:
arxiv.org/abs/2502.07687

Joint paper with @nurikolan.bsky.social, Emmanuel Chemla and @rkatzir.bsky.social
Large Language Models as Proxies for Theories of Human Linguistic Cognition
We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relat...
February 12, 2025 at 10:40 PM
⭐️🗞️ Accepted to ACL 2024 main conference! #ACL2024NLP

Neural nets can in theory learn formal languages such as aⁿbⁿ & Dyck. Yet no one ever finds such nets using standard techniques. Why?

We suggest that the culprit might have been the objective function all along 👇

arxiv.org/abs/2402.10013
🧪🗞️ New paper with Emmanuel Chemla and @rkatzir.bsky.social:

Neural nets offer good approximation but consistently fail to generalize perfectly, even when perfect solutions are proved to exist.

We check whether the culprit might be their training objective.

arxiv.org/abs/2402.10013
June 17, 2024 at 6:54 PM
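For reference, the two formal languages mentioned above have simple exact recognizers. This is a plain sketch of the target languages themselves (written here for illustration, not taken from the paper's code):

```python
def is_anbn(s):
    # Membership check for aⁿbⁿ: n 'a's followed by exactly n 'b's.
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def is_dyck1(s):
    # Membership check for Dyck-1: well-nested brackets over "(" and ")".
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching open
                return False
        else:
            return False
    return depth == 0
```

Both languages require unbounded counting, which is exactly what makes them a clean test of whether a trained network has found the general rule or only an approximation.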
🧪🗞️ New paper with Emmanuel Chemla and @rkatzir.bsky.social:

Neural nets offer good approximation but consistently fail to generalize perfectly, even when perfect solutions are proved to exist.

We check whether the culprit might be their training objective.

arxiv.org/abs/2402.10013
February 17, 2024 at 6:13 PM
🎲 GPTs can't count – a new, simple demo of LLMs' very partial arithmetic.

github.com/0xnurl/gpts-...
GitHub – 0xnurl/gpts-cant-count: Demo of even the most advanced LLMs' inability to handle basic arithmetic.
December 15, 2023 at 9:14 PM
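A minimal sketch of the kind of arithmetic probe such a demo might use; the function names and the exact-match scoring here are assumptions for illustration, not the repo's actual code:

```python
import random

def make_addition_probe(digits, rng=random.Random(0)):
    # Generate a large-number addition problem of the kind used to probe
    # LLM arithmetic: exact answers are trivial for a program, but the
    # long per-digit carry chain trips up next-token predictors.
    a = rng.randrange(10 ** (digits - 1), 10 ** digits)
    b = rng.randrange(10 ** (digits - 1), 10 ** digits)
    return f"{a} + {b} = ", a + b

def check_model_answer(expected, model_output):
    # Exact-match scoring: an answer counts only if every digit is right.
    try:
        return int(model_output.strip()) == expected
    except ValueError:
        return False
```

Scoring by exact match matters here: a model that gets most digits right still fails, which is the point of the demo.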
Douglas Hofstadter on toy tasks, in Waking Up from the Boolean Dream, 1982
November 23, 2023 at 10:09 AM
⚡ 🗞️ Updated version of Large Language Models and the Argument from the Poverty of the Stimulus, joint work with Emmanuel Chemla and @rkatzir.bsky.social:

ling.auf.net/lingbuzz/006...
November 21, 2023 at 4:14 PM
How well can neural networks generalize from how little data?

New work with Emmanuel Chemla and Roni Katzir:

Benchmark:
github.com/taucompling/...

Paper:
aclanthology.org/2023.clasp-1...

🧵
GitHub – taucompling/bliss: 🧘 BLISS – a Benchmark for Language Induction from Small Sets.
October 2, 2023 at 9:38 AM