Meta-heuristics (early stopping, dropout) don't help either.
2/3
Neural nets offer good approximation but consistently fail to generalize perfectly, even when perfect solutions are proved to exist.
We check whether the culprit might be their training objective.
arxiv.org/abs/2402.10013
Using cases where humans have clear acceptability judgements, we find that all models systematically fail to assign higher probabilities to grammatical continuations.
ling.auf.net/lingbuzz/006...
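To make this kind of evaluation concrete, here is a minimal sketch (not the paper's actual code): score a grammatical sentence and a minimally different ungrammatical one under a causal LM and check which gets the higher total log-probability. The GPT-2 checkpoint and the example minimal pair are illustrative assumptions.

```python
# Minimal-pair probability check: a sketch of the kind of evaluation
# described above, not the authors' code. Assumes a HuggingFace causal LM;
# the checkpoint and sentence pair are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of the sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids gives the mean cross-entropy over predicted tokens,
        # so multiply by the number of predicted tokens to get the sum.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Hypothetical minimal pair: the grammatical version should score higher.
grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."
print(sentence_logprob(grammatical) > sentence_logprob(ungrammatical))
```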
The second-best net, a Memory-Augmented RNN by Suzgun et al., shows that expressive power is important for GI, but isn't enough when training data is scarce.
The benchmark assigns a generalization index to a model based on how much it generalizes from how little training data.
The initial release includes languages such as aⁿbⁿ, aⁿbᵐcⁿ⁺ᵐ, and Dyck 1-2.
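For intuition about what strings from these languages look like, here is a small illustrative sketch (not the benchmark's actual generator): string builders for aⁿbⁿ and aⁿbᵐcⁿ⁺ᵐ, plus a membership check for Dyck languages with one or two bracket types.

```python
# Illustrative sketch of the benchmark's formal languages; not its real code.

def a_n_b_n(n: int) -> str:
    """a^n b^n, e.g. 'aaabbb' for n=3."""
    return "a" * n + "b" * n

def a_n_b_m_c_nm(n: int, m: int) -> str:
    """a^n b^m c^(n+m), e.g. 'aabccc' for n=2, m=1."""
    return "a" * n + "b" * m + "c" * (n + m)

def is_dyck(s: str, pairs=("()", "[]")) -> bool:
    """Membership check for Dyck languages; pass one or two bracket pairs."""
    opens = {p[0]: p[1] for p in pairs}  # opening bracket -> expected closer
    stack = []
    for ch in s:
        if ch in opens:
            stack.append(opens[ch])
        elif not stack or stack.pop() != ch:
            return False
    return not stack

print(a_n_b_n(3))          # aaabbb
print(a_n_b_m_c_nm(2, 1))  # aabccc
print(is_dyck("([])[]"))   # True
```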
Humans do this remarkably well based on very little data. What about neural nets?