Ben Lipkin
@benlipkin.bsky.social
phd @ mit, research @ genlm, intern @ apple

https://benlipkin.github.io/
Want to use AWRS SMC?

Check out the GenLM control library: github.com/genlm/genlm-...

GenLM supports not only grammars, but also arbitrary programmable constraints, from type systems to simulators.

If you can write a Python function, you can control your language model!
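
To make "write a Python function" concrete, here is a library-agnostic sketch. This is not the GenLM API; the `only_digits_and_commas` constraint and the `allowed_next_tokens` helper are illustrative assumptions. The point is that a plain boolean predicate over partial strings is all the controller needs.

```python
# A minimal, library-agnostic sketch of "constraint = Python function".
# Nothing here is the GenLM API; it only illustrates the idea that a plain
# boolean predicate over partial strings is enough to steer generation.

def only_digits_and_commas(prefix: str) -> bool:
    """Constraint: the output may contain only digits and commas."""
    return all(ch.isdigit() or ch == "," for ch in prefix)

def allowed_next_tokens(prefix: str, vocab: list[str], constraint) -> list[str]:
    """Tokens whose addition keeps the partial string constraint-satisfying."""
    return [tok for tok in vocab if constraint(prefix + tok)]

vocab = ["1", "2", ",", "a", " "]
print(allowed_next_tokens("1,", vocab, only_digits_and_commas))  # ['1', '2', ',']
```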
May 13, 2025 at 2:22 PM
Why does AWRS work?

Formal and empirical runtime analyses tell a fascinating story.

AWRS scales adaptively with the KL divergence between the conditional and base token-level models.

As your LM better understands the constraint, AWRS gets faster.

As the LM struggles, AWRS closes the gap.
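
Where the KL comes from, in the simplest boolean-constraint case (the numbers below are made up for illustration): if Z is the mass the base LM puts on allowed next tokens, the token-level KL is exactly log(1/Z), and naive with-replacement rejection sampling needs 1/Z = exp(KL) draws in expectation. A without-replacement sampler like AWRS needs no more draws in expectation and usually far fewer, but the same quantity governs the difficulty.

```python
import math

# Toy next-token distribution and a boolean constraint (which tokens are allowed).
# Probabilities here are made up purely to illustrate the KL <-> runtime link.
p = {"SELECT": 0.55, "WITH": 0.25, "hello": 0.15, "??": 0.05}
allowed = {"SELECT", "WITH"}

Z = sum(p[t] for t in allowed)                     # mass of allowed tokens
kl = sum((p[t] / Z) * math.log((p[t] / Z) / p[t])  # KL[conditional || base]
         for t in allowed)

print(f"Z = {Z:.2f}")
print(f"KL = {kl:.3f} = log(1/Z) = {math.log(1 / Z):.3f}")
print(f"expected draws (with replacement) = 1/Z = exp(KL) = {1 / Z:.2f}")
```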
May 13, 2025 at 2:22 PM
We tested AWRS SMC on several controlled generation tasks, from text-to-SQL to PDDL goal inference to molecular synthesis.

AWRS SMC outperforms baselines by large margins, e.g., a jump from 3% to 53% in the goal inference domain with only ~2.5x wall-clock overhead.
May 13, 2025 at 2:22 PM
Next, SMC uses the proposed extensions and corresponding weights from AWRS to update importance weights associated with partial sequences (particles).

These particles are resampled proportional to their weights, re-allocating computation towards the most promising sequences.
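
A minimal sketch of that resampling step on toy particles (the strings, weights, and resample-every-step schedule are illustrative; in practice resampling is often triggered adaptively, e.g., by effective sample size):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy particles: partial sequences with (unnormalized) importance weights,
# e.g. as produced by the AWRS proposal. Values are made up for illustration.
particles = ["SELECT name", "SELECT *", "WITH t AS", "SELECT COUNT"]
weights = np.array([0.9, 0.05, 0.6, 0.2])

# Resample particle indices proportional to their normalized weights.
probs = weights / weights.sum()
idx = rng.choice(len(particles), size=len(particles), p=probs)

resampled = [particles[i] for i in idx]
print(resampled)  # high-weight prefixes tend to be duplicated, low-weight ones dropped
# After resampling, weights are reset to be equal (here, 1/N each).
```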
May 13, 2025 at 2:22 PM
First, AWRS reformulates the token-level inference problem from exact enumeration to adaptive rejection sampling.

This process yields equivalently distributed samples at a fraction of the cost.

AWRS then estimates and propagates an importance weight alongside these samples.
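
A stripped-down sketch of that token-level step for a boolean constraint (the toy probabilities and constraint are assumptions, and the weight estimate used here, the not-yet-rejected mass, is a simplification for illustration, not necessarily the paper's estimator):

```python
import random

def awrs_step(next_token_probs: dict[str, float], is_allowed, rng=random):
    """Adaptive rejection sampling of one token under a boolean constraint.

    Samples without replacement from the LM's next-token distribution,
    rejecting disallowed tokens, and returns (token, weight_estimate).
    The weight estimate below is illustrative, not the paper's estimator.
    """
    remaining = dict(next_token_probs)
    rejected_mass = 0.0
    while remaining:
        tokens, probs = zip(*remaining.items())
        total = sum(probs)
        tok = rng.choices(tokens, weights=[p / total for p in probs])[0]
        if is_allowed(tok):
            # Everything rejected so far is known to violate the constraint,
            # so at most (1 - rejected_mass) of the base mass remains valid.
            return tok, 1.0 - rejected_mass
        rejected_mass += remaining.pop(tok)
    return None, 0.0  # no valid token exists; the particle gets weight 0

# Toy usage with made-up probabilities and an "only digits" constraint.
probs = {"7": 0.2, "cat": 0.5, "3": 0.1, "dog": 0.2}
print(awrs_step(probs, str.isdigit))
```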
May 13, 2025 at 2:22 PM
So, what can we do?

AWRS SMC is a hierarchical inference framework based on sequential Monte Carlo using a novel stochastic proposal algorithm.

By jointly considering local and global signals, AWRS SMC is both probabilistically sound and sample efficient.

How does it work?
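
To make the hierarchy concrete, here is an end-to-end toy. Everything in it (the two-token LM, the "length-2 strings must end with `a`" constraint, the particle count) is a made-up assumption, and the inner proposal computes its incremental weight exactly where AWRS would estimate it by rejection sampling.

```python
import random

random.seed(0)

# Inner loop: token-level proposal that returns (token, incremental weight).
# Outer loop: sequential Monte Carlo over partial sequences (particles).

def toy_lm(prefix: str) -> dict[str, float]:
    if prefix == "":
        return {"a": 0.8, "b": 0.2}
    if prefix == "a":
        return {"a": 0.1, "b": 0.9}
    return {"a": 0.9, "b": 0.1}       # prefix == "b"

def allowed_next(prefix: str) -> set[str]:
    # Length-2 strings must end with `a`: anything goes at step 1, only `a` at step 2.
    return {"a", "b"} if len(prefix) == 0 else {"a"}

def propose(prefix: str) -> tuple[str, float]:
    """Stand-in for the AWRS proposal: sample an allowed token and return it with
    the allowed mass as its weight (AWRS would estimate this mass by rejection)."""
    probs = toy_lm(prefix)
    ok = allowed_next(prefix)
    z = sum(p for t, p in probs.items() if t in ok)
    toks = [t for t in probs if t in ok]
    tok = random.choices(toks, weights=[probs[t] / z for t in toks])[0]
    return tok, z

n_particles = 4000
particles = [""] * n_particles
weights = [1.0] * n_particles

for _ in range(2):                                  # two-token sequences
    for i in range(n_particles):
        tok, w = propose(particles[i])
        particles[i] += tok
        weights[i] *= w
    # Resample proportional to weights, re-allocating compute to good prefixes.
    particles = random.choices(particles, weights=weights, k=n_particles)
    weights = [1.0] * n_particles

print("fraction `ba`:", particles.count("ba") / n_particles)  # ~0.69, matching P(x | ends with `a`)
```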
May 13, 2025 at 2:22 PM
Problem B: LCD distorts the distribution.

Consider this simple LM over the tokens `a` and `b` with the constraint that “strings must end with `a`”.

While the constrained distribution on complete strings favors `ba`, the LM's autoregressive preference is for `ab`, so locally constrained sampling follows it into `a` and over-produces `aa`.

We don’t want this.
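
With made-up numbers (an assumption for illustration, not the figure from this post), the gap is easy to compute exactly:

```python
# Made-up two-step LM over {a, b}; strings have length 2 and must end with `a`.
first = {"a": 0.8, "b": 0.2}
second = {"a": {"a": 0.1, "b": 0.9},      # after `a`, the LM wants to say `b`
          "b": {"a": 0.9, "b": 0.1}}      # after `b`, the LM wants to say `a`

# Exact constrained distribution over valid strings {aa, ba}:
p_aa = first["a"] * second["a"]["a"]      # 0.08
p_ba = first["b"] * second["b"]["a"]      # 0.18
z = p_aa + p_ba
print("target:", {"aa": p_aa / z, "ba": p_ba / z})       # ~{aa: 0.31, ba: 0.69}

# LCD: at step 1 both tokens are allowed (either can still end in `a`), so sample
# from the unmasked LM; at step 2 the mask forces `a`. No reweighting happens.
print("LCD:   ", {"aa": first["a"], "ba": first["b"]})   # {aa: 0.8, ba: 0.2}
```

Reweighting each LCD sample by the allowed mass it encountered (0.1 after `a`, 0.9 after `b`) and resampling recovers the target, which is the kind of correction the SMC layer applies.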
May 13, 2025 at 2:22 PM
Approach 2: Locally constrained decoding (LCD).

At each step, mask the next-token distribution to prevent violations.

Pros: All samples are constraint-satisfying.
Cons: A) Masking a large vocabulary is slow. B) LCD distorts the sampled distribution.

Example:
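
A minimal sketch of one masking step (the toy vocabulary, probabilities, and digits-only checker are assumptions for illustration):

```python
# One step of locally constrained decoding: drop tokens whose addition makes the
# constraint unsatisfiable, then renormalize.

def can_extend(prefix: str) -> bool:
    """Constraint check on a partial string: digits only."""
    return prefix == "" or prefix.isdigit()

def lcd_step(prefix: str, next_token_probs: dict[str, float]) -> dict[str, float]:
    # Con A: this loop touches every vocabulary entry at every step,
    # which is expensive for a large LLM vocabulary.
    masked = {t: p for t, p in next_token_probs.items() if can_extend(prefix + t)}
    # Con B: renormalizing locally, with no record of how much mass was
    # discarded, is what distorts the distribution over complete strings.
    z = sum(masked.values())
    return {t: p / z for t, p in masked.items()}

probs = {"7": 0.1, "cat": 0.6, "3": 0.2, "!": 0.1}
print(lcd_step("", probs))  # {'7': 0.333..., '3': 0.666...}
```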
May 13, 2025 at 2:22 PM
Approach 1: Sample-verify/Best-of-N.

Draw 𝑁 strings from the LM and use the constraint to rank/filter.

Pros: Yields samples 𝑥 ∝ 𝑃 as 𝑁 grows.
Cons: The 𝑁 required to get a target sample scales as exp(KL[𝑃||𝑄]). For difficult constraints, this becomes infeasible.

Example:
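
A minimal sketch (the toy sampler and constraint are assumptions for illustration), showing the waste when the constraint is unlikely under the LM:

```python
import random

random.seed(0)

# Best-of-N / sample-verify: draw N complete strings from the LM, then use the
# constraint only to filter (or rank) them.

def toy_lm_sample() -> str:
    # A "string" here is three tokens drawn i.i.d. from a toy distribution.
    return "".join(random.choices("ab", weights=[0.9, 0.1], k=3))

def satisfies(s: str) -> bool:
    return s.endswith("b")        # a constraint the LM rarely satisfies

N = 1000
samples = [toy_lm_sample() for _ in range(N)]
valid = [s for s in samples if satisfies(s)]
print(f"{len(valid)}/{N} samples survive the filter")  # ~10%: the waste grows as exp(KL[P||Q])
```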
May 13, 2025 at 2:22 PM
New preprint on controlled generation from LMs!

I'll be presenting at NENLP tomorrow 12:50-2:00pm

Longer thread coming soon :)
April 10, 2025 at 7:19 PM