Vivien
vivien000.bsky.social
Vivien
@vivien000.bsky.social
For all the details, explore the blog post, as well the accompanying Python notebook and the technical appendix: vivien000.github.io/blog/journal...
Accelerating LLM Code Generation Through Mask Store Streamlining
Structured text generation techniques enforcing context-free grammar (CFG) constraints (Willard & Louf, 2023; Gerganov & et. al., 2024; Lundberg & Ribeiro, 2023; Geng et al., 2024; Beurer-Kellner et a...
vivien000.github.io
January 10, 2025 at 4:57 PM
The results: a 10x smaller mask store for the Python grammar and the Mistral 7B tokenizer.
Experiments empirically confirm the streamlined mask store produces exactly the same masks as the original mask store and that the masks are neither too lenient nor too strict.
January 10, 2025 at 4:57 PM
For are_jointly_legal, we can take advantage of the known values of is_always_legal and exploit the fact that there may be sets of mutually interchangeable terminals in the grammar.
January 10, 2025 at 4:57 PM
For is_always_legal, I propose to use a specific pushdown automaton based on the grammar. Determining the values of is_always_legal is equivalent to identifying and analyzing paths through this pushdown automaton.
January 10, 2025 at 4:57 PM
In contrast, is_always_legal and are_jointly_legal are not computable for an arbitrary CFG. We can only hope to obtain some values of these functions.
January 10, 2025 at 4:57 PM
Computing is_never_legal amounts to checking whether the intersection of the context-free language and a regular language is empty, which can be done with standard CFG algorithms.
January 10, 2025 at 4:57 PM
To systematically identify such patterns and streamline the mask store, we would ideally have access to three functions: is_always_legal, is_never_legal and are_jointly_legal. Can we compute them?
January 10, 2025 at 4:57 PM
For example, with the Python grammar, a `def` terminal can always be followed by a `name` terminal, but never by an `async` or `class` terminal. Furthermore, a `+` terminal and a `-` terminal are interchangeable in the Python syntax. Therefore `-` is acceptable whenever `+` is.
January 10, 2025 at 4:57 PM
In the blog post, I propose streamlining the mask store by reducing redundant checks. Key insight: some checks don’t depend on the already-generated text.
January 10, 2025 at 4:57 PM
At inference time, the parser examines the mask store for the current lexer state and checks whether the new terminals are compatible with the generated string. This makes CFG constraints much slower to process than regex constraints.
January 10, 2025 at 4:57 PM
A useful data structure in this context is the mask store. A mask store lists, for each lexer state, which sequences of new terminals should be tested and which tokens should be allowed if the tests are positive.
January 10, 2025 at 4:57 PM
Grammar-constrained decoding techniques like Domino or Outlines leverage a lexer and an incremental parser. The lexer ensures that the tokens can form a sequence of terminals while the parser checks that this sequence complies with the CFG.
January 10, 2025 at 4:57 PM