Andrew Jesson
@anndvision.bsky.social
thanks to @yaringal.bsky.social, John P. Cunningham, and David Blei for their help!
December 13, 2024 at 5:26 PM
thank you to my co-authors @velezbeltran.bsky.social and @bleilab.bsky.social
December 13, 2024 at 4:11 PM
we explore two different discrepancies: the negative log likelihood (NLL) and the negative log marginal likelihood (NLML)

the NLL gives p-values that are informative about whether there are enough in-context examples

this can reduce risk in safety-critical settings
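
for reference, a sketch of the two discrepancies written out in the notation used further down the thread (datasets x, explanations f, model θ); these are the standard definitions, so treat them as my gloss rather than a quote from the paper:

% NLL: discrepancy of a dataset x under a fixed explanation f
D_{\mathrm{NLL}}(x, f) = -\log p(x \mid f, \theta)

% NLML: the explanation f is marginalized out
D_{\mathrm{NLML}}(x) = -\log p(x \mid \theta) = -\log \int p(x \mid f, \theta)\, p(f \mid \theta)\, \mathrm{d}f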
December 13, 2024 at 4:11 PM
we show that the GPC is an effective OOD predictor on generative image completion tasks using a modified Llama-2 model trained from scratch
December 13, 2024 at 4:11 PM
we show that the GPC is an effective predictor of out-of-capability natural language tasks using pre-trained LLMs
December 13, 2024 at 4:11 PM
we show that the GPC is an effective OOD predictor for tabular data using synthetic data and a modified Llama-2 model trained from scratch
December 13, 2024 at 4:11 PM
the result is the generative predictive p-value

pre-selecting a significance level α to threshold the p-value gives us a predictor of model capacity: the generative predictive check (GPC)
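
a minimal sketch of that recipe in Python, assuming we already have the observed discrepancy and discrepancies for sampled dataset completions (both function names are illustrative, not from the paper):

import numpy as np

def generative_predictive_p_value(d_obs, d_rep):
    # one-sided p-value: fraction of completion discrepancies at
    # least as extreme as the observed one; d_obs can be a scalar
    # or an array paired elementwise with d_rep
    return float(np.mean(np.asarray(d_rep) >= np.asarray(d_obs)))

def gpc(d_obs, d_rep, alpha=0.05):
    # generative predictive check: flag the model when the p-value
    # falls below the pre-selected significance level alpha
    return generative_predictive_p_value(d_obs, d_rep) < alpha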
December 13, 2024 at 4:11 PM
problem:

not all generative models (e.g., LLMs) give access to the likelihood and posterior

solution:

we can sample dataset completions from the predictive to simulate sampling from the posterior

and we can estimate the likelihood by conditioning on the completions
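
a sketch of that workaround, assuming a hypothetical llm object with sample_completion and log_prob methods (neither is a real API; this is only meant to make the two-step idea concrete):

def gpc_discrepancies(llm, x_obs, n=100):
    d_obs, d_rep = [], []
    for _ in range(n):
        # a completion sampled from the predictive stands in for an
        # explanation sampled from the posterior
        completion = llm.sample_completion(context=x_obs)
        # estimate the likelihood of the observed data by conditioning
        # on the completion
        d_obs.append(-llm.log_prob(x_obs, context=completion))
        # score a fresh replicate under the same completion to build
        # the reference distribution
        x_rep = llm.sample_completion(context=completion)
        d_rep.append(-llm.log_prob(x_rep, context=completion))
    return d_obs, d_rep

the paired lists plug directly into the gpc sketch above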
December 13, 2024 at 4:11 PM
understanding these nuances is the domain of Bayesian model criticism

posterior predictive checks (PPCs) form a family of model criticism techniques

but for discrepancy functions like the negative log likelihood, PPCs require the likelihood and posterior
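
concretely, the classical posterior predictive p-value (in the style of Gelman et al.) makes explicit where the likelihood and posterior enter:

p_{\mathrm{PPC}} = \Pr\!\big( D(x^{\mathrm{rep}}, f) \ge D(x, f) \,\big|\, x \big),
\qquad f \sim p(f \mid x, \theta), \quad x^{\mathrm{rep}} \sim p(x^{\mathrm{rep}} \mid f, \theta)

% with D = NLL, evaluating D requires the likelihood,
% and drawing f requires the posterior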
December 13, 2024 at 4:11 PM
the posterior is informative about whether there are enough in-context examples

but such inferences are made by any model, even misaligned ones

if a model is too flexible, more examples may be needed to specify the task

if it is too specialized, the inferences may be unreliable
December 13, 2024 at 4:11 PM
a model θ defines a joint distribution over datasets x and explanations f

the joint comprises the likelihood over datasets and the prior over explanations

the posterior is a distribution over explanations given a dataset

the posterior predictive gives the model a voice
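
in symbols, assuming the standard Bayesian factorization this post describes:

% joint over datasets x and explanations f, for a model θ
p(x, f \mid \theta) = p(x \mid f, \theta)\, p(f \mid \theta)

% posterior over explanations given a dataset
p(f \mid x, \theta) = \frac{p(x \mid f, \theta)\, p(f \mid \theta)}{p(x \mid \theta)}

% posterior predictive: how the model "speaks" about new data
p(x^{\mathrm{new}} \mid x, \theta) = \int p(x^{\mathrm{new}} \mid f, \theta)\, p(f \mid x, \theta)\, \mathrm{d}f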
December 13, 2024 at 4:11 PM
an in-context learning problem comprises a model, a dataset, and a task

knowing when an LLM provides reliable responses is challenging in this setting

there may not be enough in-context examples to specify the task

or the model may simply not have the capability to perform it
December 13, 2024 at 4:11 PM