Andrew Jesson
@anndvision.bsky.social
thanks to @yaringal.bsky.social, John P. Cunningham, and David Blei for their help!
December 13, 2024 at 5:26 PM
thank you to my co-authors @velezbeltran.bsky.social and @bleilab.bsky.social
December 13, 2024 at 4:11 PM
we explore two different discrepancies: the negative log likelihood (NLL) and the negative log marginal likelihood (NLML)

the NLL gives p-values that are informative about whether there are enough in-context examples

this can reduce risk in safety-critical settings
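
for reference, a sketch of the two discrepancies written out in the notation used further down the thread (datasets x, explanations f, model θ); these are the standard definitions, so treat them as my gloss rather than a quote from the paper:

% NLL: discrepancy of a dataset x under a fixed explanation f
D_{\mathrm{NLL}}(x, f) = -\log p(x \mid f, \theta)

% NLML: the explanation f is marginalized out
D_{\mathrm{NLML}}(x) = -\log p(x \mid \theta) = -\log \int p(x \mid f, \theta)\, p(f \mid \theta)\, \mathrm{d}f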
December 13, 2024 at 4:11 PM
we show that the GPC is an effective OOD predictor on generative image completion tasks using a modified Llama-2 model trained from scratch
December 13, 2024 at 4:11 PM
we show that the GPC is an effective predictor of out-of-capability natural language tasks using pre-trained LLMs
December 13, 2024 at 4:11 PM
we show that the GPC is an effective OOD predictor for tabular data using synthetic data and a modified Llama-2 model trained from scratch
December 13, 2024 at 4:11 PM
the result is the generative predictive p-value

pre-selecting a significance level α to threshold the p-value gives us a predictor of model capacity: the generative predictive check (GPC)
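
a minimal sketch of that recipe in Python, assuming we already have the observed discrepancy and discrepancies for sampled dataset completions (both function names are illustrative, not from the paper):

import numpy as np

def generative_predictive_p_value(d_obs, d_rep):
    # one-sided p-value: fraction of completion discrepancies at
    # least as extreme as the observed one; d_obs can be a scalar
    # or an array paired elementwise with d_rep
    return float(np.mean(np.asarray(d_rep) >= np.asarray(d_obs)))

def gpc(d_obs, d_rep, alpha=0.05):
    # generative predictive check: flag the model when the p-value
    # falls below the pre-selected significance level alpha
    return generative_predictive_p_value(d_obs, d_rep) < alpha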
December 13, 2024 at 4:11 PM
problem:

not all generative models (e.g., LLMs) give access to the likelihood and posterior

solution:

we can sample dataset completions from the predictive to simulate sampling from the posterior

and we can estimate the likelihood by conditioning on the completions
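
a sketch of that workaround, assuming a hypothetical llm object with sample_completion and log_prob methods (neither is a real API; this is only meant to make the two-step idea concrete):

def gpc_discrepancies(llm, x_obs, n=100):
    d_obs, d_rep = [], []
    for _ in range(n):
        # a completion sampled from the predictive stands in for an
        # explanation sampled from the posterior
        completion = llm.sample_completion(context=x_obs)
        # estimate the likelihood of the observed data by conditioning
        # on the completion
        d_obs.append(-llm.log_prob(x_obs, context=completion))
        # score a fresh replicate under the same completion to build
        # the reference distribution
        x_rep = llm.sample_completion(context=completion)
        d_rep.append(-llm.log_prob(x_rep, context=completion))
    return d_obs, d_rep

the paired lists plug directly into the gpc sketch above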
December 13, 2024 at 4:11 PM
understanding these nuances is the domain of Bayesian model criticism

posterior predictive checks (PPCs) form a family of model criticism techniques

but for discrepancy functions like the negative log likelihood, PPCs require the likelihood and posterior
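
concretely, the classical posterior predictive p-value (in the style of Gelman et al.) makes explicit where the likelihood and posterior enter:

p_{\mathrm{PPC}} = \Pr\!\big( D(x^{\mathrm{rep}}, f) \ge D(x, f) \,\big|\, x \big),
\qquad f \sim p(f \mid x, \theta), \quad x^{\mathrm{rep}} \sim p(x^{\mathrm{rep}} \mid f, \theta)

% with D = NLL, evaluating D requires the likelihood,
% and drawing f requires the posterior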
December 13, 2024 at 4:11 PM
the posterior is informative about whether there are enough in-context examples

but such inferences are made by any model, even misaligned ones

if a model is too flexible, more examples may be needed to specify the task

if it is too specialized, the inferences may be unreliable
December 13, 2024 at 4:11 PM
a model θ defines a joint distribution over datasets x and explanations f

the joint comprises the likelihood over datasets and the prior over explanations

the posterior is a distribution over explanations given a dataset

the posterior predictive gives the model a voice
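
in symbols, assuming the standard Bayesian factorization this post describes:

% joint over datasets x and explanations f, for a model θ
p(x, f \mid \theta) = p(x \mid f, \theta)\, p(f \mid \theta)

% posterior over explanations given a dataset
p(f \mid x, \theta) = \frac{p(x \mid f, \theta)\, p(f \mid \theta)}{p(x \mid \theta)}

% posterior predictive: how the model "speaks" about new data
p(x^{\mathrm{new}} \mid x, \theta) = \int p(x^{\mathrm{new}} \mid f, \theta)\, p(f \mid x, \theta)\, \mathrm{d}f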
December 13, 2024 at 4:11 PM
an in-context learning problem comprises a model, a dataset, and a task

knowing when an LLM provides reliable responses is challenging in this setting

there may not be enough in-context examples to specify the task

or the model may simply not have the capability to perform it
December 13, 2024 at 4:11 PM