anish144.bsky.social
@anish144.bsky.social
PhD researcher in Machine Learning at Imperial College. Visiting at the University of Oxford.

Interested in all things involving causality and Bayesian machine learning. Recently I have also been interested in scaling theory.

https://anish144.github.io/
We will be presenting this work at #ICML2025 and are happy to discuss it further.

🗓️: Tue 15 Jul 4:30 p.m. PDT
📍: East Exhibition Hall A-B #E-1912

Joint 1st author: @ruby-sedgwick.bsky.social.
With: Avinash Kori, Ben Glocker, @mvdw.bsky.social.

🧵14/14
July 10, 2025 at 6:07 PM
Finally, we note that the flexibility of our model comes at the cost of a harder optimisation problem. However, random restarts and keeping the model with the highest score reliably improve the structure recovery metrics (a common practice with GPs).

🧵13/14
July 10, 2025 at 6:07 PM
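Not from the thread: a minimal sketch of the restart strategy described in the post above. `fit_model` is a hypothetical training routine that returns a fitted model and its final score (e.g. an ELBO / marginal-likelihood estimate).

```python
import numpy as np

def fit_with_restarts(fit_model, data, n_restarts=10, seed=0):
    """Run `fit_model` from several random initialisations and keep the best run.

    `fit_model(data, init_seed)` is assumed to return (fitted_model, score),
    where score is the final objective value of that run.
    """
    rng = np.random.default_rng(seed)
    best_model, best_score = None, -np.inf
    for _ in range(n_restarts):
        model, score = fit_model(data, init_seed=int(rng.integers(1 << 31)))
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```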
We also test our method on semi-synthetic data generated from the SynTReN gene regulatory network simulator.

🧵12/14
July 10, 2025 at 6:07 PM
When data are generated from an identifiable model (an ANM), our more flexible model performs as well as an ANM-restricted Bayesian model (CGP). Both Bayesian models again outperform the non-Bayesian approaches, even those that make the correct ANM assumption.

🧵11/14
July 10, 2025 at 6:07 PM
With a larger number of variables (50), where the discrete search blows up, and with complex data, our approach performs well. SDCD uses the same acyclicity regulariser but trains neural networks by maximum likelihood, so the comparison shows the advantage of the Bayesian approach.

🧵10/14
July 10, 2025 at 6:07 PM
We first test on data generated from our model itself, where discrete model selection is tractable (3 variables). Here, we show that while the discrete model (DGP-CDE) recovers the true structure reliably, our continuous approximation (CGP-CDE) results in higher error.

🧵9/14
July 10, 2025 at 6:07 PM
We enforce acyclicity of the learnt adjacency by adding an acyclicity constraint to the optimisation; variational inference trains the remaining parameters.

The final objective returns the adjacency of the causal structure that maximises the posterior.

🧵8/14
July 10, 2025 at 6:07 PM
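Not from the thread: a minimal sketch of one well-known differentiable acyclicity penalty (the NOTEARS trace-of-matrix-exponential), purely to illustrate how acyclicity can enter a continuous objective. The paper's own regulariser (shared with SDCD, per the post above on 50 variables) may take a different form.

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W: np.ndarray) -> float:
    """NOTEARS-style penalty: h(W) = tr(exp(W * W)) - d.

    h(W) == 0 exactly when the weighted adjacency W encodes a DAG, so adding
    h(W) as a constraint or penalty lets a continuous optimiser search over DAGs.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# A 2-cycle gets a strictly positive penalty; a DAG gets (numerically) zero.
cyclic = np.array([[0.0, 1.0], [1.0, 0.0]])
dag = np.array([[0.0, 1.0], [0.0, 0.0]])
print(notears_acyclicity(cyclic))  # > 0
print(notears_acyclicity(dag))     # ~ 0
```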
Therefore, we can construct an adjacency matrix from the kernel hyperparameters. This amounts to Automatic Relevance Determination: maximising the marginal likelihood uncovers the dependency structure among the variables. However, the learnt adjacency still needs to be constrained to be acyclic.

🧵7/14
July 10, 2025 at 6:07 PM
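A toy illustration of reading an adjacency off ARD lengthscales, as in the post above; the indexing convention and the thresholding remark are my own assumptions, not the paper's exact construction.

```python
import numpy as np

def soft_adjacency_from_lengthscales(lengthscales: np.ndarray) -> np.ndarray:
    """Turn ARD lengthscales into a soft adjacency matrix.

    lengthscales[i, j] is assumed to be the lengthscale of input j in the GP
    mechanism for variable i. A very large lengthscale makes the kernel flat in
    that input (j is effectively not a parent of i); a small one makes j relevant.
    """
    relevance = 1.0 / lengthscales**2   # small lengthscale => high relevance
    A_soft = relevance.T                # A_soft[j, i]: strength of edge j -> i
    np.fill_diagonal(A_soft, 0.0)       # no self-edges
    return A_soft

# A hard graph can then be read off by thresholding, e.g. (A_soft > 0.1).astype(int).
```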
Next, we construct a latent variable Gaussian process model that can represent non-Gaussian densities, with each variable taking inputs according to a causal graph. To parametrise the space of graphs continuously, we note that the kernel hyperparameters control which inputs a function depends on.

🧵6/14
July 10, 2025 at 6:07 PM
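A bare-bones ARD kernel (my own simplification, not the paper's latent variable GP) to make the "hyperparameters control input dependence" point concrete: sending a lengthscale to infinity makes the GP ignore that input.

```python
import numpy as np

def ard_rbf_kernel(X1, X2, lengthscales, variance=1.0):
    """Squared-exponential kernel with one lengthscale per input dimension.

    As lengthscales[j] grows, the kernel becomes constant in input j, so the GP
    output no longer depends on it. This is the continuous "is j a parent?" knob.
    """
    diff = (X1[:, None, :] - X2[None, :, :]) / lengthscales  # scaled pairwise differences
    return variance * np.exp(-0.5 * np.sum(diff**2, axis=-1))

# Toy check: with a huge lengthscale on the second input, the kernel ignores it.
X = np.random.default_rng(0).normal(size=(5, 2))
K = ard_rbf_kernel(X, X, lengthscales=np.array([1.0, 1e6]))
```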
We first show that the guarantees of Bayesian model selection (BMS) carry over to the multivariate case: 1) when the underlying model is identifiable, BMS identifies the true DAG; 2) with more flexible models, different graphs remain distinguishable.

🧵5/14
July 10, 2025 at 6:07 PM
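For context (generic notation, not taken from the paper): Bayesian model selection scores each candidate graph by its marginal likelihood and keeps the maximum a posteriori graph,

```latex
p(\mathcal{D} \mid G) = \int p(\mathcal{D} \mid \theta, G)\, p(\theta \mid G)\, \mathrm{d}\theta,
\qquad
\hat{G} = \operatorname*{arg\,max}_{G \,\in\, \mathrm{DAGs}} \; p(G \mid \mathcal{D}) \;\propto\; p(\mathcal{D} \mid G)\, p(G).
```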
However, naive Bayesian model selection scales poorly: the number of DAGs grows super-exponentially with the number of variables.

We propose a continuous Bayesian model selection approach that scales and allows more flexible model assumptions.

🧵4/14
July 10, 2025 at 6:07 PM
While current causal discovery methods impose unrealistic model restrictions to ensure identifiability, Bayesian models relax strict identifiability while still allowing causal and more realistic assumptions, yielding performance gains: arxiv.org/abs/2306.02931

🧵3/14
July 10, 2025 at 6:07 PM
Bayesian models encode soft restrictions in the form of priors. These priors also allow us to encode causal assumptions, chiefly that causal mechanisms do not inform each other. This is achieved simply by ensuring that the prior factorises over the mechanisms.

🧵2/14
July 10, 2025 at 6:07 PM
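In symbols (my notation, one mechanism $f_i$ per variable): the assumption that mechanisms do not inform each other corresponds to a prior that factorises,

```latex
p(f_1, \ldots, f_d \mid G) \;=\; \prod_{i=1}^{d} p(f_i \mid G),
```

so that specifying or learning one mechanism carries no information about the others.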
Excited to be presenting this work at #ICLR2025. Please do reach out if you are interested in a similar space!

📍: Hall 3 + Hall 2B #471
🕐: Fri 25 Apr, 3 p.m.
📜: openreview.net/forum?id=eeJ...

This was a great collaboration w/ @mashtro.bsky.social, James Requeima, @mvdw.bsky.social
A Meta-Learning Approach to Bayesian Causal Discovery (openreview.net)
April 19, 2025 at 5:39 PM
Why did that work? We are approximating the posterior under an assumed causal model (the one data is presumed to be generated from), which may differ from the true data generating process. Improving the causal model (more flexible, wider prior) and increasing the capacity of the neural process can both help. 14/15
April 19, 2025 at 5:39 PM
What if we don't know the data distribution? Our approach here is to encode a "wide prior", training on mixtures of all possible models (that we can think of). We show that this approach leads to good performance on datasets whose generation process was not seen during training. 13/15
April 19, 2025 at 5:39 PM
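Not the authors' code: a toy sketch of what training on a mixture of generative models can look like, where each task draws a random DAG and a random mechanism family before simulating data. The names, edge probability, and the two families here are hypothetical.

```python
import numpy as np

def sample_training_task(n_nodes=5, n_samples=200, seed=None):
    """Draw one synthetic (data, graph) pair from a mixture of generative models."""
    rng = np.random.default_rng(seed)
    # Random DAG: random ordering + Bernoulli edges that respect the ordering.
    order = rng.permutation(n_nodes)
    adj = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if rng.random() < 0.3:
                adj[order[i], order[j]] = 1.0  # edge order[i] -> order[j]
    # Mixture over mechanism families: the "wide prior".
    family = rng.choice(["linear", "nonlinear"])
    X = np.zeros((n_samples, n_nodes))
    for j in order:  # simulate in topological order
        parents = np.where(adj[:, j] == 1.0)[0]
        noise = rng.normal(size=n_samples)
        if len(parents) == 0:
            X[:, j] = noise
        elif family == "linear":
            X[:, j] = X[:, parents] @ rng.normal(size=len(parents)) + noise
        else:
            X[:, j] = np.tanh(X[:, parents] @ rng.normal(size=len(parents))) + noise
    return X, adj
```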
Next, we test with more nodes (20), denser graphs, and more complicated functions. Here, our model outperforms the other baselines. Notably, a single model trained on all the data (labelled BCNP All Data) does not lose performance on specific datasets. 12/15
April 19, 2025 at 5:39 PM
We first show that our model outputs reasonable posterior samples on a 2-node graph with a single edge, where the structure is not identifiable from the data. Here we can see that the AVICI model, which does not correlate entries of the adjacency matrix, fails to output reasonable samples. 11/15
April 19, 2025 at 5:39 PM
We test against two types of baselines: 1) posterior approximations via the marginal likelihood (DiBS, BayesDAG); 2) NP-like methods that find single structures and can be used to obtain posterior samples, but miss key properties of the posterior (AVICI, CSIvA). 10/15
April 19, 2025 at 5:39 PM
The loss, which targets the KL divergence, simplifies to maximising the log probability of the true causal graph under our model. The final scheme: a model that efficiently outputs samples of causal structures approximating the true posterior, with just a forward pass! 9/15
April 19, 2025 at 5:39 PM
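Spelled out in generic notation (mine, not the paper's): with $(\mathcal{D}, G)$ pairs drawn from the simulator and $q_\phi$ the neural process, the expected forward KL reduces to a cross-entropy on the true graph, since the true posterior's entropy does not depend on $\phi$,

```latex
\mathbb{E}_{p(\mathcal{D})}\Big[\mathrm{KL}\big(p(G \mid \mathcal{D}) \,\|\, q_\phi(G \mid \mathcal{D})\big)\Big]
\;=\; -\,\mathbb{E}_{p(\mathcal{D}, G)}\big[\log q_\phi(G \mid \mathcal{D})\big] \;+\; \text{const},
```

so training amounts to maximising $\log q_\phi(G \mid \mathcal{D})$ over simulated pairs.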
Our decoder uses a lower-triangular matrix (A) and a permutation matrix (Q) to construct DAGs. A Gumbel-Sinkhorn distribution is parameterised, from which permutations (Q) can be sampled. The representation is further processed to parameterise the lower-triangular matrix (A). 8/15
April 19, 2025 at 5:39 PM
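A minimal sketch of the construction in the post above, using a hard permutation for readability; in the paper the permutation comes from a Gumbel-Sinkhorn relaxation so it can be sampled differentiably, and the exact parameterisation may differ.

```python
import numpy as np

def dag_from_triangular_and_permutation(A: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Combine a strictly lower-triangular matrix A and a permutation matrix Q into a DAG.

    Q A Q^T is always acyclic: A respects the fixed ordering 1..d, and Q merely
    relabels the nodes, which cannot introduce a cycle.
    """
    assert np.allclose(A, np.tril(A, k=-1)), "A must be strictly lower triangular"
    return Q @ A @ Q.T

# Toy usage with a hard permutation.
d = 3
A = np.tril(np.random.default_rng(0).random((d, d)) > 0.5, k=-1).astype(float)
Q = np.eye(d)[np.random.default_rng(1).permutation(d)]
adjacency = dag_from_triangular_and_permutation(A, Q)
```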
We embed each node-sample pair and append a query vector of 0s along the sample axis. Our encoder alternates between attention over samples and attention over nodes to preserve equivariance. We then perform cross-attention with the query vector to encode permutation invariance over samples. 7/15
April 19, 2025 at 5:39 PM
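A rough PyTorch sketch (my own simplification) of alternating attention over the sample and node axes; the layer counts, normalisation, and query-token handling in the actual model may differ.

```python
import torch
import torch.nn as nn

class AlternatingAttentionBlock(nn.Module):
    """One block of self-attention over samples, then over nodes.

    Input shape: (batch, n_samples, n_nodes, dim). Self-attention without
    positional encodings is permutation-equivariant along the axis it attends
    over, and treating the other axis as batch leaves that axis untouched, so
    the block is equivariant to permutations of both samples and nodes.
    """

    def __init__(self, dim: int, n_heads: int = 4):
        super().__init__()
        self.sample_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.node_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, n, d = x.shape
        # Attention over the sample axis: fold nodes into the batch dimension.
        h = x.permute(0, 2, 1, 3).reshape(b * n, s, d)
        h = h + self.sample_attn(h, h, h)[0]
        h = h.reshape(b, n, s, d).permute(0, 2, 1, 3)
        # Attention over the node axis: fold samples into the batch dimension.
        g = h.reshape(b * s, n, d)
        g = g + self.node_attn(g, g, g)[0]
        return g.reshape(b, s, n, d)

# Toy usage: 2 datasets, 16 samples, 5 nodes, 32-dimensional embeddings.
out = AlternatingAttentionBlock(32)(torch.randn(2, 16, 5, 32))
```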
What does our model look like? We encode key properties of the posterior: 1) permutation invariance with respect to samples, 2) permutation equivariance with respect to nodes, 3) correlations between adjacency entries. We do this with a transformer encoder-decoder structure. 6/15
April 19, 2025 at 5:39 PM
Our training objective reflects this: we minimise the KL between the true posterior and the neural process. The key property is that we only require samples of data and the corresponding true causal graph. These samples form the "prior", which can be synthetic or come from real examples. 5/15
April 19, 2025 at 5:39 PM