Lightnews — Scholar-powered news

Laurent Jacob

@laurentjacob.bsky.social

The decisions for LEGEND are out: legend2025.sciencesconf.org/data/book_le...

I'm really looking forward to hearing these 21 exciting presentations (and an additional 30 posters) next December.

If you want to attend too, registration is open until October 17th through legend2025.sciencesconf.org

Exploring the space of self-reproducing RNA using generative models, Martin Weigt

Exploring the archaic introgression landscape of admixed populations through
joint ancestry inference, Jazeps Medina Tretmanis [et al.]

Predicting natural variation in the yeast phenotypic landscape with machine
learning, Sakshi Khaiwal [et al.]

Phylodynamic modeling with unsupervised Bayesian neural networks, Marino
Gabriele [et al.]

Likelihood-free inference of phylogenetic tree posterior distributions, Luc Blassel [et al.]

Generative continuous time model reveals epistatic signatures in protein evolution, Barrat-Charlaix Pierre

Neural posterior estimation for high-dimensional genomic data from complex population genetic models, Jiseon Min [et al.]

A differentiable model for detecting diversifying selection directly from alignments
in large-scale bacterial datasets, Leonie Lorenz [et al.]

Detecting interspecific positive selection using transformers, Charlotte West [et al.]

Predicting Multiple Sequence Alignment Uncertainty via Machine Learning, Lucia
Martin-Fernandez [et al.]

Graph Neural Networks for Likelihood-Free Inference in Diversification Models, Amélie Leroy [et al.]

Popformer: learning general signatures of genetic variation and natural selection
with a self-supervised transformer, Leon Zong [et al.]

PRIVET: PRIVacy metric based on Extreme value Theory, Antoine Szatkownik [et
al.]

Generative models for inferring the evolutionary history of the malaria vector
Anopheles gambiae, Amelia Eneli [et al.]

Language Models Outperform Supervised-Only Approaches for Conserved Element Comprehension, Eyes Robson [et al.]

Identification and Classification of Orphan Genes, Spurious Orphan Genes, and
Conserved Genes from the human microbiome, Chen Chen

Neural Simulation-based inference of demography and selection, Francisco De
Borja Campuzano Jiménez [et al.]

Species Identification and aDNA Read Mapping Using k-mer Embeddings, Filip
Thor [et al.]

Contrastive Learning for Population Structure and Trait Prediction, Filip Thor [et
al.]

Protein and genomic language models chart a vast landscape of antiphage defenses, Mordret Ernest

The Phylogenomics and Sparse Learning of Trait Innovations, Gaurav Diwan [et
al.]

October 8, 2025 at 11:04 AM

Laurent Jacob

@laurentjacob.bsky.social

Come hear about the latest advances in the field and discuss your own work at Centre Paul Langevin in beautiful Aussois.

People having breakfast in front of the Alps in the Centre Paul Langevin.

February 24, 2025 at 8:58 AM

Laurent Jacob

@laurentjacob.bsky.social

Burak Yelmen from the University of Tartu will give a keynote presentation on "A perspective on generative neural networks in genomics with applications in synthetic data generation".

February 24, 2025 at 8:58 AM

Laurent Jacob

@laurentjacob.bsky.social

Claudia Solís-Lemus from the University of Wisconsin-Madison will give a keynote presentation on "The good, the bad and the ugly of deep learning in phylogenetic inference".

February 24, 2025 at 8:58 AM

Laurent Jacob

@laurentjacob.bsky.social

Anne-Florence Bitbol from EPFL will give a keynote presentation on "Coevolution-aware language models".

February 24, 2025 at 8:58 AM

Laurent Jacob

@laurentjacob.bsky.social

The next LEGEND conference on machine learning for evolutionary genomics will be in Aussois (French Alps) between December 8th and 12th.

Mark your calendars and make sure your best work is ready next September when the call for abstracts opens 🙂

legend2025.sciencesconf.org

A legendary being holds a phylogenetic tree in the palm of their hand, with snowy mountains in the background.

February 24, 2025 at 8:58 AM

Laurent Jacob

@laurentjacob.bsky.social

All this work was done by Luca Nesterenko and @lblassel.bsky.social, assisted by P. Veber, Bastien Boussau and myself.

The code and data are available at github.com/lucanest/Phy...

Please share if you find this interesting, and we welcome your feedback :)

A sketch summarizing the entire Phyloformer process.

June 24, 2024 at 8:35 AM

Laurent Jacob

@laurentjacob.bsky.social

In all these experiments, and regardless of model complexity, Phyloformer run on a GPU was the fastest method.

It was about two orders of magnitude faster than IQTree, and even twice as fast as FastME.

A plot comparing the speed of all methods.

June 24, 2024 at 8:33 AM

Laurent Jacob

@laurentjacob.bsky.social

We then trained Phyloformer under a more realistic model, accounting for co-evolution.

It outperformed all other methods, including IQTree/FastTree, on all metrics.

A plot comparing the error of different methods under a more complex probabilistic model of sequence evolution. Phyloformer outperforms all other methods under all metrics.

June 24, 2024 at 8:32 AM

Laurent Jacob

@laurentjacob.bsky.social

More precisely, Phyloformer was very good at predicting distances, and it scored well on the Kuhner-Felsenstein metric, which accounts for both topology and branch lengths.

Looking at the topology only (Robinson-Foulds metric), it performed less well than IQTree/FastTree, but better than FastME.

A plot stratifying the error into two terms: we perform very well for estimating evolutionary distances, less well for the topology.
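To make the topology-only comparison concrete, here is a minimal sketch of the Robinson-Foulds distance: it counts the non-trivial bipartitions (splits) of the leaves that appear in one unrooted tree but not in the other. The two five-leaf trees, the internal node names `u`, `v`, `w`, and the helper names are made up for illustration; this is not the evaluation code used for Phyloformer.

```python
def nontrivial_splits(edges):
    """Return the leaf bipartitions induced by the internal edges of an unrooted tree."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    leaves = frozenset(n for n, nb in adj.items() if len(nb) == 1)
    found = set()
    for u, v in edges:
        # Collect everything reachable from u without crossing the edge (u, v).
        side, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in side and not (x == u and y == v):
                    side.add(y)
                    stack.append(y)
        part = frozenset(side) & leaves
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(part) <= len(leaves) - 2:
            found.add(frozenset((part, leaves - part)))
    return found


def robinson_foulds(edges_1, edges_2):
    """Number of splits present in exactly one of the two trees."""
    return len(nontrivial_splits(edges_1) ^ nontrivial_splits(edges_2))


# Hypothetical trees: ((A,B),C,(D,E)) and ((A,C),B,(D,E)).
TREE_1 = [("A", "u"), ("B", "u"), ("u", "v"), ("C", "v"),
          ("v", "w"), ("D", "w"), ("E", "w")]
TREE_2 = [("A", "u"), ("C", "u"), ("u", "v"), ("B", "v"),
          ("v", "w"), ("D", "w"), ("E", "w")]

RF = robinson_foulds(TREE_1, TREE_2)
print(RF)
```

The two trees share the split DE|ABC and differ on the other one, so branch lengths play no role in this score.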

June 24, 2024 at 8:32 AM

Laurent Jacob

@laurentjacob.bsky.social

We first trained Phyloformer to perform inference under LG, a common model under which likelihood computation is possible.

It performed much better than FastME (distance method), on par with maximum likelihood approaches (IQTree, FastTree).

A plot comparing the error made by different methods of phylogenetic inference. The distance method FastME underperforms, our method is on par with likelihood methods.

June 24, 2024 at 8:31 AM

Laurent Jacob

@laurentjacob.bsky.social

Phyloformer uses self-attention to progressively share information among and between sequences.

This choice makes our function invariant to the order of the input sequences (any order yields the same output phylogeny).

A visual justification of permutation invariance: two sequence alignments that are identical up to a permutation must lead to the same phylogeny.
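The permutation property can be checked on a toy example. The sketch below is not Phyloformer's architecture: it assumes a single self-attention layer with identity projections and no positional encoding, applied to made-up three-dimensional embeddings, and it uses a plain Euclidean distance between updated embeddings as a stand-in pairwise output. Reordering the inputs only reorders the per-sequence outputs, so the value attached to each pair of names is unchanged up to floating-point rounding.

```python
import math


def self_attention(vectors):
    """Single-head self-attention with identity projections, no positional encoding."""
    updated = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) for key in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        updated.append([
            sum(w * value[i] for w, value in zip(weights, vectors)) / total
            for i in range(len(query))
        ])
    return updated


def pairwise_outputs(named_vectors):
    """Map each unordered pair of names to a distance between updated embeddings."""
    names = list(named_vectors)
    updated = self_attention([named_vectors[n] for n in names])
    result = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            result[frozenset((names[i], names[j]))] = math.dist(updated[i], updated[j])
    return result


# Hypothetical embeddings, given in two different orders.
ORIGINAL = {"seq1": [1.0, 0.0, 0.5], "seq2": [0.2, 0.9, 0.1], "seq3": [0.4, 0.4, 0.8]}
SHUFFLED = {name: ORIGINAL[name] for name in ("seq3", "seq1", "seq2")}

FIRST = pairwise_outputs(ORIGINAL)
SECOND = pairwise_outputs(SHUFFLED)
print(all(math.isclose(FIRST[p], SECOND[p], rel_tol=1e-9) for p in FIRST))
```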

June 24, 2024 at 8:30 AM

Laurent Jacob

@laurentjacob.bsky.social

Once trained, Phyloformer provides estimates of all evolutionary distances given the sequences.

But each of these distance estimates is informed by the entire set of sequences, not just the corresponding pair!

We then pass them to FastME, a distance method, to obtain a tree.

The inference process of Phyloformer. We use the trained network to estimate evolutionary distances from related sequences, and pass these estimates to FastME to build a phylogeny.

June 24, 2024 at 8:29 AM

Laurent Jacob

@laurentjacob.bsky.social

Phyloformer is a learnable function. Its input is a set of sequences, its output is their phylogeny, represented by evolutionary distances between all pairs of sequences.

We optimize this function on a large number of (phylogeny, sequences) pairs sampled from the probabilistic model.

The process for training Phyloformer. We sample trees and sequences evolved along these trees from the model under which we want to do inference. We use these examples to train a network that predicts the parameters (evolutionary distances, equivalent to the tree) from an observation (aligned related sequences).
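As a rough illustration of how such (phylogeny, sequences) pairs can be sampled, here is a minimal sketch under strong simplifying assumptions that are not from the thread: a fixed four-leaf topology, branch lengths drawn uniformly from [0.01, 0.5], and nucleotide sequences evolving under the Jukes-Cantor model, where a site ends a branch of length t in a different base with probability 3/4 (1 - exp(-4t/3)). The function and variable names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t under Jukes-Cantor."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Each of the three other bases is equally likely.
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def sample_example(topology, root, n_sites, rng):
    """Sample branch lengths, then sequences evolved from the root to the leaves."""
    # Assumed prior on branch lengths, chosen only for illustration.
    tree = {parent: [(child, rng.uniform(0.01, 0.5)) for child in children]
            for parent, children in topology.items()}
    seqs = {root: "".join(rng.choice(BASES) for _ in range(n_sites))}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            seqs[child] = evolve(seqs[parent], t, rng)
            stack.append(child)
    # Only the leaf sequences are observed.
    alignment = {name: s for name, s in seqs.items() if name not in tree}
    return tree, alignment


TOPOLOGY = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
rng = random.Random(0)
tree, alignment = sample_example(TOPOLOGY, "root", 30, rng)
for name in sorted(alignment):
    print(name, alignment[name])
```

Calling `sample_example` repeatedly yields as many training examples as needed, and no likelihood is ever evaluated.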

June 24, 2024 at 8:28 AM

Laurent Jacob

@laurentjacob.bsky.social

This is where likelihood-free/simulation-based inference comes into play.

Sampling trees and sequences under a probabilistic model is possible under much more complex models, for which likelihood computations would be prohibitive.

It's an alternative way to access the model.

A visual justification of simulation-based and likelihood-free inference: under some probabilistic models, computing likelihoods is hard but sampling data is easy.

June 24, 2024 at 8:27 AM

Laurent Jacob

@laurentjacob.bsky.social

Maximum likelihood approaches, on the other hand, search for the most likely tree jointly over all sequences.

This makes them accurate but slow. It also restricts these approaches to simplistic models under which likelihood computations are fast enough.

A visual summary of maximum likelihood approaches for phylogenetic inference. We explore the space of phylogenetic trees to find the one making a given set of related sequences as likely as possible under a chosen probabilistic model of sequence evolution.

June 24, 2024 at 8:26 AM

Laurent Jacob

@laurentjacob.bsky.social

Knowing the evolutionary distances (sum of branch lengths) between all pairs of sequences is enough to recover the tree, by hierarchical clustering.

Distance methods rely on this idea, with estimates from pairs of sequences taken separately. This makes them fast but inaccurate.

A visual summary of so-called distance methods for phylogenetic inference. We start from an estimate of evolutionary distances between all pairs of sequences (sums of branch lengths between leaves in the true tree) and build a tree by hierarchical clustering.
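The claim that exact pairwise distances determine the tree can be checked on a small case. The sketch below computes path-length distances on a made-up five-leaf tree, then rebuilds its splits with a bare-bones neighbor joining, one particular agglomerative distance method. The tree, its branch lengths, and the function names are assumptions for illustration; real tools such as FastME are far more elaborate.

```python
from itertools import combinations


def leaf_distances(edges, leaves):
    """Sum branch lengths along the path between every pair of leaves."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    dist = {}
    for a in leaves:
        reach, stack = {a: 0.0}, [a]
        while stack:
            u = stack.pop()
            for v, length in adj[u]:
                if v not in reach:
                    reach[v] = reach[u] + length
                    stack.append(v)
        for b in leaves:
            if a != b:
                dist[frozenset((frozenset([a]), frozenset([b])))] = reach[b]
    return dist


def neighbor_joining_splits(leaves, dist):
    """Return the leaf bipartitions created by successive neighbor-joining merges."""
    nodes = [frozenset([x]) for x in leaves]  # a node is the set of leaves below it
    d = dict(dist)
    everyone = frozenset(leaves)
    splits = set()
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising the neighbor-joining criterion.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        merged = a | b
        for c in nodes:
            if c != a and c != b:
                d[frozenset((merged, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))])
        nodes = [c for c in nodes if c != a and c != b] + [merged]
        splits.add(frozenset((merged, everyone - merged)))
    return splits


# Hypothetical unrooted tree ((A,B),C,(D,E)) with internal nodes u, v, w.
EDGES = [("A", "u", 0.1), ("B", "u", 0.2), ("u", "v", 0.3), ("C", "v", 0.4),
         ("v", "w", 0.25), ("D", "w", 0.15), ("E", "w", 0.05)]
LEAVES = ["A", "B", "C", "D", "E"]

DIST = leaf_distances(EDGES, LEAVES)
SPLITS = neighbor_joining_splits(LEAVES, DIST)
for split in SPLITS:
    print(sorted(sorted(side) for side in split))
```

With exact distances the original splits AB|CDE and DE|ABC come back. The difficulty in practice is that distances estimated from one pair of sequences at a time are noisy.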

June 24, 2024 at 8:25 AM

Laurent Jacob

@laurentjacob.bsky.social

Phylogenetic trees describe how related sequences (at the leaves) evolved from a common ancestor. Internal nodes are successive ancestral sequences.

In probabilistic models, branch lengths represent an expected number of substitutions between the sequences at the two ends.

A diagram of phylogenetic inference: we build a tree summarizing how a given set of related sequences evolved from a common ancestor.

June 24, 2024 at 8:25 AM

Laurent Jacob

@laurentjacob.bsky.social

We just released a preprint for Phyloformer, a likelihood-free inference method for phylogenetic reconstruction: biorxiv.org/content/10.1...

Faster than distance methods like neighbor joining, it outperforms maximum likelihood methods under complex models of sequence evolution.

🧵

June 24, 2024 at 8:24 AM
