Lightnews — Scholar-powered news

Alessio Capobianco

@acapomorphic.bsky.social

Ah, I see! That is definitely true, but that same character discretized vs in its continuous form carries a very different quantity of information. If modeled properly (maybe a big if), one continuous character should have more information content to estimate a phylogeny than its discretized version

October 16, 2025 at 1:56 PM

Alessio Capobianco

@acapomorphic.bsky.social

I like the optimism there! I guess what would be good to know is: given the amount of "perturbation" from the true phylogeny that I can expect based on the size of my data, are the main patterns I'm interested in (diversification, biogeography, phenotypic evolution) robust to that much perturbation?

October 16, 2025 at 1:48 PM

Alessio Capobianco

@acapomorphic.bsky.social

Continuous characters are not that commonly used in Bayesian morphological phylogenetics though. I want to believe that those can be a mostly unexplored source of information to infer evolutionary relationships, although I'm very aware that they come with their own set of issues and limitations

October 16, 2025 at 10:16 AM

Alessio Capobianco

@acapomorphic.bsky.social

In my simulation, a phylogeny inferred for 50 taxa with 50 binary characters on average has 50% of the nodes wrong (and this is with no model misspecification and no missing data). What can we do about it? I don't have any clear/easy solution, but at the same time I don't want to be too pessimistic

October 16, 2025 at 10:11 AM
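One possible way to count "wrong nodes" can be sketched in Python. The post does not say which metric gave the 50% figure, so this is only an assumed definition: the share of non-root clades in the true rooted tree that are absent from the inferred tree. The nested-tuple tree encoding and the function names `clades` and `fraction_wrong` are hypothetical.

```python
def clades(tree):
    """Collect the tip set of every internal node of a nested-tuple tree."""
    found = set()

    def tips(node):
        # A string is a tip; a tuple is an internal node with children.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def fraction_wrong(true_tree, inferred_tree):
    """Share of non-root true clades missing from the inferred tree.

    Assumed metric for illustration, not necessarily the one in the study.
    """
    true_clades = clades(true_tree)
    root_clade = max(true_clades, key=len)
    true_clades = true_clades - {root_clade}  # the root clade is trivial
    if not true_clades:
        return 0.0
    inferred_clades = clades(inferred_tree)
    missing = [c for c in true_clades if c not in inferred_clades]
    return len(missing) / len(true_clades)


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
wrong = fraction_wrong(true_tree, inferred_tree)
```

Here only the clade {A, B} of the three non-root true clades is missing from the inferred tree, so `wrong` is one third.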

Alessio Capobianco

@acapomorphic.bsky.social

I totally agree, for some systems there is possibly an intrinsic limit on the number of (more or less) independent variable characters that can be defined and scored that is lower than 100. Then the question is: what can we do for those? Do we just accept that our phylo estimates will always be off?

October 16, 2025 at 10:01 AM

Alessio Capobianco

@acapomorphic.bsky.social

Thank you! I would expect that adding rate heterogeneity to the model (which means adding parameters) would require at least the same minimum number of characters, if not more.

October 16, 2025 at 9:56 AM

Alessio Capobianco

@acapomorphic.bsky.social

Problem: a lot of empirical morphological datasets have fewer than 100 characters, and way fewer than 500. Possible solutions? Continuous characters; total-evidence datasets; more funding, hiring, training targeted towards characterization and digitization of interspecific morphological diversity.

October 15, 2025 at 9:29 AM

Alessio Capobianco

@acapomorphic.bsky.social

An important point: the 100-500 chars threshold refers to an ideal scenario where we know under which model the data evolved (no model misspecification), this model is relatively simple (few parameters to infer), and there is no missing data. Thus, this should be taken as a very minimum number.

October 15, 2025 at 9:29 AM

Alessio Capobianco

@acapomorphic.bsky.social

One intriguing empirical application of these findings is that, for more than 50 taxa, characters that change multiple times independently across the tree (homoplastic characters) improve tree reconstruction compared to characters that change only once (synapomorphies and autapomorphies).

October 15, 2025 at 9:29 AM
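To make the distinction between single-change and homoplastic characters concrete, here is a minimal Python sketch that counts the minimum number of state changes a character needs on a fixed tree, using Fitch parsimony. This is an illustration only, not the method of the study (which was run in RevBayes); the nested-tuple tree encoding and the name `fitch_changes` are assumptions, and the tree must be strictly bifurcating.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character (Fitch parsimony).

    tree   : nested 2-tuples with tip names (strings) at the leaves
    states : dict mapping each tip name to its observed state
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common
        # No shared state: at least one change is needed at this node.
        changes += 1
        return left | right

    state_set(tree)
    return changes


tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))

# Derived state 1 shared only by the sister tips A and B: one change.
single_origin = dict.fromkeys("CDEFGH", 0) | {"A": 1, "B": 1}

# Derived state 1 in the distantly related tips A and E: two changes.
homoplastic = dict.fromkeys("BCDFGH", 0) | {"A": 1, "E": 1}

n_single = fitch_changes(tree, single_origin)
n_homoplastic = fitch_changes(tree, homoplastic)
```

A character with a count of 1 changes once on the tree; a count of 2 or more marks it as homoplastic in the sense used in the post.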

Alessio Capobianco

@acapomorphic.bsky.social

Overall, between 100 and 500 variable characters are necessary to reach sufficient accuracy and precision of phylogenetic estimates for as few as 20 taxa. This is relevant not only for morphological phylogenetics, but also for gene trees and SNP-based estimates, and for Bayesian phylolinguistics.

October 15, 2025 at 9:29 AM

Alessio Capobianco

@acapomorphic.bsky.social

Three different metrics of accuracy and precision were used to evaluate how good the phylogenetic inference was. General resulting patterns: more characters are better; more states are better (but this has little effect for >50 taxa); more taxa are worse for short trees, but better for long trees.

October 15, 2025 at 9:29 AM

Alessio Capobianco

@acapomorphic.bsky.social

I designed this simulation study in RevBayes to perfectly match the models used for sim and inference. Any differences between the true tree generating data and the inferred tree(s) are due to dataset size. Parameters that varied across sims are: # characters, # taxa, tree length, and # states.

October 15, 2025 at 9:29 AM
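The four varied parameters describe a full factorial design, which can be sketched as a simple grid. The study itself was run in RevBayes; this Python fragment only illustrates the layout, and every value in the lists below is a hypothetical placeholder, since the post names the parameters but not the values used.

```python
from itertools import product

# Hypothetical values: the post lists which parameters varied,
# not the levels of each one.
N_CHARACTERS = [50, 100, 500]
N_TAXA = [20, 50]
TREE_LENGTHS = [1.0, 10.0]
N_STATES = [2, 4]


def simulation_grid():
    """Return one settings dict per combination of the four parameters."""
    return [
        {"n_chars": c, "n_taxa": t, "tree_length": length, "n_states": s}
        for c, t, length, s in product(
            N_CHARACTERS, N_TAXA, TREE_LENGTHS, N_STATES
        )
    ]


grid = simulation_grid()
```

Each entry of `grid` would correspond to one simulation condition, with data simulated and the tree inferred under the same model so that dataset size is the only source of error.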

Alessio Capobianco

@acapomorphic.bsky.social

I'm afraid we forgot to mention any lemurs there... 😅 But I hope you're still using them for Analytical Paleo! 😁

September 26, 2025 at 8:42 AM

Alessio Capobianco

@acapomorphic.bsky.social

Thanks Jeff! The VP lecture slides definitely left an impression, I had to use the bird somewhere 😅

September 25, 2025 at 9:29 PM

Alessio Capobianco

@acapomorphic.bsky.social

We hope that our contribution will not only be a useful reference for all researchers wanting to perform a tip-dating analysis on their favorite group of organisms, but also a starting point of discussion to further improve this class of methods and its application to empirical data!

September 25, 2025 at 12:40 PM

Alessio Capobianco

@acapomorphic.bsky.social

Non-exhaustive list of things you can find in our paper:
- A survey of all fossil tip-dating studies published until 2023
- A flowchart with all the steps to set up a tip-dating analysis
- Detailed discussion of all the elements making up a tip-dating analysis, from molecular alignment to FBD models

September 25, 2025 at 12:40 PM

Alessio Capobianco

@acapomorphic.bsky.social

Here is a Nature News writeup on our work, covering also another remarkable new paper on extremely old enamel proteins by Daniel Green and colleagues. www.nature.com/articles/d41...

Ancient proteins rewrite the rhino family tree — are dinosaurs next?

Molecules from 20-million-year-old teeth are among the oldest ever sequenced.

www.nature.com

July 10, 2025 at 8:38 PM
