Stephan Köstlbacher
stephkoe.bsky.social
Former PostDoc at Wageningen University with Thijs Ettema
Studying ancient evolutionary transitions in prokaryotes using phylogenomics and structural modeling
Looking for next step in academic or translational research
he/him
🧙‍♀️ Something is brewing in the WitChi cauldron…
After some excellent peer review feedback, a new update of WitChi is taking shape, refining how we detect and prune compositional bias in phylogenomic alignments.

Stay tuned for the next release!

🧙‍♀️ Can’t model it? Prune it!
github.com/stephkoest/w...
October 31, 2025 at 9:59 AM
8.
GTDB r220 case study (led by @kassipan.bsky.social )
Applied WitChi to the archaeal GTDB r220 supermatrix:
• 5,869 taxa
• 55% of columns pruned
• Biased taxa: 95.1% → 2.3%
• Runtime: <2h on 4 cores
→ Known clades recovered — without using very complex C60 or CAT models
July 20, 2025 at 1:04 PM
7.
Use witchi test to quantify bias per taxon:
• χ² scores
• Empirical p-values (via permutations)
• Z-scores to see how far taxa deviate from expectation
→ Great for screening MSAs or comparing compositional distortion across datasets.
July 20, 2025 at 1:04 PM
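For readers who want to see the three quantities from post 7 side by side, here is a minimal NumPy sketch. It is not WitChi's code and does not reproduce the output of witchi test. The function names (encode, taxon_chi2, permutation_test) are hypothetical, and two choices are assumptions: a "column permutation" is taken to mean shuffling the residues of each column among taxa, and every taxon is compared against a null pooled over all permuted taxa.

```python
import numpy as np

def encode(seqs):
    """Turn equal-length sequence strings into a (taxa x columns) uint8 array."""
    return np.array([list(s.encode("ascii")) for s in seqs], dtype=np.uint8)

def taxon_chi2(aln, states):
    """Chi-square score of each taxon against the alignment-wide composition."""
    # counts[i, k] = how often taxon i shows state k
    counts = np.stack([(aln == s).sum(axis=1) for s in states], axis=1).astype(float)
    # expected counts = taxon length * alignment-wide frequency of each state
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0) / counts.sum()
    safe = np.where(expected > 0, expected, 1.0)  # unused states contribute 0
    return ((counts - expected) ** 2 / safe).sum(axis=1)

def permutation_test(aln, states, n_perm=100, seed=0):
    """Return per-taxon chi-square scores, empirical p-values, and z-scores."""
    rng = np.random.default_rng(seed)
    observed = taxon_chi2(aln, states)
    # Assumed null: shuffle each column independently among taxa, no model, no tree
    null = np.concatenate(
        [taxon_chi2(rng.permuted(aln, axis=0), states) for _ in range(n_perm)]
    )
    # Empirical p-value with the usual +1 correction so p is never exactly 0
    p = (1 + (null[None, :] >= observed[:, None]).sum(axis=1)) / (1 + null.size)
    # z-score: distance from the null mean in null standard deviations
    z = (observed - null.mean()) / null.std()
    return observed, p, z
```

Calling permutation_test(encode(seqs), b"ACGT") on a nucleotide alignment returns three arrays with one entry per taxon; for proteins, pass the 20 amino acid letters instead. Gap handling is left out of this sketch.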
6.
WitChi solves both problems:
🔹 Builds a null distribution using column permutations — no model, no tree
🔹 Recursively removes columns that distort the taxon-wise χ² profile
🎁 Bonus: 3 scoring strategies, including one capturing distribution-wide effects (Wasserstein)
⚡ Scales linearly with taxa
July 20, 2025 at 1:04 PM
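The idea of recursive pruning in post 6 (recompute Δχ² after every removal instead of ranking columns once) can be sketched as a toy greedy loop. This is an illustration under stated assumptions, not WitChi's algorithm: the names total_chi2 and prune_recursive are hypothetical, the score is simply the summed taxon-wise χ², the stopping target is supplied by the caller (for example from a permutation null), and this naive version re-scores every remaining column each round, so it does not have the scaling described in the post.

```python
import numpy as np

def total_chi2(aln, states):
    """Sum of per-taxon chi-square scores for a (taxa x columns) uint8 alignment."""
    counts = np.stack([(aln == s).sum(axis=1) for s in states], axis=1).astype(float)
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0) / counts.sum()
    safe = np.where(expected > 0, expected, 1.0)  # unused states contribute 0
    return float(((counts - expected) ** 2 / safe).sum())

def prune_recursive(aln, states, target, batch=1):
    """Greedily drop the columns whose removal lowers total chi-square the most.

    Returns the indices of the retained columns. Delta chi-square is recomputed
    on the current alignment every round, because the background composition
    shifts whenever columns are removed.
    """
    keep = np.arange(aln.shape[1])
    while keep.size > batch:
        current = total_chi2(aln[:, keep], states)
        if current <= target:  # bias is already at or below the target
            break
        # Drop in total chi-square obtained by removing each remaining column
        delta = np.array([
            current - total_chi2(aln[:, np.delete(keep, j)], states)
            for j in range(keep.size)
        ])
        worst = np.argsort(delta)[-batch:]  # columns with the largest reduction
        keep = np.delete(keep, worst)
    return keep
```

Pruning once corresponds to computing delta a single time and cutting at a fixed rank; the loop above differs only in that the ranking is refreshed after each batch, which is the update step post 5 says few tools perform.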
5.
Classical χ² pruning trims biased columns once — fast, but naive.
→ As alignment composition shifts, Δχ² must be updated — few tools do this.
BMGE’s stationary-based algorithm prunes iteratively and works well, but it scales quadratically with taxa, so it is not feasible for medium-sized or large datasets.
July 20, 2025 at 1:04 PM
4.
The problem:
χ² assumes taxa are independent and identically distributed samples.

In MSAs, they share history → correlated data.
So parametric χ² nulls are invalid.
Simulations help, but they need known models and trees — which bias distorts.
→ Slow, circular, rarely used.
July 20, 2025 at 1:04 PM
2.
What’s compositional bias?
When unrelated taxa convergently evolve similar sequence compositions (e.g. GC-rich, AT-rich), tree algorithms may group them by chemistry, not ancestry — a well-known artefact in deep phylogenies.
Fig modified from: doi.org/10.1007/978-...
July 20, 2025 at 1:04 PM