Stephan Köstlbacher
stephkoe.bsky.social
Former postdoc at Wageningen University with Thijs Ettema
Studying ancient evolutionary transitions in prokaryotes using phylogenomics and structural modeling
Looking for my next step in academic or translational research
he/him
Congratttttttts!!!! So well deserved :)
October 23, 2025 at 1:14 PM
If you want to mess around with some motifs, check out:
github.com/stephkoest/E...
GitHub - stephkoest/Ecoli_titration
August 22, 2025 at 9:16 AM
It was a pleasure to work with you! 😊
July 28, 2025 at 9:17 AM
Hey Felix, good question! Yeah, it is different. In short: trimAl/ClipKit aim to remove uninformative sites. WitChi is a second step that removes misleading sites: those that can group unrelated taxa just because their sequence composition looks similar. Hope that helps!
July 21, 2025 at 4:11 PM
Thanks, Jolien! ;)
July 21, 2025 at 6:26 AM
And of course the great work!
July 20, 2025 at 5:02 PM
Thanks for that beautiful summary, Kassi :)
July 20, 2025 at 5:01 PM
It was a fun project :) Thanks for the support!
July 20, 2025 at 1:23 PM
9. TL;DR + link dump
WitChi is:
✔ Fast
✔ Interpretable
✔ Tree- and model-free
✔ Benchmark-validated
Designed to fix compositional bias at phylogenomic scale.
With: @kassipan.bsky.social, @danieltamarit.bsky.social, @ettema.bsky.social
💻 github.com/stephkoest/w...
📄 www.biorxiv.org/content/10.1...
GitHub - stephkoest/witchi: A compositional bias testing and pruning tool for multiple sequence alignments
July 20, 2025 at 1:04 PM
8.
GTDB r220 case study (led by @kassipan.bsky.social)
Applied WitChi to the archaeal GTDB r220 supermatrix:
• 5,869 taxa
• 55% of columns pruned
• Biased taxa: 95.1% → 2.3%
• Runtime: <2h on 4 cores
→ Known clades recovered without needing computationally demanding site-heterogeneous models such as C60 or CAT
July 20, 2025 at 1:04 PM
7.
Use witchi test to quantify bias per taxon:
• χ² scores
• Empirical p-values (via permutations)
• Z-scores to see how far taxa deviate from expectation
→ Great for screening MSAs or comparing compositional distortion across datasets.
July 20, 2025 at 1:04 PM
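A minimal Python sketch of the per-taxon statistics listed above, under stated assumptions: each taxon's χ² compares its residue counts with the alignment-wide composition, and the null is built by shuffling residues among taxa within each column while leaving gaps in place. The function names (taxon_chi2, shuffle_within_columns, composition_test), the gap symbols, and the permutation scheme are hypothetical; this is not the `witchi test` implementation or its interface.

```python
import random
from collections import Counter
from statistics import mean, stdev

GAPS = set("-?X")  # assumed gap/ambiguity symbols, excluded from counts


def taxon_chi2(msa):
    """Per-taxon chi-square of residue counts vs. the alignment-wide composition."""
    counts = [Counter(c for c in seq if c not in GAPS) for seq in msa]
    total = Counter()
    for c in counts:
        total.update(c)
    grand = sum(total.values())
    scores = []
    for c in counts:
        n = sum(c.values())
        score = 0.0
        for state, t in total.items():
            expected = n * t / grand  # E_ik = n_i * pi_k
            if expected > 0:
                score += (c[state] - expected) ** 2 / expected
        scores.append(score)
    return scores


def shuffle_within_columns(msa, rng):
    """Null replicate (hypothetical scheme): shuffle residues among taxa inside
    each column, keeping every gap where it is."""
    columns = [list(col) for col in zip(*msa)]
    for col in columns:
        idx = [i for i, c in enumerate(col) if c not in GAPS]
        residues = [col[i] for i in idx]
        rng.shuffle(residues)
        for i, r in zip(idx, residues):
            col[i] = r
    return ["".join(row) for row in zip(*columns)]


def composition_test(msa, n_perm=200, seed=0):
    """Return (chi2, empirical p-value, z-score) for every taxon."""
    rng = random.Random(seed)
    observed = taxon_chi2(msa)
    null = [taxon_chi2(shuffle_within_columns(msa, rng)) for _ in range(n_perm)]
    results = []
    for i, obs in enumerate(observed):
        dist = [replicate[i] for replicate in null]
        # Empirical p-value with the usual add-one correction.
        p = (1 + sum(d >= obs for d in dist)) / (n_perm + 1)
        sd = stdev(dist)
        z = (obs - mean(dist)) / sd if sd > 0 else 0.0
        results.append((obs, p, z))
    return results


msa = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGAACGT",
    "ACGTACGAACGTACGT",
    "GCGCGCGCGCGCGCGC",  # toy GC-rich taxon
]
for name, (chi2, p, z) in zip("ABCD", composition_test(msa)):
    print(f"{name}: chi2={chi2:.2f}  p={p:.3f}  z={z:.2f}")
```

Because each taxon is compared with its own permutation distribution, the z-score says how unusual that taxon is given the column contents, which the parametric χ² reference cannot do when taxa are correlated.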
6.
WitChi solves both problems:
🔹 Builds a null distribution using column permutations — no model, no tree
🔹 Recursively removes columns that distort the taxon-wise χ² profile
🎁 Bonus: 3 scoring strategies, including one capturing distribution-wide effects (Wasserstein)
⚡ Scales linearly with the number of taxa
July 20, 2025 at 1:04 PM
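The thread does not define WitChi's Wasserstein score. As a hedged illustration of why a distribution-wide measure can see something a per-taxon maximum misses, here is the 1-Wasserstein (earth mover's) distance between two sets of per-taxon χ² scores. The score values are toy numbers and the comparison setup is an assumption; SciPy's `scipy.stats.wasserstein_distance` computes the same quantity.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two 1-D samples: the area between
    their empirical CDFs (samples may differ in size)."""
    u, v = sorted(u), sorted(v)
    points = sorted(set(u) | set(v))
    dist = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        dist += abs(cdf_u - cdf_v) * (right - left)
    return dist


# Toy per-taxon chi-square scores (illustrative numbers, not real data).
null_scores = [0.9, 1.0, 1.1, 1.2]
one_outlier = [0.9, 1.0, 1.1, 9.0]  # a single strongly biased taxon
broad_shift = [3.5, 4.0, 4.2, 9.0]  # same maximum, but every taxon shifted

for label, scores in [("one outlier", one_outlier), ("broad shift", broad_shift)]:
    print(label, "max:", max(scores), "W1:", round(wasserstein_1d(scores, null_scores), 3))
```

Both toy profiles share the same worst taxon, so a max-based score cannot tell them apart, while the distance to the null distribution is more than twice as large for the broad shift.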
5.
Classical χ² pruning trims biased columns in a single pass: fast, but naive.
→ Every removal shifts the alignment composition, so each column's Δχ² must be recomputed; few tools do this.
BMGE’s stationary-based algorithm prunes iteratively and works well, but scales quadratically with the number of taxa, which makes it infeasible for medium-sized or large datasets.
July 20, 2025 at 1:04 PM
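To illustrate why Δχ² has to be updated, here is a brute-force Python sketch of iterative pruning under assumed rules: each round removes the column whose removal lowers the summed per-taxon χ² the most, then re-evaluates every remaining column on the smaller alignment. A one-shot approach would instead rank columns once and drop the top ones. The function names, gap symbols, and the threshold/max_fraction stopping rules are hypothetical. Recomputing everything costs roughly columns² × taxa work per pass, so this shows the idea, not WitChi's linear-scaling algorithm.

```python
from collections import Counter

GAPS = set("-?X")  # assumed gap/ambiguity symbols


def total_chi2(msa, keep):
    """Summed per-taxon chi-square, using only the column indices in `keep`."""
    counts = [Counter(seq[j] for j in keep if seq[j] not in GAPS) for seq in msa]
    total = Counter()
    for c in counts:
        total.update(c)
    grand = sum(total.values())
    score = 0.0
    for c in counts:
        n = sum(c.values())
        for state, t in total.items():
            expected = n * t / grand
            if expected > 0:
                score += (c[state] - expected) ** 2 / expected
    return score


def iterative_prune(msa, threshold=0.0, max_fraction=0.5):
    """Greedily drop the column whose removal lowers total chi-square most,
    re-evaluating every column's effect after each removal."""
    keep = list(range(len(msa[0])))
    limit = int(len(keep) * max_fraction)
    removed = []
    current = total_chi2(msa, keep)
    while len(removed) < limit and current > threshold:
        best_col, best_score = None, current
        for j in keep:
            score = total_chi2(msa, [k for k in keep if k != j])
            if score < best_score:
                best_col, best_score = j, score
        if best_col is None:  # no single removal reduces chi-square any further
            break
        keep.remove(best_col)
        removed.append(best_col)
        current = best_score
    return keep, removed


msa = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTGG"]  # toy: last taxon is G-enriched
kept, dropped = iterative_prune(msa)
print("dropped columns:", dropped, "remaining chi2:", total_chi2(msa, kept))
```

On the toy alignment, the two columns where the last taxon carries a G are dropped, after which all rows are identical and χ² is zero. Raw χ² also shrinks simply because columns are removed, so a practical stopping rule would rely on a significance test rather than a fixed threshold.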
4.
The problem:
χ² assumes taxa are independent and identically distributed samples.

In MSAs, they share history → correlated data.
So parametric χ² nulls are invalid.
Simulations help, but they need a known model and tree, and compositional bias distorts both.
→ Slow, circular, rarely used.
July 20, 2025 at 1:04 PM
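For reference, a standard form of the compositional χ² test, assuming the thread refers to the usual taxa × character-state contingency table (the exact statistic WitChi uses is not given here). $O_{ik}$ is the count of state $k$ in taxon $i$, $n_i$ the number of non-gap residues of taxon $i$, $N$ the number of taxa, and $K$ the number of states (4 for nucleotides, 20 for amino acids):

$$
\chi^2_i = \sum_{k=1}^{K} \frac{(O_{ik} - E_{ik})^2}{E_{ik}}, \qquad
E_{ik} = n_i \, \pi_k, \qquad
\pi_k = \frac{\sum_{j=1}^{N} O_{jk}}{\sum_{j=1}^{N} n_j}
$$

$$
\chi^2 = \sum_{i=1}^{N} \chi^2_i \;\sim\; \chi^2_{(N-1)(K-1)} \quad \text{(only if taxa and sites are independent)}
$$

The reference distribution in the second line is the parametric null: its degrees of freedom assume independent observations, which is exactly what shared evolutionary history violates.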