Stephan Köstlbacher

@stephkoe.bsky.social

480 followers 310 following 34 posts

Former PostDoc at Wageningen University with Thijs Ettema
Studying ancient evolutionary transitions in prokaryotes using phylogenomics and structural modeling
Looking for my next step in academic or translational research
he/him

🧙‍♀️ Something is brewing in the WitChi cauldron…
After some excellent peer review feedback, a new update of WitChi is taking shape, refining how we detect and prune compositional bias in phylogenomic alignments.

Stay tuned for the next release!

🧙‍♀️ Can’t model it? Prune it!
github.com/stephkoest/w...

October 31, 2025 at 9:59 AM

8.
GTDB r220 case study (led by @kassipan.bsky.social)
Applied WitChi to the archaeal GTDB r220 supermatrix:
• 5,869 taxa
• 55% of columns pruned
• Biased taxa: 95.1% → 2.3%
• Runtime: <2h on 4 cores
→ Known clades recovered without resorting to complex site-heterogeneous models such as C60 or CAT

July 20, 2025 at 1:04 PM

7.
Use witchi test to quantify bias per taxon:
• χ² scores
• Empirical p-values (via permutations)
• Z-scores to see how far taxa deviate from expectation
→ Great for screening MSAs or comparing compositional distortion across datasets.

July 20, 2025 at 1:04 PM
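A minimal Python sketch of a per-taxon test of this kind. It rests on two assumptions that the thread does not confirm. First, each taxon's χ² compares its residue counts with the alignment-wide composition. Second, a "column permutation" means shuffling residues among taxa within each column. The function names, the 20-letter amino-acid alphabet and the pooled null are illustrative choices, not WitChi's actual interface.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet; gaps and ambiguity codes are ignored

def encode(msa, alphabet=AMINO):
    """Map an MSA (list of equal-length strings) to integer codes; -1 marks gaps/unknowns."""
    lookup = {c: k for k, c in enumerate(alphabet)}
    return np.array([[lookup.get(ch, -1) for ch in seq] for seq in msa])

def state_counts(codes, n_states):
    """Taxa x states matrix of observed residue counts."""
    return np.stack([(codes == k).sum(axis=1) for k in range(n_states)], axis=1).astype(float)

def taxon_chi2(counts):
    """Per-taxon chi-square of observed counts vs. the alignment-wide composition."""
    freqs = counts.sum(axis=0) / counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * freqs
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (counts - expected) ** 2 / expected, 0.0)
    return terms.sum(axis=1)

def permutation_test(msa, n_perm=200, seed=0, alphabet=AMINO):
    """Return per-taxon chi2, empirical p-values and Z-scores against a permutation null."""
    rng = np.random.default_rng(seed)
    codes = encode(msa, alphabet)
    k = len(alphabet)
    observed = taxon_chi2(state_counts(codes, k))
    # Hypothetical null: shuffle each column independently across taxa (no model, no tree),
    # then pool the resulting per-taxon chi2 values over all permutations and taxa.
    null = np.concatenate([
        taxon_chi2(state_counts(rng.permuted(codes, axis=0), k)) for _ in range(n_perm)
    ])
    # Number of null values >= each observed score, with the usual +1 correction.
    n_ge = null.size - np.searchsorted(np.sort(null), observed, side="left")
    p_values = (n_ge + 1) / (null.size + 1)
    z_scores = (observed - null.mean()) / null.std()
    return observed, p_values, z_scores

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    background = ["".join(rng.choice(list(AMINO), size=200)) for _ in range(9)]
    skewed = "".join(rng.choice(list("AG"), size=200))  # toy taxon with an extreme composition
    chi2, p, z = permutation_test(background + [skewed])
    for i, (c, pv, zv) in enumerate(zip(chi2, p, z)):
        print(f"taxon{i}: chi2={c:.1f} p={pv:.4f} z={zv:.1f}")
```

Each permutation keeps every column's residue content intact but breaks the link between taxon and residue. No substitution model or tree is needed to build the null.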

6.
WitChi solves both problems:
🔹 Builds a null distribution using column permutations — no model, no tree
🔹 Recursively removes columns that distort the taxon-wise χ² profile
🎁 Bonus: 3 scoring strategies, including one capturing distribution-wide effects (Wasserstein)
⚡ Scales linearly with the number of taxa

July 20, 2025 at 1:04 PM
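The recursive pruning idea can be sketched as a greedy loop in Python. This is an illustrative reconstruction, not WitChi's code. It assumes the score to minimise is the summed per-taxon χ². WitChi's three scoring strategies, including the Wasserstein one, are not reproduced here, and `threshold` and `max_frac` are hypothetical stopping parameters. The point it shows is that Δχ² is re-evaluated against the updated counts after every single removal.

```python
import numpy as np

def taxon_chi2(counts):
    """Per-taxon chi-square of observed state counts vs. the alignment-wide composition."""
    freqs = counts.sum(axis=0) / counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * freqs
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (counts - expected) ** 2 / expected, 0.0)
    return terms.sum(axis=1)

def iterative_prune(codes, n_states, threshold, max_frac=0.5):
    """Greedily drop the column whose removal most lowers the summed per-taxon chi2.

    codes: taxa x columns integer array (state index, -1 for gaps/unknowns).
    threshold, max_frac: hypothetical stopping rules for this sketch.
    """
    onehot = (codes[:, :, None] == np.arange(n_states)).astype(float)  # taxa x cols x states
    counts = onehot.sum(axis=1)                                        # taxa x states
    keep = list(range(codes.shape[1]))
    removed = []
    max_removed = int(max_frac * codes.shape[1])
    while len(removed) < max_removed:
        current = taxon_chi2(counts).sum()
        if current <= threshold:
            break
        # Delta chi2 is recomputed against the *current* counts on every pass.
        scores = [taxon_chi2(counts - onehot[:, j, :]).sum() for j in keep]
        j_best = int(np.argmin(scores))
        if scores[j_best] >= current:  # no single removal lowers the score any more
            break
        best = keep[j_best]
        counts = counts - onehot[:, best, :]
        keep.remove(best)
        removed.append(best)
    return keep, removed

if __name__ == "__main__":
    # Toy alignment with states 0-3: 30 columns identical across taxa, plus 10 columns
    # where taxa 0-2 carry state 2 and taxa 3-5 carry state 0.
    neutral = np.tile(np.arange(30) % 4, (6, 1))
    skewed = np.array([[2] * 10] * 3 + [[0] * 10] * 3)
    codes = np.hstack([neutral, skewed])
    kept, removed = iterative_prune(codes, n_states=4, threshold=0.0, max_frac=0.25)
    print(sorted(removed))  # the 10 compositionally skewed columns, 30..39
```

Each pass costs one χ² evaluation per remaining column. That evaluation is linear in the number of taxa, so per-pass work grows linearly with taxa here too. This naive loop is not tuned for large supermatrices.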

5.
Classical χ² pruning trims biased columns once — fast, but naive.
→ But every removal shifts the alignment's composition, so Δχ² has to be recomputed after each step; few tools do this.
BMGE’s stationary-based algorithm prunes iteratively and works well, but scales quadratically with the number of taxa, which makes it impractical for medium-sized or large datasets.

July 20, 2025 at 1:04 PM
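Below is a hedged Python sketch of what we take one-shot χ² pruning to mean. Each column's Δχ² is computed once on the full alignment. Δχ² here is the drop in summed per-taxon χ² if that column alone were removed. The `n_remove` columns with the largest drop are then discarded together. The function and parameter names are hypothetical, and the χ² form (observed vs. alignment-wide expected counts) is an assumption.

```python
import numpy as np

def taxon_chi2(counts):
    """Per-taxon chi-square of observed state counts vs. the alignment-wide composition."""
    freqs = counts.sum(axis=0) / counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * freqs
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (counts - expected) ** 2 / expected, 0.0)
    return terms.sum(axis=1)

def one_shot_prune(codes, n_states, n_remove):
    """Rank columns once by delta chi2 and drop the top n_remove in a single pass."""
    onehot = (codes[:, :, None] == np.arange(n_states)).astype(float)  # taxa x cols x states
    counts = onehot.sum(axis=1)
    base = taxon_chi2(counts).sum()
    # Delta chi2 per column, evaluated only once against the unpruned alignment.
    delta = np.array([base - taxon_chi2(counts - onehot[:, j, :]).sum()
                      for j in range(codes.shape[1])])
    removed = np.sort(np.argsort(-delta, kind="stable")[:n_remove])
    kept = np.setdiff1d(np.arange(codes.shape[1]), removed)
    return kept, removed

if __name__ == "__main__":
    neutral = np.tile(np.arange(30) % 4, (6, 1))                 # columns identical across taxa
    skewed = np.array([[2] * 10] * 3 + [[0] * 10] * 3)           # group-specific columns
    kept, removed = one_shot_prune(np.hstack([neutral, skewed]), n_states=4, n_remove=10)
    print(removed.tolist())
```

The scores are never refreshed. Later choices therefore ignore how earlier removals shifted the composition, which is the weakness the post points out.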

4.
The problem:
χ² assumes taxa are independent and identically distributed samples.

In MSAs, they share history → correlated data.
So parametric χ² nulls are invalid.
Simulations help, but they require a known model and tree, and both are distorted by the very bias being tested.
→ Slow, circular, rarely used.

July 20, 2025 at 1:04 PM
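For reference, the textbook taxon-by-state χ² that this post refers to can be written as follows. Here T is the number of taxa, K the number of character states, O_ik the count of state k in taxon i, and n_i = Σ_k O_ik. We assume, but the thread does not state, that WitChi's per-taxon score has this form.

```latex
\chi^2_i = \sum_{k=1}^{K} \frac{(O_{ik} - E_{ik})^2}{E_{ik}},
\qquad
E_{ik} = n_i \, \frac{\sum_{j=1}^{T} O_{jk}}{\sum_{j=1}^{T} n_j},
\qquad
\chi^2 = \sum_{i=1}^{T} \chi^2_i \;\overset{\text{iid rows}}{\sim}\; \chi^2_{(T-1)(K-1)}
```

The (T−1)(K−1) degrees of freedom treat the T rows as independent samples. When taxa share ancestry their compositions are correlated, so the statistic no longer follows this reference distribution and the parametric p-value cannot be trusted.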

2.
What’s compositional bias?
When unrelated taxa convergently evolve similar sequence compositions (e.g. GC-rich, AT-rich), tree algorithms may group them by chemistry, not ancestry — a well-known artefact in deep phylogenies.
Fig modified from: doi.org/10.1007/978-...

July 20, 2025 at 1:04 PM
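As a toy illustration of "sequence composition", here is a short Python helper that computes GC content per taxon. The sequences are invented and the function name is hypothetical. Taxa A and C below are GC-rich and taxa B and D are AT-rich, regardless of how they are actually related. This shared chemistry is the kind of signal a tree method can mistake for shared ancestry.

```python
def gc_fraction(seq):
    """Fraction of G or C among unambiguous nucleotides (A, C, G, T); gaps are skipped."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

# Invented toy alignment: two GC-rich and two AT-rich taxa.
toy_alignment = {
    "taxon_A": "GCGGCC-GCA",
    "taxon_B": "ATTAAT-ATG",
    "taxon_C": "GCCGGC-GCT",
    "taxon_D": "AATTAT-ACA",
}

for name, seq in toy_alignment.items():
    print(name, round(gc_fraction(seq), 2))
```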
