Lightnews — Scholar-powered news

Chakravarthi Kanduri

@chakri.bsky.social

Researcher at the University of Oslo | PhD in #Bioinformatics from University of Helsinki | academic #datascience, #python, #rstats, #ML


Reposted by Chakravarthi Kanduri

Gur Yaari

@guryaari.bsky.social

Ready to make your mark?
Accept the challenge 👇 🔗: kaggle.com/competitions...
#AIRR #Competition #DeepLearning #ComputationalBiology
@victorgreiff.bsky.social @chakri.bsky.social

AIRR-ML-25: Adaptive Immune Profiling Challenge

Predict labels (e.g. disease, healthy) from sets of immune receptor sequences, and identify the sequences that explain the labels.

kaggle.com

November 14, 2025 at 7:11 PM

Chakravarthi Kanduri

@chakri.bsky.social

12/12 🙏 Thanks to all collaborators & co-authors for useful inputs, brainstorming and perspectives: Maria Mamica, Emilie Willoch Olstad, Ingrid Hobæk Haff, @manuelazucknick.bsky.social, Jingyi Jessica Li, & Geir Kjetil Sandve.

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

11/12 The bottom line: be aware of dependencies in your data! When false findings occur in highly correlated datasets, they can be numerous. Don't let your intuition fool you. Read the full open-access paper here: doi.org/10.1186/s130...

Beware of counter-intuitive levels of false discoveries in datasets with strong intra-correlations - Genome Biology

The false discovery rate (FDR) controlling method by Benjamini and Hochberg (BH) is a popular choice in the omics fields. Here, we demonstrate that in datasets with a large degree of dependencies betw...

doi.org

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

10/12 As a safer alternative, consider the Benjamini-Yekutieli (BY) method when you can tolerate somewhat more type II error. It doesn't completely eliminate the issue, but it makes these large false-positive events much less frequent and severe (a good compromise between BH and FWER-controlling methods).

August 18, 2025 at 9:46 PM
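
To make the BH/BY difference in post 10/12 concrete, here is a minimal sketch of both step-up procedures. The function name `step_up_reject` and its `method` argument are illustrative choices, not taken from the paper. BY uses the same step-up rule as BH but divides every threshold by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m, so on the same p-values it can only reject the same hypotheses or fewer.

```python
import numpy as np

def step_up_reject(pvals, alpha=0.05, method="bh"):
    """Boolean mask of rejected hypotheses for BH ("bh") or BY ("by")."""
    if method not in ("bh", "by"):
        raise ValueError("method must be 'bh' or 'by'")
    p = np.asarray(pvals, dtype=float)
    m = p.size
    ranks = np.arange(1, m + 1)
    # BY divides the BH thresholds by c(m) = 1 + 1/2 + ... + 1/m.
    c_m = np.sum(1.0 / ranks) if method == "by" else 1.0
    thresholds = alpha * ranks / (m * c_m)
    order = np.argsort(p)
    below = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        # Step-up rule: reject everything up to the largest rank that passes.
        reject[order[: below.max() + 1]] = True
    return reject

if __name__ == "__main__":
    pvals = [0.001, 0.02, 0.04, 0.9]  # made-up p-values for illustration
    print("BH:", step_up_reject(pvals, method="bh"))
    print("BY:", step_up_reject(pvals, method="by"))
```

The extra conservatism is the price for a guarantee that holds under arbitrary dependence, which is why BY trades some power (more type II error) for fewer large bursts of false positives.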

Chakravarthi Kanduri

@chakri.bsky.social

9/12 Use negative controls, synthetic null data, and the other diagnostic checks recommended in the article to identify and minimize caveats. If you continue to use the BH method, make sure you understand its assumptions and formal guarantees so that you interpret the findings correctly.

August 18, 2025 at 9:46 PM
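
One way to build the synthetic null data mentioned in post 9/12 is to permute the sample labels: this keeps the correlations between features intact but breaks any real association with the labels. The sketch below is an assumption-laden illustration, not the procedure from the article. The helper names are hypothetical, it assumes a two-group design with 0/1 labels, and it uses a normal approximation for the p-values, which is rough for small samples.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)  # element-wise complementary error function

def two_group_pvalues(X, labels):
    """Two-sided p-value per feature (column) for group 1 vs group 0.
    Assumption: Welch-type z statistic with a normal approximation."""
    a, b = X[labels == 1], X[labels == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    z = (a.mean(axis=0) - b.mean(axis=0)) / se
    return _erfc(np.abs(z) / math.sqrt(2.0))

def bh_count(pvals, alpha=0.05):
    """Number of hypotheses rejected by the BH step-up procedure."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    below = np.nonzero(p <= alpha * np.arange(1, m + 1) / m)[0]
    return 0 if below.size == 0 else int(below.max()) + 1

def permutation_null_counts(X, labels, n_perm=200, alpha=0.05, seed=0):
    """BH discovery counts on label-permuted (synthetic null) versions of the data."""
    rng = np.random.default_rng(seed)
    return np.array([
        bh_count(two_group_pvalues(X, rng.permutation(labels)), alpha)
        for _ in range(n_perm)
    ])
```

If the number of discoveries on the real labels is not clearly larger than what the permuted labels occasionally produce, a long list of significant features should be treated with caution.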

Chakravarthi Kanduri

@chakri.bsky.social

8/12 Issues like broken test assumptions, study biases, or the researcher’s flexibility in analyzing the data can make this problem even worse. So, what can you do? We suggest a few key strategies:

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

7/12 This statistical artefact can lead researchers to incorrectly conclude the existence of an underlying biological mechanism, which might even form the main conclusion of their study.

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

6/12 It feels intuitive to believe that if hundreds or thousands of features are flagged as significant, at least some of them must be real. However, we show this intuition can be wrong; it's possible that every single finding is false.

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

5/12 A Counter-Intuitive Trap: Using real-world and simulated data (methylation, gene expression, metabolite and eQTL analyses), we found this phenomenon to be persistent. The primary danger is that researchers may be misled by the sheer volume of these false findings.

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

4/12 This happens because dependencies in the data can cause many features to falsely appear significant together. The overall FDR is still controlled (e.g., when all null hypotheses are true, fewer than 5% of experiments report any false discovery), but the experiments that do have errors can have thousands of them.

August 18, 2025 at 9:46 PM
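
The behaviour described in post 4/12 can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not the setup used in the paper: every null hypothesis is true, the features are equicorrelated z-scores built from one shared factor, and the correlation `rho`, the feature count `m`, and the function names are all illustrative choices.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)  # element-wise complementary error function

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = np.nonzero(p[order] <= alpha * np.arange(1, m + 1) / m)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        # Reject everything up to the largest rank that meets its threshold.
        reject[order[: below.max() + 1]] = True
    return reject

def null_discovery_counts(n_experiments=2000, m=1000, rho=0.8, alpha=0.05, seed=0):
    """Number of BH discoveries per experiment when every null is true."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n_experiments, 1))  # factor common to all features
    noise = rng.standard_normal((n_experiments, m))   # feature-specific part
    # Each z-score is marginally N(0, 1); any two features have correlation rho.
    z = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
    pvals = _erfc(np.abs(z) / math.sqrt(2.0))         # two-sided normal p-values
    return np.array([bh_reject(row, alpha).sum() for row in pvals])

if __name__ == "__main__":
    counts = null_discovery_counts()
    print("share of experiments with any discovery:", (counts > 0).mean())
    print("largest number of false discoveries:", counts.max())
```

In this setup most experiments report nothing, but when the shared factor happens to be large, many features cross the BH thresholds together, so the rare experiments with errors tend to contain a large number of them.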

Chakravarthi Kanduri

@chakri.bsky.social

3/12 Even when a study has no true biological signal (all null hypotheses are true), the BH method can occasionally generate thousands of statistically "significant" findings.

August 18, 2025 at 9:46 PM

Chakravarthi Kanduri

@chakri.bsky.social

2/12 The widely used False Discovery Rate (FDR) control method, Benjamini-Hochberg (BH), is a staple in omics research. But when analysing datasets with dependencies between features (as in gene expression, methylation, metabolite, and QTL analyses, among others), it can behave unexpectedly.

August 18, 2025 at 9:46 PM
