Chakravarthi Kanduri
chakri.bsky.social
Researcher at the University of Oslo | PhD in #Bioinformatics from University of Helsinki | academic #datascience, #python, #rstats, #ML
12/12 🙏 Thanks to all collaborators & co-authors for useful inputs, brainstorming and perspectives: Maria Mamica, Emilie Willoch Olstad, Ingrid Hobæk Haff, @manuelazucknick.bsky.social, Jingyi Jessica Li, & Geir Kjetil Sandve.
August 18, 2025 at 9:46 PM
11/12 The bottom line: be aware of dependencies in your data! When false findings occur in highly correlated datasets, they can be numerous. Don't let your intuition fool you. Read the full open-access paper here: doi.org/10.1186/s130...
Beware of counter-intuitive levels of false discoveries in datasets with strong intra-correlations - Genome Biology
The false discovery rate (FDR) controlling method by Benjamini and Hochberg (BH) is a popular choice in the omics fields. Here, we demonstrate that in datasets with a large degree of dependencies betw...
August 18, 2025 at 9:46 PM
10/12 As a safer alternative, consider the Benjamini-Yekutieli (BY) method when you can tolerate a bit more type II error. It doesn't completely eliminate the issue, but it makes these large false-positive events much less frequent and less severe (a good compromise between BH and FWER-controlling methods).
August 18, 2025 at 9:46 PM
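For readers who want to see how BY differs from BH in practice, here is a minimal pure-Python sketch. It is not code from the paper; the function names `bh_reject_count` and `by_reject_count` are made up for illustration. The only difference is that BY divides the target level by the harmonic sum c(m) = 1 + 1/2 + ... + 1/m before applying the same step-up rule, which is why it can never reject more hypotheses than BH.

```python
def bh_reject_count(pvalues, alpha=0.05):
    """Number of rejections from the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i / m:
            k = i  # keep the largest rank whose p-value passes its threshold
    return k


def by_reject_count(pvalues, alpha=0.05):
    """Number of rejections from Benjamini-Yekutieli.

    Same step-up rule as BH, run at the stricter level alpha / c(m),
    where c(m) is the harmonic sum 1 + 1/2 + ... + 1/m.
    """
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    return bh_reject_count(pvalues, alpha / c_m)


if __name__ == "__main__":
    # Illustrative p-values only, not data from the paper.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("BH rejections:", bh_reject_count(pvals))
    print("BY rejections:", by_reject_count(pvals))
```

The price of the stricter thresholds is lower power, which is the extra type II error mentioned in the post.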
9/12 Use negative controls, synthetic null data and other diagnostic checks, as recommended in the article, to identify and minimize caveats. If you continue to use the BH method, make sure you understand its assumptions and formal guarantees so that the findings are interpreted correctly.
August 18, 2025 at 9:46 PM
8/12 Issues like broken test assumptions, study biases, or the researcher’s flexibility in analyzing the data can make this problem even worse. So, what can you do? We suggest a few key strategies:
August 18, 2025 at 9:46 PM
7/12 This statistical artefact can lead researchers to incorrectly conclude the existence of an underlying biological mechanism, which might even form the main conclusion of their study.
August 18, 2025 at 9:46 PM
6/12 It feels intuitive to believe that if hundreds or thousands of features are flagged as significant, at least some of them must be real. However, we show this intuition can be wrong; it's possible that every single finding is false.
August 18, 2025 at 9:46 PM
5/12 A Counter-Intuitive Trap: Using real-world and simulated data (methylation, gene expression, metabolite and eQTL analyses), we found this phenomenon to be persistent. The primary danger is that researchers may be misled by the sheer volume of these false findings.
August 18, 2025 at 9:46 PM
4/12 This happens because dependencies in the data can cause many features to falsely appear significant together. The FDR is still controlled on average (e.g., when there is no true signal, fewer than 5% of experiments report any false finding), but the experiments that do have errors can have thousands of them.
August 18, 2025 at 9:46 PM
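The behaviour described in posts 3/12 and 4/12 can be reproduced with a toy simulation. The sketch below is not the paper's setup; it assumes a simple equicorrelated model in which every null z-score shares one common factor (pairwise correlation `rho`), and all names and parameter values are hypothetical choices for illustration. Under this assumption, most simulated experiments yield zero BH discoveries, but the few that yield any tend to yield many at once.

```python
import math
import random


def bh_reject_count(pvalues, alpha=0.05):
    """Number of rejections from the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i / m:
            k = i  # largest rank i with p_(i) <= alpha * i / m
    return k


def simulate_null_pvalues(m, rho, rng):
    """Two-sided p-values for m null z-scores with pairwise correlation rho.

    Assumed toy model: z_j = sqrt(rho) * shared + sqrt(1 - rho) * noise_j,
    so every z_j is standard normal and no feature carries real signal.
    """
    shared = rng.gauss(0.0, 1.0)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    pvals = []
    for _ in range(m):
        z = a * shared + b * rng.gauss(0.0, 1.0)
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))  # equals 2 * (1 - Phi(|z|))
    return pvals


def run_experiments(n_experiments=400, m=1000, rho=0.8, alpha=0.05, seed=1):
    """BH rejection counts (all of them false) for repeated all-null experiments."""
    rng = random.Random(seed)
    return [
        bh_reject_count(simulate_null_pvalues(m, rho, rng), alpha)
        for _ in range(n_experiments)
    ]


if __name__ == "__main__":
    counts = run_experiments()
    hits = [c for c in counts if c > 0]
    print("share of experiments with any false discovery:", len(hits) / len(counts))
    print("false discoveries in those experiments:", sorted(hits))
```

Setting `rho=0.0` in the same script gives the independent case for comparison, where experiments with any rejection typically flag only a handful of features.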
3/12 Even when a study has no true biological signal (all null hypotheses are true), the BH method can occasionally generate thousands of statistically "significant" findings.
August 18, 2025 at 9:46 PM
2/12 The widely used False Discovery Rate (FDR) control method, Benjamini-Hochberg (BH), is a staple in omics research. But when analysing datasets with dependencies between features (as in gene expression, methylation, metabolite and QTL analyses, among others), it can behave unexpectedly.
August 18, 2025 at 9:46 PM