André Boler Barros
@asbarros.bsky.social
Data-driven individual, trying to get by in the uncertainty of statistics and life. Avid Rstats follower. Data Analyst at GIMM Institute
"(...) the rise of generative AI in bioinformatics has not diminished my role, but redefined it. It has challenged me to become a better scientist. For good or ill, AI seems to be here to stay. I urge you to embrace the technology — not to replace your expertise, but to amplify it."

#bioinfo
‘Am I redundant?’: how AI changed my career in bioinformatics
A run-in with some artefact-laden AI-generated analyses convinced Lei Zhu that machine learning wasn’t making his role irrelevant, but more important than ever.
www.nature.com
October 28, 2025 at 1:07 PM
I can't recommend this lab enough. If you'd like to work in a curiosity-driven and nurturing environment with a strong focus on robust data analysis, don't think twice!

#bioinfo #datascience
Did we mention that we would like you to do a PhD with us? 😉
📢 EvoMG-DN PhD applications are now open!
Join our EU-funded MSCA network and explore evolution, ageing, and disease.
👉 www.evomg-dn.eu

📅 Apply by 30 November 2025

#PhD #DoctoralTraining #MSCA #HorizonEurope #Genomics #EvolutionaryBiology #BiomedicalResearch #EarlyCareerResearchers
October 7, 2025 at 11:04 AM
Last week, I was fortunate enough to watch a talk by @tkorem.bsky.social, where he covered topics ranging from addressing inter-study variability in microbiome projects to novel approaches for metagenomic alignment and processing. Interesting and very relevant!

#bioinfo
September 30, 2025 at 12:32 PM
"GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs"

bmcbioinformatics.biomedcentral.com/articles/10....

#datascience #biostats
Glucostats: an efficient Python library for glucose time series feature extraction and visual analysis - BMC Bioinformatics
Background: The advancement of technology and continuous glucose monitoring (CGM) systems has introduced several computational and technical challenges for clinicians and researchers. The growing volume of CGM data necessitates the development of efficient computational tools capable of handling and processing this information effectively. This paper introduces GlucoStats, an open-source and multi-processing Python library designed for efficient computation and visualization of a comprehensive set of glucose metrics derived from CGM. It simplifies the traditionally time-consuming and error-prone process of manual CGM metrics calculation, making it a valuable tool for both clinical and research applications.
Results: Its modular design ensures easy integration into predefined workflows, while its user-friendly interface and extensive documentation make it accessible to a broad audience, including clinicians and researchers. GlucoStats offers several key features: (i) window-based time series analysis, enabling time series division into smaller ‘windows’ for detailed temporal analysis, particularly beneficial for CGM data; (ii) advanced visualization tools, providing intuitive, high-quality visualizations that facilitate pattern recognition, trend analysis, and anomaly detection in CGM data; (iii) parallelization, leveraging parallel computing to efficiently handle large CGM datasets by distributing computations across multiple processors; and (iv) scikit-learn compatibility, adhering to the standardized interface of scikit-learn to allow an easy integration into machine learning pipelines for end-to-end analysis.
Conclusions: GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs. By offering precise CGM data analysis and user-friendly visualization tools, it serves both technical researchers and non-technical users, such as physicians and patients, with practical and research-driven applications.
bmcbioinformatics.biomedcentral.com
September 29, 2025 at 8:10 AM
"Delphi-2M predicts the rates of more than 1,000 diseases (...), with accuracy comparable to that of existing single-disease models. Delphi-2M (...) also enables sampling of synthetic future health trajectories"

www.nature.com/articles/s41...
Learning the natural history of human disease with generative transformers - Nature
Delphi-2M forecasts a person’s future health, covering more than 1,000 diseases, provides insights into co-morbidity dynamics and generates synthetic data for the training of AI models that have never...
www.nature.com
September 22, 2025 at 4:23 PM
An amazing repository with references and resources for scRNA-seq analysis

github.com/crazyhottomm...

#bioinfo #singlecell
September 11, 2025 at 1:54 PM
"Even if work is done by or with the help of experts (...), it is crucial that researchers understand how a method works, that they can assess data quality, and that they fundamentally understand what types of conclusions can and cannot be drawn from their data"

www.nature.com/articles/s41...
Push-button science - Nature Methods
Technological advances change not only what we can learn as scientists, but also how science is conducted. Here we explore how automation and outsourcing are affecting the act of doing science.
www.nature.com
September 10, 2025 at 11:48 PM
Spreadsheets are an everyday tool for most wet-lab scientists. So why not use them to their full potential, efficiently and in a way that is ready for open science?

This paper provides some recommendations for the use of spreadsheets:

www.nature.com/articles/d41...

#bioinfo #stats
Six questions to ask before jumping into a spreadsheet
Spreadsheet software can be frustrating, but adopting some helpful habits can improve its effectiveness.
www.nature.com
August 21, 2025 at 8:29 AM
Reposted by André Boler Barros
FlyBase, a Drosophila database, will lose a third of its team in early October because the Harvard grant that covered the employees’ salaries was canceled. Scientists warn that losing FlyBase could devastate fly research.

By @claudia-lopez.bsky.social

www.thetransmitter.org/community/ha...
Harvard University lays off fly database team
The layoffs jeopardize this resource, which has served more than 4,000 labs for about three decades.
www.thetransmitter.org
August 13, 2025 at 7:32 PM
1/n

Because, in bioinformatics, sharing is caring, let me share something I have recently started exploring - graph mapping and pangenome graphs.

A pangenome graph encodes a reference built from many genomes in a single structure, aiming to capture the known genetic variability.
August 12, 2025 at 9:04 AM
1/n
Brief guide to statistical analysis of grouped data in preclinical research
www.nature.com/articles/s42...

In preclinical studies, clustering and nesting (C&N) scenarios, such as group-housed animals or cells on a single plate, are common. This has important statistical implications.
A brief guide to statistical analysis of grouped data in preclinical research - Nature Metabolism
Clustering and nesting (C&N) arise in many preclinical studies, such as when animals are group-housed or share litters, or in cell culture. Ignoring C&N undermines the validity of analyses. He...
www.nature.com
June 26, 2025 at 8:11 AM
Reposted by André Boler Barros
Instead of painstakingly dissecting a set of primary data to find novel patterns, it can be more effective to fit an unsupervised cluster-factor-latent-spaces model and then painstakingly dissect the model parameters to find patterns imposed by the model inference.
June 24, 2025 at 12:23 PM
Very interesting paper from the Soares lab @gimmfoundation.bsky.social
A bioenergetic basis for multiorgan dysfunction in sepsis https://www.biorxiv.org/content/10.1101/2025.06.12.659280v1
June 17, 2025 at 9:03 PM
Reposting this just once doesn't feel like enough to show how much I agree with it
1/"I have a presentation tomorrow."
If you've ever collaborated with wet lab scientists as a bioinformatician…
you’ve heard this. And died inside a little.
May 28, 2025 at 1:45 PM
'One possible strategy is to organize consortium-led initiatives that integrate and curate reference datasets, establish criteria for dataset transparency, and mandate standardized preprocessing pipelines to reduce the variability in how competing models are evaluated'
www.nature.com/articles/s41...
A benchmarking crisis in biomedical machine learning - Nature Medicine
A lack of standardized benchmarks is hindering progress and patient benefits
www.nature.com
April 9, 2025 at 3:44 PM
Reposted by André Boler Barros
Quando se corta na ciência
When science is cut

We urge politicians to recognise that science is not a switch that can be turned on and off without deep and lasting consequences.

www.publico.pt/2025/...
1/2
Quando se corta na ciência (When science is cut)
We urge politicians to recognise that science is not a switch that can be turned on and off without deep and lasting consequences.
www.publico.pt
March 31, 2025 at 9:57 AM
Reposted by André Boler Barros
Taking the opportunity to repost this since people hate it BECAUSE I'M RIGHT.
January 21, 2025 at 9:03 AM
Reposted by André Boler Barros
GIMM is on Bluesky! We're a recent research foundation created by the merger of 2 leading research institutes in Portugal, IGC and iMM, dedicated to answering fundamental questions of biology and human health & developing solutions to improve health and promote local and global equity. Visit gimm.pt
January 16, 2025 at 5:53 PM
Reposted by André Boler Barros
Bluesky is indeed a social network on the rise. Another project with amazing potential is the new Portuguese institute - Gulbenkian Institute of Molecular Medicine (GIMM) - gimm.pt.

I have created a Starter Pack for the GIMM team, so we can connect with each other and with the world

go.bsky.app/RACwsRN
December 10, 2024 at 10:05 AM
Glad to see my institute @gimmfoundation.bsky.social on bluesky!

Check out the account, and definitely follow them! Good things will be popping up frequently.
January 17, 2025 at 3:33 PM
Reposted by André Boler Barros
Hi folks! I'm looking for a good and easy to use pipeline to annotate genomes from RNA-seq & protein data. Any suggestions?
(I've already tried BRAKER3, I was wondering if people were using other tools out there...)
🧪
December 12, 2024 at 10:37 AM
Reposted by André Boler Barros
Working on a case study for survival analysis. These models are odd compared to typical GLM examples: missing data (censoring) that cannot be ignored, a data model (likelihood) and data-generating model that are not the same, the question of how to visualize predictions, and 2+ equivalent ways to program them. Good real data wrestling.
December 2, 2024 at 12:06 PM