Pitágoras Alves
pitagoras-alves.bsky.social
Data Science specialist - Ministry of Management and Innovation (Brazil)
PhD student doing research on ncRNAs and protein function prediction (UFRN)
Reposted by Pitágoras Alves
Umberto Eco, as relevant as ever!
December 1, 2025 at 8:55 PM
Me, joining the competition 2 months late hahahaha
December 2, 2025 at 3:01 PM
It has been really great to work on the early phases of this project
The Hermes project is an initiative to speed up and improve the quality of emergency call handling across all of Brazil. We want to use AI agents and a redesigned UI/UX so that operators can
Partnership between MJSP and MGI will speed up delivery of the new version of Celular Seguro and of Projeto Hermes
Startup.Gov, an MGI program, aims to support and speed up strategic digital transformation projects of the Federal Government
www.gov.br
August 12, 2025 at 3:06 AM
Lately I've been feeling like my PhD project is making me more and more socially isolated
I wonder if other people have trouble staying connected to friends and family while working on their thesis
July 27, 2025 at 9:55 PM
Finally! After so much time, I was able to convert all my old TensorFlow code to PyTorch
Not only does it look better now, it also uses a little less memory and runs faster than before
July 27, 2025 at 9:51 PM
It worked out, and this is now my third week as a data science specialist at MGI
The project I'm working on hasn't been announced yet, but I can say it's something incredible that may one day help save many lives. I've even gotten my enthusiasm for working as a scientist back hahaha
They finally attached my contract, looks like everything is working out \o/
May 14, 2025 at 10:23 PM
Here's a tip:
If you ever decide to take a civil service exam that covers the Lei Geral de Proteção de Dados (Brazil's General Data Protection Law), take this mini-course:
www.escolavirtual.gov.br/curso/290
Escola Virtual Gov
www.escolavirtual.gov.br
May 14, 2025 at 10:17 PM
You can have the most intricate RAG pipeline and use the largest models available
And most of the time, the result is still going to be garbage
To me, it feels like the technology industry is going crazy
Current LLMs are just not enough
Why is everyone pretending they are perfect?
May 4, 2025 at 7:47 AM
Reading about coding with Cursor+Tab and I feel like Anakin being tempted by the dark side
ALT: a close up of a man with the word janobot on the bottom right
media.tenor.com
April 10, 2025 at 2:08 PM
Status: hitting F5 on my hiring process on the government website every 30 min
April 1, 2025 at 11:14 PM
A bunch of idiot Americans who don't know what cinema is
March 3, 2025 at 3:46 AM
Reposted by Pitágoras Alves
My favourite figure from this paper: unlike coding variants, rare non-coding variants are almost equally likely to increase or decrease circulating protein levels. Thanks to @ukbiobank.bsky.social for genomic and proteomic data!
February 24, 2025 at 7:30 PM
Since this database was completed, I've been working on benchmarking the most widely used Protein Language Models on the task of predicting molecular functions. These are some of the first results
In the first figure, the models were optimized for F1 Score and in the second for ROC AUC Score
February 25, 2025 at 1:28 PM
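For readers unfamiliar with the two metrics named in the post above, here is a minimal pure-Python sketch of how F1 and ROC AUC are computed for a single binary label. It is only an illustration: the function names `f1` and `roc_auc` and the toy labels and scores are assumptions, not the code or data used in the benchmark.

```python
def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, y_score):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): does each of four proteins have a given function?
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.1]

# F1 needs hard predictions, so a threshold (0.5 here) must be chosen.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
f1_value = f1(y_true, y_pred)

# ROC AUC uses the raw scores and is threshold-free.
auc_value = roc_auc(y_true, y_score)
```

F1 depends on the chosen threshold while ROC AUC only depends on how the scores are ranked, which is one reason optimizing for one or the other can favor different models.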
Can someone please explain why Meta is incapable of finding good designers while having billions of dollars for their projects?
This is so UGLY
This is Meta’s brand new ad for Horizon Worlds. Not sure how I’m supposed to feel about it.
February 17, 2025 at 4:27 PM
I'm sampling HUGE datasets on an 8GB laptop with no problems, Polars+Parquet is really wonderful
Goodbye pandas 👋🏻
February 17, 2025 at 4:24 PM
The saddest goodbye in the world: I'm working from home on Monday, but she has to leave early for the office 😭
February 10, 2025 at 11:02 AM
If you can, take a moment to look into the harm this agency does, and has always done, to underdeveloped economies
Even a broken clock is right twice a day
February 3, 2025 at 1:40 PM
We need to get back to talking seriously about the disruptive power of open source to end monopolies
So, is the battle over? Did open-source DeepSeek AI/LLM win the race? DeepSeek: Chinese AI chatbot sparks market turmoil for rivals such as Nvidia, Microsoft, Meta, and others
January 27, 2025 at 5:21 PM
This collection of PLM datasets is finally complete!
After several weeks of processing, we now have UniProt/Swiss-Prot embeddings made with the second-largest ESM model, ESM2_T36, which has 3 billion parameters
The files are very large, but the Parquet format enables loading just a portion of them
January 16, 2025 at 5:15 PM
That's an amazing development! Good RNA LMs have a huge potential to finally uncover the biological roles of ncRNAs
LncRNA-BERT: An RNA Language Model for classifying Coding and Long Non-Coding RNA [new]
RNA coding/non-coding clf. w/ pre-trained lang. model. Novel seq. enc. outperf. existing. Pretraining benefits.
January 14, 2025 at 5:11 PM
I've been using gzip-compressed numpy up until now (despite not being memory-efficient) because dataset sizes are very small... I expected polars/parquet to be a little larger, but I'm seeing a 1~3% reduction in file size. So it's less memory AND storage usage.
January 14, 2025 at 4:59 PM
Updates on the Protein Dimension DB!
First, the database now has ankh-large and ankh-base embeddings. They are probably the most comprehensive Ankh datasets currently available
Second, following recommendations from the community, I will be saving models in a more efficient format (probably polars)
January 13, 2025 at 7:38 PM
I was considering including ESM Cambrian (ESMC) in my projects... but unlike ESM2 (which was released under the MIT license), ESMC has a lot of limitations. I don't understand the fine details, but it seems only the weaker model can be used for any purpose
January 10, 2025 at 4:46 PM
I decided to make the datasets I'm generating for my PhD project public. For each protein in Swiss-Prot, I'm making available PLM embeddings (ProtTrans, Ankh, ESM2), GO annotations and taxonomy representations. All files follow the same order, one line per protein.

github.com/pentalpha/pr...
GitHub - pentalpha/protein_dimension_db: Datasets with embeddings and other representations for all proteins in Uniprot/Swiss-Prot
github.com
December 27, 2024 at 3:51 PM
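Since the post above says every file follows the same order with one line per protein, line i of one file can be paired with line i of another. The sketch below shows that idea with the standard library only; the helper `iter_aligned`, the file names, and the IDs and GO terms in the demo files are all placeholders, not the repository's actual files or format.

```python
import os
import tempfile
from contextlib import ExitStack
from itertools import zip_longest

_MISSING = object()


def iter_aligned(*paths):
    """Yield one tuple per protein by pairing line i of every file."""
    with ExitStack() as stack:
        handles = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        for row in zip_longest(*handles, fillvalue=_MISSING):
            # A missing item means the files are not the same length,
            # so the "same order" assumption cannot hold.
            if any(item is _MISSING for item in row):
                raise ValueError("files have different numbers of lines")
            yield tuple(item.rstrip("\n") for item in row)


# Demo files with placeholder content (hypothetical names and values).
tmp = tempfile.mkdtemp()
ids_path = os.path.join(tmp, "ids.txt")
go_path = os.path.join(tmp, "go.txt")
with open(ids_path, "w", encoding="utf-8") as f:
    f.write("P12345\nQ67890\n")
with open(go_path, "w", encoding="utf-8") as f:
    f.write("GO:0003677,GO:0005524\nGO:0016301\n")

rows = list(iter_aligned(ids_path, go_path))
```

Failing loudly on a length mismatch is a deliberate choice here: with position-based alignment, a silently truncated file would shift every annotation onto the wrong protein.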
The devs of a certain PLM project on GitHub, replying to someone who asked why they hadn't made the model's results available: "As our models are highly optimized, users could use them quickly to extract the embedding for their own use cases."
Me on a 40-thread machine: 68h to finish this
December 27, 2024 at 3:24 PM