Lightnews — Scholar-powered news

Leonardo Cotta

@cottascience.bsky.social

it's never been more fun to code, because it's never been more valuable to care about your code, to optimize the details and to write beautiful and compact code. this is the valuable type of code now. it's not just the age of research, it's the age of computer science.

December 30, 2025 at 11:29 AM

Leonardo Cotta

@cottascience.bsky.social

I can only imagine how crazy it must be to be a PhD student submitting to ML conferences now. The process has always been noisy, but at this point it's selecting for either obfuscation or shallow ideas. You either intimidate the reviewer, or you write a blog post in latex.

November 16, 2025 at 7:34 PM

Reposted by Leonardo Cotta

The Matter Lab

@thematterlab.bsky.social

We're excited to present our latest article in Nature Machine Intelligence - Boosting the predictive power of protein representations with a corpus of text annotations.

Link: www.nature.com/articles/s42...
[1/4]

August 21, 2025 at 7:34 PM

Leonardo Cotta

@cottascience.bsky.social

the goat of brazilian music w/ the best of (current) american music
www.youtube.com/watch?v=jFUh...

Milton Nascimento & esperanza spalding: Tiny Desk (Home) Concert

YouTube video by NPR Music

www.youtube.com

August 9, 2025 at 3:06 PM

Leonardo Cotta

@cottascience.bsky.social

I loved this new preprint by Lourie/Hu/@kyunghyuncho.bsky.social. If you really wanna convince someone you're training a foundation model, or proposing better methodology, loss scaling laws aren't enough. It has to be tied w/ downstream performance. It shouldn't be vibes.
arxiv.org/abs/2507.00885

Scaling Laws Are Unreliable for Downstream Tasks: A Reality Check

Downstream scaling laws aim to predict task performance at larger scales from pretraining losses at smaller scales. Whether this prediction should be possible is unclear: some works demonstrate that t...

arxiv.org

July 26, 2025 at 10:43 PM

Leonardo Cotta

@cottascience.bsky.social

I'm very excited about our new work: SciGym. How can we scale scientific agents' evaluation?
TL;DR: Systems biologists have spent decades encoding biochemical networks (metabolic pathways, gene regulation, etc.) into machine-runnable systems. We can use these as "dry labs" to test AI agents!

July 16, 2025 at 8:17 PM

Leonardo Cotta

@cottascience.bsky.social

I wish we had an ML equivalent of SOSA (Symposium On Simplicity in Algorithms). "simpler algorithms manifest a better understanding of the problem at hand; they are more likely to be implemented and trusted by practitioners; they are more easily taught" www.siam.org/conferences-....

June 29, 2025 at 5:04 PM

Reposted by Leonardo Cotta

Quaid Morris

@quaidmorris.bsky.social

Please check out our new approach to modeling somatic mutation signatures.

DAMUTA has independent Damage and Misrepair signatures whose activities are more interpretable and more predictive of DNA repair defects, than COSMIC SBS signatures 🧬🖥️🧪

www.biorxiv.org/content/10.1...

Damage and Misrepair Signatures: Compact Representations of Pan-cancer Mutational Processes

Mutational signatures of single-base substitutions (SBSs) characterize somatic mutation processes which contribute to cancer development and progression. However, current mutational signatures do not ...

www.biorxiv.org

June 3, 2025 at 12:34 AM

Leonardo Cotta

@cottascience.bsky.social

I haven't been up to date with the model collapse literature, but it's crazy the number of papers that consider the case where people only reuse data from the model distribution. This never happens; there's always some human curation or conditioning that yields some type of "real-world, new" data.

April 13, 2025 at 6:26 PM

Leonardo Cotta

@cottascience.bsky.social

This is my favourite "graph paper" of the last 1 or 2 years. We also need to start including non-NN baselines, e.g. fingerprints+catboost ---if the goal is real-world impact and not getting it published asap. I also recommend following @wpwalters.bsky.social's blog.
arxiv.org/abs/2502.14546

Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks

While machine learning on graphs has demonstrated promise in drug design and molecular property prediction, significant benchmarking challenges hinder its further progress and relevance. Current bench...

arxiv.org

March 24, 2025 at 5:21 PM

Reposted by Leonardo Cotta

Derek Thompson

@dkthomp.bsky.social

Unbelievable news.

Pancreatic is one of the deadliest cancers.

New paper shows personalized mRNA vaccines can induce durable T cells that attack pancreatic cancer, with 75% of patients cancer free at three years—far, far better than standard of care.

www.nature.com/articles/s41...

February 27, 2025 at 5:03 PM

Reposted by Leonardo Cotta

Thomas Wolf

@thomwolf.bsky.social

After 6+ months in the making and over a year of GPU compute, we're excited to release the "Ultra-Scale Playbook": hf.co/spaces/nanot...

A book to learn all about 5D parallelism, ZeRO, CUDA kernels, how/why overlap compute & coms with theory, motivation, interactive plots and 4000+ experiments!

The Ultra-Scale Playbook - a Hugging Face Space by nanotron

The ultimate guide to training LLM on large GPU Clusters

hf.co

February 19, 2025 at 6:10 PM

Leonardo Cotta

@cottascience.bsky.social

I've always hated the "reasoning models" for code assistance since I think the most useful application of LLMs is really writing the boring helper functions and letting us focus on the hard work. However, I found o3 to be particularly useful when debugging ML code, e.g., 1/2

February 19, 2025 at 3:02 PM

Leonardo Cotta

@cottascience.bsky.social

The whole DeepSeek-R1 thing just highlights computer science's main feature: you can do A LOT with a small team and some (limited) resources. This is how we've been able to scale innovation and why free software is important.

January 25, 2025 at 4:51 PM

Leonardo Cotta

@cottascience.bsky.social

This is an amazing resource (of resources) for machine learners

Pat Walters @wpwalters.bsky.social · Jan 23

Machine Learning in Drug Discovery Resources page updated for 2025. github.com/PatWalters/r...

GitHub - PatWalters/resources_2025: Machine Learning in Drug Discovery Resources 2024

Machine Learning in Drug Discovery Resources 2024. Contribute to PatWalters/resources_2025 development by creating an account on GitHub.

github.com

January 24, 2025 at 2:14 PM

Reposted by Leonardo Cotta

Sara Magliacane

@smaglia.bsky.social

Sad after #AISTATS2025 and #ICLR2025 notifications? As we say in Italy, when a door closes, a bigger one opens ;)

If you have a fantastic paper on #uncertainty #AI #ML #causality #statML #probabilisticmodels #reasoning #impreciseprobabilities etc, consider submitting to #UAI2025 🇧🇷 deadline 10 Feb 💥

uai2026 @auai.org · Dec 3

The 41st Conference on #Uncertainty in #AI will be held in Rio de Janeiro 🇧🇷, July 21-25!

The CfP is out 👉 www.auai.org/uai2025/call...

🚨 Feb 10: Paper submission
🗣️ Apr 3-10: rebuttal period
🎉/💀 May 6: Author notification

#UAI2025 #ML #stats #learning #reasoning #uncertainty

January 23, 2025 at 8:44 PM

Reposted by Leonardo Cotta

Bruno Ribeiro (at #NeurIPS2024)

@brunofmr.bsky.social

Slides of my presentation "Mathematical Foundations of Graph Foundation Models" yesterday at the AMS Session of the #JMM2025. The accompanying paper is coming soon.
www.cs.purdue.edu/homes/ribeir...

January 9, 2025 at 10:39 PM

Leonardo Cotta

@cottascience.bsky.social

Learning Rust ~properly~ during my break and wow -- absolutely worth it! While we're all chasing GPU optimization, there's something magical about crafting efficient CPU-based apps. Clean and fast data processing can change our lives ;)

December 30, 2024 at 6:32 PM

Reposted by Leonardo Cotta

Eugene Vinitsky 🍒

@eugenevinitsky.bsky.social

My model is that these things are extremely helpful above some skill bar and extremely harmful below some skill bar

December 23, 2024 at 1:53 AM

Leonardo Cotta

@cottascience.bsky.social

very cool observations about using SMILES/graphs vs fingerprints. TL;DR: fingerprints only capture certain properties marginally, and their combinations can often give rise to something new/different.
www.deepmedchem.com/articles/wha...

What Can Neural Network Embeddings Do That Fingerprints Can’t?

www.deepmedchem.com

December 22, 2024 at 12:23 AM

Reposted by Leonardo Cotta

Nikita Dhawan

@nikitadhawan.bsky.social

Presenting our poster at NeurIPS!
Please come chat about estimating causal effects from user/patient-reported experiences: Thursday, 11AM, West Ballroom A-D #5110.

Rahul G. Krishnan @rahulgk.bsky.social · Dec 11

@nikitadhawan.bsky.social developed NATURAL (www.cs.toronto.edu/~nikita/natu...) with @cottascience.bsky.social, Karen & @cmaddis.bsky.social. It's an end-to-end pipeline that starts from raw-text data and ends with a causal (**) effect associated with an intervention.

(**) conditions apply
🧵(6/7)

NATURAL

www.cs.toronto.edu

December 11, 2024 at 5:58 PM

Leonardo Cotta

@cottascience.bsky.social

If you're interested in {causality, language, healthcare}, stop by!
Thursday 11am - 2pm
West Ballroom A-D #5110

Rahul G. Krishnan @rahulgk.bsky.social · Dec 11

@nikitadhawan.bsky.social developed NATURAL (www.cs.toronto.edu/~nikita/natu...) with @cottascience.bsky.social, Karen & @cmaddis.bsky.social. It's an end-to-end pipeline that starts from raw-text data and ends with a causal (**) effect associated with an intervention.

(**) conditions apply
🧵(6/7)

NATURAL

www.cs.toronto.edu

December 11, 2024 at 4:57 PM

Reposted by Leonardo Cotta

Bruno Ribeiro (at #NeurIPS2024)

@brunofmr.bsky.social

I'm told this is a more intellectual version of ML Twitter :). I have a question...

What papers have made good *theoretical* advances towards graph foundation models?

Jan 8th 1-2pm I am giving a talk at the Joint Mathematics Meeting on the topic
meetings.ams.org/math/jmm2025...

Mathematical Foundations of Knowledge Graph Foundation Models

One potential definition of a knowledge graph foundation model is one where a g...

meetings.ams.org

December 11, 2024 at 2:10 PM

Reposted by Leonardo Cotta

Rahul G. Krishnan

@rahulgk.bsky.social

b] ~Billions of dollars each year are spent on trials to assess interventions.

Can we use crowdsourced data to know which intervention is likely to work ahead of time?

Doing so requires answering a causal question!

But the data to answer this question is locked in unstructured text.

🧵(5/7)

December 11, 2024 at 12:20 AM

Reposted by Leonardo Cotta

Polaris

@polarishub.io

What are the most interesting datasets and benchmark-related work for ML in drug discovery at NeurIPS?

We’ll be at the conference doing short interviews with researchers and handing out some Polaris merch!

Here’s who we have on the shortlist. 🧵

December 9, 2024 at 5:09 PM
