Gordon Forbes
gforb.bsky.social
I love Netflix for their data science blog and the BBC for their ggplot2 resources.
How is AI going to make this all easier?

And relatedly, how can we stop the increase in productivity from AI leading to an overwhelming amount of methodologically questionable research?
July 30, 2025 at 2:08 PM
To me, this is what treating age as categorical means, e.g. age bands 0-18, 19-65, 65+.

I would describe using individual integer ages (i.e. 1, 2, 3, 4, 5, 6, ...) as discrete age.

The 'big category' approach is used worryingly often, sometimes because of restrictions on the data.
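A minimal Python sketch of the difference, assuming the bands above. As written, the bands overlap at 65, so this sketch assigns 65 to the middle band; that boundary choice is an assumption, and `age_category` is a hypothetical helper rather than anything from a specific study.

```python
def age_category(age: int) -> str:
    """Collapse an integer age into one of three broad bands.

    Assumption: 65 goes in the middle band, because the bands in the post overlap at 65.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 18:
        return "0-18"
    if age <= 65:
        return "19-65"
    return "65+"


ages = [3, 17, 18, 19, 40, 65, 66, 90]
discrete = ages  # discrete age: every integer age is its own value
categorical = [age_category(a) for a in ages]  # categorical age: broad bands only

# 19 and 65 end up in the same category, so that difference is lost to the model.
print(len(set(discrete)), len(set(categorical)))  # 8 3
```

The information loss is the worry: a 19-year-old and a 65-year-old become indistinguishable once age is binned this coarsely.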
July 8, 2025 at 2:40 PM
Hard disagree! It is acceptable and probably preferable for a group to write a paper without understanding the details of each other's work.

e.g. I want to be able to write "the model was estimated with restricted maximum likelihood" in a stats section without explaining REML to my collaborators.
March 28, 2025 at 12:47 PM
I totally agree that, theoretically, it makes no sense to shrink parameters to zero.

In practice, if it means a model can be applied without collecting mostly irrelevant data, this can be a huge win, especially in health, where that extra data can involve invasive or expensive tests.
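A small sketch of why shrinking to exactly zero is practically useful, using the lasso as one common example of such a penalty (an assumption; the post doesn't name a method). Under the further simplifying assumption of an orthonormal design, the lasso estimate for each coefficient is the soft-thresholded unpenalised estimate. The predictor names and coefficient values below are hypothetical.

```python
def soft_threshold(b: float, lam: float) -> float:
    """Lasso estimate for one coefficient, assuming an orthonormal design.

    Coefficients with |b| <= lam are set exactly to zero; the rest shrink by lam.
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


# Hypothetical unpenalised (OLS) estimates for four predictors.
ols = {"age": 0.80, "blood_pressure": -0.45, "biopsy_marker": 0.05, "mri_score": -0.02}
lam = 0.10  # hypothetical penalty strength

lasso = {name: soft_threshold(b, lam) for name, b in ols.items()}

# Only predictors with non-zero coefficients need to be measured for new patients.
needed = [name for name, b in lasso.items() if b != 0.0]
print(needed)  # ['age', 'blood_pressure']
```

In this toy case the zeroed coefficients are exactly the ones you would no longer need to collect, such as the (hypothetical) biopsy and MRI measures.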
March 13, 2025 at 9:14 AM
Surely, do what is computationally feasible.

For a random forest on a medium-sized data set, there is no reason not to use a resampling approach or cross-validation.

If you have just developed an LLM using data scraped from the whole Internet, you are not going to be running cross-validation.
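For the "computationally feasible" end of the spectrum, k-fold cross-validation is cheap to set up. The following is a dependency-free sketch, not the author's code. In practice a random forest would be refitted on each training fold (scikit-learn's `KFold` handles the splitting). Here a mean-only baseline stands in for the model so the example runs on its own. `k_fold_indices` and `cv_mse_of_mean` are hypothetical helpers.

```python
import random


def k_fold_indices(n: int, k: int, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle once so folds are random but reproducible
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def cv_mse_of_mean(y, k=5, seed=0):
    """Cross-validated MSE of a baseline that predicts the training-fold mean.

    Stand-in for refitting a real model (e.g. a random forest) on each fold.
    """
    errors = []
    for train, test in k_fold_indices(len(y), k, seed):
        mean = sum(y[i] for i in train) / len(train)
        errors.extend((y[i] - mean) ** 2 for i in test)
    return sum(errors) / len(errors)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(cv_mse_of_mean(y, k=3))
```

Each observation is held out exactly once, so the model is fitted k times. That cost is trivial for a medium-sized data set and prohibitive for an LLM.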
March 7, 2025 at 10:37 AM
When I've worked with REDCap databases I've always had to rely on a data manager pulling an extract and emailing it to me.

This could be a game changer.
March 6, 2025 at 9:53 AM