Chris Dagdigian
dagdigian.com
@dagdigian.com
Life science informatics, Research IT & HPC infrastructure geek, #AWS cloud nerd & accidental entrepreneur @ http://bioteam.net. He/him.

Find me in Boston or Western Maine (Bethel area), where I'm transitioning from nerd to Tractor Guy / meadow planter
... but not as much money as owning and depreciating premise HPC and storage and running those workloads in a colo or on-prem.

It's the monthly cloud charges for provisioned storage that are killers at petabyte+ scale
June 28, 2025 at 10:06 PM
Specific win in my niche is for CryoEM, where it is common to ingest & persist petabytes of data but only run analytical pipelines during certain defined periods.

Keeping data in standard tier S3 storage class and only creating a POSIX filesystem (and HPC cluster!) when needed saves lots of $$
June 28, 2025 at 10:04 PM
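A rough sketch of the arithmetic behind that saving, in Python. Every price and size here is a hypothetical parameter supplied by the caller, not a real AWS rate, and the function names are made up for illustration. It only compares paying for a parallel filesystem all month against keeping everything in object storage and paying for a filesystem during the days a pipeline actually runs.

```python
def monthly_cost_always_on(data_tb, fs_price_per_tb_month):
    """Cost of keeping the whole dataset on a provisioned filesystem all month.

    fs_price_per_tb_month is a hypothetical price; plug in your own rate.
    """
    return data_tb * fs_price_per_tb_month


def monthly_cost_on_demand(data_tb, s3_price_per_tb_month,
                           working_set_tb, fs_price_per_tb_month,
                           active_days, days_in_month=30):
    """Cost of persisting everything in object storage and creating a
    filesystem sized for the working set only while pipelines run.

    All prices are hypothetical inputs. Compute, request, and data
    transfer charges are deliberately ignored in this sketch.
    """
    if not 0 <= active_days <= days_in_month:
        raise ValueError("active_days must be between 0 and days_in_month")
    object_storage = data_tb * s3_price_per_tb_month
    # The filesystem is billed only for the fraction of the month it exists.
    filesystem = working_set_tb * fs_price_per_tb_month * (active_days / days_in_month)
    return object_storage + filesystem


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the functions.
    always_on = monthly_cost_always_on(data_tb=1000, fs_price_per_tb_month=100)
    on_demand = monthly_cost_on_demand(
        data_tb=1000, s3_price_per_tb_month=20,
        working_set_tb=200, fs_price_per_tb_month=100, active_days=6,
    )
    print(f"always-on filesystem: {always_on:.2f} per month")
    print(f"S3 + on-demand filesystem: {on_demand:.2f} per month")
```

The gap narrows as `active_days` approaches the full month or as the working set approaches the full dataset, which is one way to read the "diminishing returns" caveat.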
biggest thing on-prem: not paying every month for the full raw storage!

For PB-scale stuff the cloud math is scary. Keeping it in S3 and making Lustre "when needed" has become semi-viable for certain workloads in my niche. Contorting infra to keep cost down has diminishing returns ...
June 28, 2025 at 7:52 PM
the link between FSx for Lustre & S3 via DRAs (data repository associations) is magic - being able to create parallel filesystems off of a bucket or bucket prefix & then persist changes & POSIX stuff into S3 so you can destroy the Lustre filesystem when the pipeline completes is transformative. My favorite AWS thing at the moment!
June 28, 2025 at 7:46 PM
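For anyone who wants to see what wiring up a DRA looks like, here is a minimal sketch in Python. It assumes boto3 is installed and that an FSx for Lustre filesystem already exists. The helper names, the example filesystem ID, the bucket name, and the `/data` path are hypothetical. The request shape follows the boto3 FSx `create_data_repository_association` call, and the import/export policies shown are one possible choice, not necessarily the setup described in the post.

```python
def build_dra_request(file_system_id, bucket, prefix="", file_system_path="/data"):
    """Build keyword arguments for boto3's FSx create_data_repository_association.

    Links a directory on the Lustre filesystem to an S3 bucket or bucket prefix.
    The helper name and default path are illustrative assumptions.
    """
    if not file_system_path.startswith("/"):
        raise ValueError("file_system_path must start with '/'")
    clean_prefix = prefix.strip("/")
    s3_path = f"s3://{bucket}/{clean_prefix}/" if clean_prefix else f"s3://{bucket}/"
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": file_system_path,
        "DataRepositoryPath": s3_path,
        # Load file metadata from the bucket when the association is created.
        "BatchImportMetaDataOnCreate": True,
        "S3": {
            # Reflect bucket changes into the filesystem ...
            "AutoImportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
            # ... and push filesystem changes back to the bucket.
            "AutoExportPolicy": {"Events": ["NEW", "CHANGED", "DELETED"]},
        },
    }


def create_dra(request):
    """Send the request to AWS. Needs boto3, credentials, and a real filesystem."""
    import boto3  # imported here so the builder above works without boto3

    return boto3.client("fsx").create_data_repository_association(**request)


if __name__ == "__main__":
    # Hypothetical identifiers, shown only to illustrate the request shape.
    request = build_dra_request("fs-0123456789abcdef0", "my-cryoem-bucket", "datasets/run-42")
    print(request["DataRepositoryPath"])
```

Before deleting the filesystem at the end of a pipeline, it is worth confirming that pending exports to the bucket have finished, since anything not yet written back to S3 is lost with the filesystem.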
@tgamblin.bsky.social -- thanks for the hint! Changing to curl for downloads was far easier than the other ideas I had
April 21, 2025 at 12:53 PM