Benedikt Riedel
htc-hpc.bsky.social
Making computers discover neutrinos
Not agreeing with the actions. The dataset itself is skewed. The bulk of NSF awards are 3 years long (NSF prides itself on this). Most Trump 1.0 grants have completed or are in NCE at this point.
May 3, 2025 at 11:18 PM
Isn’t FY25 kind of a done deal with the CR that passed in March?
April 9, 2025 at 1:06 AM
Is it run by grown-ups if it ends the same way as WeWork?
March 27, 2025 at 9:48 PM
For bulk data storage, object stores are great. In my experience, taking the POSIX layer away from users is difficult without significant investment in the metadata service (as you pointed out in the blog).
This AWS blog post wants you to use Lustre for ML training:
aws.amazon.com/blogs/machin...
Scaling distributed training with AWS Trainium and Amazon EKS | Amazon Web Services
Recent developments in deep learning have led to increasingly large models such as GPT-3, BLOOM, and OPT, some of which are already in excess of 100 billion parameters. Although larger models tend to ...
aws.amazon.com
February 2, 2025 at 10:58 PM
Really good read! Is this specific to LLMs? There has been a lot of investment into getting parallel file systems into the cloud, so what are they good for? "Classic" HPC workloads?
February 2, 2025 at 10:00 PM
I gotta ask. Have the investment decisions been affected by things like DeepSeek, the lowering cost of inference, or the next step in AI needing more dev time than compute time?
January 4, 2025 at 5:15 AM
Ground News might be a good aggregator
December 27, 2024 at 7:54 PM
More on the point… GEANT exists, ESnet exists. The issue is mostly security and setup on the DOE side. Europe has the same security issues as DOE. If they can’t trust each other over a network that they own and operate, I don’t see this going far beyond setting up Globus endpoints or the equivalent.
December 17, 2024 at 11:13 PM
The biggest hurdle with IRI will be the security aspects and technical choices. I don’t see Aurora or Frontier turning into a Perlmutter-like system with IRI, but they should. NAIRR is separate from ACCESS really. NAIRR and ACCESS use the same resources, but not much else.
December 17, 2024 at 11:08 PM
TeraGrid and XSEDE fell to the NSF “rule” that after 10 years you need to be a new project. ACCESS is the current iteration and it is being used. ACCESS has been stripped down significantly from TeraGrid and XSEDE, though. It mostly handles allocations, user portal, and metrics now. access-ci.org
Home - Access
Following its highly successful Extreme Science and Engineering Discovery Environment (XSEDE) project, the National Science Foundation (NSF) is excited to introduce new advances in innovative cyberinf...
access-ci.org
December 17, 2024 at 11:05 PM
I can’t just mount the necessary shared objects from the system; I have to install them myself in the container. Again, I haven’t spent more than a couple days on this, so there may be a solution to all of this that just isn’t documented (yet).
December 2, 2024 at 10:54 PM
The container support is also not great. For example, there appears to be a difference between the packages on RHEL and Ubuntu. I can get Ubuntu containers to work but not RHEL. Great if you do ML, not great if you don’t.
December 2, 2024 at 10:54 PM
There appear to be multiple copies of shared objects whose distinction isn’t clear. For example, there is a libOpenCL.so in multiple places, but the OpenCL shared object for GPUs has a different name (libigdrcl.so).
December 2, 2024 at 10:54 PM
This is a work in progress and we use OpenCL for the accelerated code, so I understand that the support isn’t a first-class citizen. There are so many packages for OneAPI: 100+. OneAPI packages create a lot of directories (a good fraction of which are empty) as well.
December 2, 2024 at 10:54 PM
Underneath it is, but Intel has extended it and filed off the edges as far as I am aware.
December 2, 2024 at 8:06 PM
1+ million dying and millions more disabled from an airborne disease didn’t change people’s minds. Do you really think a loved one getting sick will?
December 2, 2024 at 6:46 PM
Maybe OneAPI will follow DAOS into its own foundation. I mean, from what I heard at SC, Intel GPU deployments are underutilized. And from playing with OneAPI, it gets messy quickly.
December 2, 2024 at 6:43 PM
The package “todonotes” might also be handy
November 27, 2024 at 1:20 AM
It really is an 18-month cycle with the House/Senate election cycle.
November 20, 2024 at 8:53 PM
Academia tends to hold onto ideas that might not be readily commercialized in the beginning, see mRNA vaccines.
November 20, 2024 at 8:50 PM
“Because leadership is interested in science” is one line I heard today. Industry also tends to glom on to a certain tech (see the LLM craze) and seemingly forgets all the other use cases, except maybe in their respective niche, that could be commercialized.
November 20, 2024 at 8:48 PM
We were in the B302 corner
November 18, 2024 at 8:58 PM
Over on the workshop side we only had unsweet tea
November 18, 2024 at 8:41 PM
It sounded like Arc is having issues. At least their Windows version seems to be circling the drain. I have been using sigma.os as a replacement
November 18, 2024 at 7:10 PM
An iPhone does a pretty good job at the pedometer. For all the other metrics you would need a ring
November 16, 2024 at 10:34 PM