HGPU group
@hgpu.bsky.social
High performance computing on graphics processing units (GPU): AMD, Nvidia, Intel, CUDA, OpenCL, OpenGL, HPC
Characterizing the Performance of Parallel Data-Compression Algorithms across Compilers and GPUs

#CUDA #HIP #Compression #Package

hgpu.org?p=30342
Different compilers can generate code with notably different performance characteristics – even on the same system. Today, GPU developers have three popular options for compiling CUDA or HIP …
November 9, 2025 at 4:28 PM
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error

#FP8 #Precision

hgpu.org?p=30341
Training large Mixture-of-Experts (MoE) models remains computationally prohibitive due to their extreme compute and memory demands. Although low-precision training promises to accelerate computatio…
November 9, 2025 at 4:28 PM
A Study of Floating-Point Precision Tuning in Deep Learning Operators Implementations

#CUDA #DeepLearning #DL #Package

hgpu.org?p=30330
Deep learning (DL) has already played a significant role in numerous fields, making it crucial to ensure the stability of both training and inference in DL systems. The computation of DL models can…
November 2, 2025 at 4:05 PM
A Compute Graph Simulation and Implementation Framework Targeting AMD Versal AI Engines

#AMD #FPGA #CodeGeneration #AI

hgpu.org?p=30316
We present a framework for developing compute graph-based applications targeting the AI Engine (AIE) array of AMD Versal SoCs. This framework enables users to embed AIE-based dataflow graph prototy…
October 26, 2025 at 8:03 PM
Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

#SYCL #HIP #CUDA #Performance #Package

hgpu.org?p=30304
Specializing kernels by including runtime information during just-in-time (JIT) compilation can improve performance at the expense of potentially generating more kernels. In this work, we contribu…
October 19, 2025 at 8:40 PM
Thesis: High-Performance Computing: from Optimization to Automation

#CUDA #HIP #HPC

hgpu.org?p=30292
The digital revolution of our society is driven by major technological advancements, enabled not only by the growing capabilities of computers but also by the evolution of their uses. These develop…
October 12, 2025 at 2:49 PM
Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

#MLIR #OpenCL #Testing #Package

hgpu.org?p=30291
MLIR (Multi-Level Intermediate Representation) has rapidly become a foundational technology for modern compiler frameworks, enabling extensibility across diverse domains. However, ensuring the corr…
October 12, 2025 at 2:48 PM