Lightnews — Scholar-powered news

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

#SYCL #HIP #CUDA #Performance #Package

hgpu.org?p=30304

Specializing kernels by including runtime information during just-in-time (JIT) compilation can improve performance at the expense of potentially generating more kernels. In this work, we contribu…

hgpu.org

October 19, 2025 at 8:40 PM

HGPU group

@hgpu.bsky.social

Anonymized Network Sensing using C++26 std::execution on GPUs

#CUDA #CXX

hgpu.org?p=30303

Large-scale network sensing plays a vital role in network traffic analysis and characterization. As network packet data grows increasingly large, parallel methods have become mainstream for network…

hgpu.org

October 19, 2025 at 8:40 PM

HGPU group

@hgpu.bsky.social

A Performance Portable Matrix Free Dense MTTKRP in GenTen

#Kokkos #CUDA #OpenMP #Package

hgpu.org?p=30302

We extend the GenTen tensor decomposition package by introducing an accelerated dense matricized tensor times Khatri-Rao product (MTTKRP), the workhorse kernel for canonical polyadic (CP) tensor de…

hgpu.org

October 19, 2025 at 8:40 PM

HGPU group

@hgpu.bsky.social

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

#CUDA #ROCm #Performance #DeepLearning #DL #Package

hgpu.org?p=30301

Operator fusion has become a key optimization for deep learning, which combines multiple deep learning operators to improve data reuse and reduce global memory transfers. However, existing tensor c…

hgpu.org

October 19, 2025 at 8:35 PM

HGPU group

@hgpu.bsky.social

Thesis: High-Performance Computing: from Optimization to Automation

#CUDA #HIP #HPC

hgpu.org?p=30292

High-Performance Computing: from Optimization to Automation

The digital revolution of our society is driven by major technological advancements, enabled not only by the growing capabilities of computers but also by the evolution of their uses. These develop…

hgpu.org

October 12, 2025 at 2:49 PM

HGPU group

@hgpu.bsky.social

Interleaved Learning and Exploration: A Self-Adaptive Fuzz Testing Framework for MLIR

#MLIR #OpenCL #Testing #Package

hgpu.org?p=30291

MLIR (Multi-Level Intermediate Representation) has rapidly become a foundational technology for modern compiler frameworks, enabling extensibility across diverse domains. However, ensuring the corr…

hgpu.org

October 12, 2025 at 2:48 PM

HGPU group

@hgpu.bsky.social

ConCuR: Conciseness Makes State-of-the-Art Kernel Generation

#CUDA #CodeGeneration #LLM #DeepLearning #DL #Package

hgpu.org?p=30290

GPU kernel generation by LLMs has recently experienced rapid development, leveraging test-time scaling and reinforcement learning techniques. However, a key challenge for kernel generation is the s…

hgpu.org

October 12, 2025 at 2:48 PM

HGPU group

@hgpu.bsky.social

Accelerating cosmological simulations on GPUs: a portable approach using OpenMP

#OpenMP #HPC #Astrophysics #Package

hgpu.org?p=30289

In this work we present the porting to Graphics Processing Units (GPUs, using OpenMP target directives) and optimization of a key module within the cosmological PINOCCHIO code, a Lagrangian Pertu…

hgpu.org

October 12, 2025 at 2:47 PM

HGPU group

@hgpu.bsky.social

EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models

#CUDA #LLM #AI #DeepLearning #DL #PyTorch

hgpu.org?p=30288

CUDA kernel optimization has become a critical bottleneck for AI performance, as deep learning training and inference efficiency directly depends on highly optimized GPU kernels. Despite the promis…

hgpu.org

October 12, 2025 at 2:47 PM
