Lightnews — Scholar-powered news

Falvyu

@falvyu.bsky.social

PhD | French | Hardware-aware Algorithm design | Image Processing | HPC

SIMD-friends: #SSE, #AVX512, #NEON, #RVV
(Opinions are my own)


Falvyu

@falvyu.bsky.social

I don't know anything about Raku, but these ads/posters are quite cool. #FOSDEM

February 1, 2025 at 5:16 PM

Falvyu

@falvyu.bsky.social

Hello Brussels! #FOSDEM

January 31, 2025 at 2:06 PM

Falvyu

@falvyu.bsky.social

Some neat tricks for computing bit-wise prefix-or and segmented-prefix-or within scalar registers.

December 29, 2024 at 1:57 AM
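
As a rough illustration of the kind of trick the post above refers to (its own code is not reproduced on this page), a bit-wise prefix-or over a 64-bit register can be sketched in C. This assumes the scan runs from the least significant bit upward, so that bit i of the result is the OR of bits 0..i of the input. The function names are made up for this example.

```c
#include <stdint.h>

/* Log-step version: after the step with shift d, each bit holds the OR of
 * itself and the 2*d - 1 bits below it. Six steps cover all 64 bits. */
static inline uint64_t prefix_or_steps(uint64_t v) {
    for (int d = 1; d < 64; d <<= 1)
        v |= v << d;
    return v;
}

/* Shortcut: 0 - v keeps the lowest set bit of v, leaves the bits below it
 * at zero, and inverts every bit above it. OR-ing that with v therefore
 * sets every bit from the lowest set bit upward. */
static inline uint64_t prefix_or_neg(uint64_t v) {
    return v | ((uint64_t)0 - v);
}
```

Both functions should agree on every input; the second one simply relies on the carry chain of the subtraction instead of explicit shifts.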

Falvyu

@falvyu.bsky.social

It turns out that the 'unrolled loop' for the 64-bit bitwise-or segmented scan can actually be reduced down to a few instructions.
(not sure if that's a known trick)

Note: compilers optimize `((~mreset) | v)` so that it is only computed once.

December 28, 2024 at 11:26 PM
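
The code from the post above is not included on this page, so the following is only a sketch of how a segmented prefix-or can collapse into a single addition; it is not necessarily the author's version. Assumptions made here: a set bit in `mreset` marks the first bit of a new segment, the scan runs from the least significant bit upward, and all function names are invented for the example. The idea is that the recurrence r[i] = v[i] | (~mreset[i] & r[i-1]) has the same shape as the carry chain of an adder, with `v` as the "generate" signal and `(~mreset) | v` as the "propagate" signal.

```c
#include <stdint.h>

/* Reference: one bit at a time, directly from the recurrence. */
static inline uint64_t seg_prefix_or_ref(uint64_t v, uint64_t mreset) {
    uint64_t r = 0, prev = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t bit = (v >> i) & 1u;
        uint64_t reset = (mreset >> i) & 1u;
        prev = bit | (reset ? 0u : prev);
        r |= prev << i;
    }
    return r;
}

/* Log-step loop: f tracks, for each bit, whether a reset lies within the
 * window already covered, which blocks values from crossing a boundary. */
static inline uint64_t seg_prefix_or_steps(uint64_t v, uint64_t mreset) {
    uint64_t f = mreset;
    for (int d = 1; d < 64; d <<= 1) {
        v |= (v << d) & ~f;
        f |= f << d;
    }
    return v;
}

/* Carry-chain version: adding v and b makes the adder's carry into bit i
 * equal to the scan result at bit i - 1. The carries are recovered as
 * sum ^ v ^ b, then masked at the segment starts. */
static inline uint64_t seg_prefix_or_carry(uint64_t v, uint64_t mreset) {
    uint64_t b = ~mreset | v;       /* propagate unless a segment starts here */
    uint64_t c = (v + b) ^ v ^ b;   /* carry into each bit position */
    return v | (c & ~mreset);
}
```

For instance, with `v = 0x12` and `mreset = 0x48`, the segments are bits 0..2, bits 3..5 and bits 6..63, and all three functions return `0x36`.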

Falvyu

@falvyu.bsky.social

*screams internally*

December 26, 2024 at 1:51 AM

Falvyu

@falvyu.bsky.social

I have the RLE bandwidth (in gigapixels) and the cycles per pixel (cpp).
Here, the cpp is derived from the duration & frequency (i.e. it's meant to be a rough approximation of 'hardware cycles').
I do have 'proper cycles' measurements, but not at this level of granularity.

December 6, 2024 at 11:38 PM
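
The derivation described in the post above can be written out as a small sketch. It assumes a constant clock frequency over the measurement and a bandwidth expressed per second, which are assumptions for this example; the function and parameter names are invented here.

```c
/* Approximate cycles per pixel from a wall-clock duration and an assumed
 * constant clock frequency (not a hardware cycle counter). */
static inline double cycles_per_pixel(double seconds, double freq_hz, double pixels) {
    return seconds * freq_hz / pixels;
}

/* Throughput in gigapixels per second for the same measurement. */
static inline double gigapixels_per_second(double pixels, double seconds) {
    return pixels / seconds / 1e9;
}
```

Under these assumptions the two metrics are tied together: the cpp is the frequency in GHz divided by the throughput in gigapixels per second, so frequency scaling or turbo behaviour during the run would skew the estimate.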

Falvyu

@falvyu.bsky.social

A closer look at one of the 'main' SIMD parts, a Run Length Encoding algorithm (8-bit pixels => 16-bit segments), shows promising results for RVV (even if AVX512 and Neon on the M1 provide a larger speedup).

December 6, 2024 at 1:48 AM
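
For readers unfamiliar with this kind of kernel, here is a minimal scalar sketch of a run-length encoder that turns 8-bit pixels into 16-bit segment boundaries. The segment layout is an assumption made for this example (half-open [start, end) column indices of runs of non-zero pixels); the post does not specify the format, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one row of 8-bit pixels as runs of non-zero pixels. Each run is
 * stored as two 16-bit column indices, start (inclusive) and end
 * (exclusive), so width must not exceed 65535. `out` needs room for
 * width + 1 entries in the worst case (alternating pixels).
 * Returns the number of 16-bit values written, i.e. two per run. */
static inline size_t rle_row_scalar(const uint8_t *row, size_t width, uint16_t *out) {
    size_t n = 0;
    uint8_t prev = 0;
    for (size_t x = 0; x < width; x++) {
        uint8_t cur = (uint8_t)(row[x] != 0);
        if (cur != prev)              /* a run starts or ends at column x */
            out[n++] = (uint16_t)x;
        prev = cur;
    }
    if (prev)                         /* close a run that reaches the row end */
        out[n++] = (uint16_t)width;
    return n;
}
```

With the row `{0, 1, 1, 0, 1, 0, 0, 1}`, this writes the six values `1, 3, 4, 5, 7, 8`, meaning three runs. The data-dependent branch on every transition is the kind of pattern a SIMD version would try to avoid.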

Falvyu

@falvyu.bsky.social

A 'rough' determination of the efficiency of each SIMD implementation can be done by measuring them against their scalar versions.
It is worth noting that only portions of the algorithm have been vectorized, and the performance of non-SIMD parts (which can be branch-heavy) will also vary.

December 6, 2024 at 1:48 AM
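
The comparison described in the post above boils down to a ratio of run times. The sketch below also includes an Amdahl-style estimate for a partially vectorized algorithm; that estimate assumes the non-vectorized part takes the same time in both builds, which the post notes is not strictly true. Function and parameter names are invented for this example.

```c
/* Speedup of a SIMD implementation over its scalar counterpart. */
static inline double speedup(double t_scalar, double t_simd) {
    return t_scalar / t_simd;
}

/* Amdahl-style estimate: vector_fraction is the share of scalar run time
 * spent in the vectorized parts, kernel_speedup is their own speedup. */
static inline double overall_speedup(double vector_fraction, double kernel_speedup) {
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / kernel_speedup);
}
```

For example, if half of the scalar run time is vectorized and that half becomes 2x faster, the estimate for the whole algorithm is only about 1.33x.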

Falvyu

@falvyu.bsky.social

A performance comparison against SotA algorithms shows good results (note: not *just* because of SIMD, other transformations have also been proposed).

December 6, 2024 at 1:48 AM

Falvyu

@falvyu.bsky.social

I defended my PhD last Friday (design of efficient data-dependent image processing algorithms).

I'd like to thank everyone who has been involved in it (jury members, advisors, colleagues, and of course my family).
This has been a long adventure, and I'm now looking forward to the next thing.

December 3, 2024 at 6:31 PM

Falvyu

@falvyu.bsky.social

MFW

55 GitHub commits in November for LaTeX documents

November 27, 2024 at 8:03 PM
