Gabriel

@dssgabriel

PhD candidate, HPC Software Engineering @cea.fr / DAM
MSc HPC & Simulation from @univparissaclay.bsky.social
Architecture, microbenchmarking & SIMD sorcery.
Research on distributed computing, data structures & memory layouts at exascale.
RTFM 👹

20 Followers
87 Following
19 Posts
Joined 01.04.2025
Latest posts by Gabriel @dssgabriel

NVIDIA Adds Official Support For RHEL-Compatible Distributions Like AlmaLinux With CUDA 13.2
With CUDA 13.2 that is now shipping, NVIDIA has provided official support for Red Hat Enterprise Linux compatible distributions/downstreams like AlmaLinux to CUDA. With this official NVIDIA CUDA support for these RHEL-compatible distributions, NVIDIA is also allowing the NVIDIA packages to be distributed directly from the OS package repositories...

NVIDIA Adds Official Support For RHEL-Compatible Distributions Like AlmaLinux With CUDA 13.2 - https://www.phoronix.com/news/NVIDIA-Official-RHEL-Compat

09.03.2026 18:55 👍 10 🔁 2 💬 0 📌 0
Defer available in gcc and clang
About a year ago I posted about defer and that it would be available for everyone using gcc and/or clang soon. So it is probably time for an update. Two things have happened in the mean time: A tec…

Defer available in GCC and Clang
gustedt.wordpress.com/2026/02/15/d...

20.02.2026 13:53 👍 0 🔁 1 💬 0 📌 0
Join us for the HPSF Community Summit 2026 in Braunschweig, Germany, February 25-27! 💚

Learn what’s new with HPSF projects, give us feedback on your use of HPSF software, meet with project communities, and tell us how to grow and improve them.

Details: hpsf.io/event/hpsf-c...

28.01.2026 15:45 👍 2 🔁 3 💬 0 📌 0
Intro to the GitButler CLI
YouTube video by GitButler

Love the GitButler GUI but miss your CLI? Have we got the solution for you!

youtu.be/Jg8L3SbgZ3o?...

11.02.2026 08:05 👍 11 🔁 3 💬 0 📌 0
Release v0.37.0 · jj-vcs/jj
About: jj is a Git-compatible version control system that is both simple and powerful. See the installation instructions to get started. Release highlights: A new syntax for referring to hidden and...

#jj-vcs 0.37.0 came out yesterday! im intrigued by the new divergent change syntax, seems very neat

github.com/jj-vcs/jj/re...

08.01.2026 19:42 👍 62 🔁 8 💬 4 📌 2

Please note: Any claims of AI Exascale, AI Zettascale or beyond computing power are just baloney. Real computing power is measured in FP64. Period.

AMD embraced utter stupidity by adopting this terminology from the leather jacket man.

It's a real shame!

#CES2026 #AMD

06.01.2026 03:08 👍 8 🔁 3 💬 2 📌 0
LLVM 22 Lands NVIDIA Olympus CPU Scheduling Model
NVIDIA's Olympus are the ARM64 cores found within the upcoming Vera CPU that will be paired with Rubin. Olympus cores are claimed to be twice as fast as NVIDIA's current CPU cores found in Grace and based on Neoverse-V2. Earlier this year the open-source compilers landed initial support for Olympus while now a proper CPU scheduling model has been upstreamed into LLVM 22...

LLVM 22 Lands NVIDIA Olympus CPU Scheduling Model - https://www.phoronix.com/news/NVIDIA-Olympus-Sched-Model

30.12.2025 14:59 👍 1 🔁 1 💬 0 📌 0
title slide of talk being given at Rust Nation UK:

[Title] Rust for Foundational SW or: Safety-Critical Software in Rust

ever curious why people that work in safety-critical systems want to use Rust?

here's the title slide for the talk i'll give at @rustnationuk.bsky.social about this

28.12.2025 23:37 👍 27 🔁 8 💬 2 📌 1
When compilers surprise you - Matt Godbolt’s blog
Sometimes compilers can surprise and delight even a jaded old engineer like me

Day 24 of #AoCO2025! A loop summing 0+1+2+...+n. GCC unrolls it. Clang does something jaw-dropping: the loop vanishes entirely, replaced by a direct calculation. How?!

xania.org/202512/24-cu...
youtu.be/V9dy34slaxA

24.12.2025 13:21 👍 31 🔁 4 💬 3 📌 1
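
To make the Day 24 post above concrete, here is a minimal C sketch of a loop summing 0+1+...+n next to the textbook closed form n(n+1)/2. The names `sum_loop` and `sum_closed` are made up for illustration, and the closed form is only the standard identity; the code Clang actually emits may be arranged differently.

```c
#include <stdint.h>

/* Straightforward loop: adds 0 + 1 + ... + n. */
uint64_t sum_loop(uint32_t n) {
    uint64_t total = 0;
    for (uint64_t i = 0; i <= n; ++i) {
        total += i;
    }
    return total;
}

/* Closed form n * (n + 1) / 2, computed in 64 bits so that the
   product cannot overflow for any 32-bit n. */
uint64_t sum_closed(uint32_t n) {
    uint64_t m = n;
    return m * (m + 1) / 2;
}
```

Both functions return the same value for every input, which is what allows a compiler to swap one for the other.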

Day 23 of #AoCO2025! Switch → jump table? Sometimes. Other times: arithmetic, bitmasks, or something cleverer. Compilers have more tricks than you think.

xania.org/202512/23-sw...
youtu.be/aSljdPafBAw

23.12.2025 12:35 👍 22 🔁 2 💬 0 📌 0
Clever memory tricks - Matt Godbolt’s blog
We learn that compilers have tricks to access memory efficiently

Day 22: String comparison against "ABCDEFG" should call memcmp, but Clang inlines it with some clever memory tricks. How does it compare 7 bytes so efficiently? xania.org/202512/22-me... youtu.be/kXmqwJoaapg #AoCO2025

22.12.2025 12:57 👍 25 🔁 4 💬 0 📌 0
When SIMD Fails: Floating Point Associativity - Matt Godbolt’s blog
Why floating point maths doesn't vectorise like integers, and what to do about it

Day 21: Summing integers? Compiler vectorises beautifully - 8 at a time! Switch to floats? It refuses, doing each add individually. Same code, totally different output. Why? 🤔

xania.org/202512/21-ve...
youtu.be/lUTvi_96-D8

#AoCO2025

21.12.2025 13:34 👍 23 🔁 3 💬 2 📌 0
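
The Day 21 post above turns on floating-point addition not being associative. A small C sketch of that fact, with hypothetical helper names `sum_left` and `sum_right` (not taken from the blog post):

```c
/* Adds left to right: (a + b) + c.
   The volatile store forces the intermediate to be rounded to float. */
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

/* Adds right to left: a + (b + c). */
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, `sum_left` returns 1.0f while `sum_right` returns 0.0f, because the 1.0f is rounded away when it is added to -1e8f first. A vectorised sum adds the elements in a different order, so a compiler that must preserve the exact result cannot reorder float additions unless it is told that this is acceptable.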

Day 20: Process 65,536 integers one at a time? Nah. The compiler vectorises it to handle 8 at once - same code, 8× faster! SIMD auto-vectorisation is compiler magic 🚀

xania.org/202512/20-si...
youtu.be/d68x8TF7XJs #AoCO2025

20.12.2025 12:58 👍 27 🔁 4 💬 1 📌 0
Chasing your tail - Matt Godbolt’s blog
The art of not (directly) coming back: tail call optimisation

Day 19: Recursive functions calling themselves endlessly - stack growth? Nope! The compiler turns recursion into loops. Tail call optimisation is magic ✨

xania.org/202512/19-ta...
youtu.be/J1vtP0QDLLU #AoCO2025

19.12.2025 13:01 👍 25 🔁 3 💬 2 📌 0
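
As a rough illustration of the Day 19 post above, the C sketch below pairs a tail-recursive function with the loop it is equivalent to. The GCD example and the names `gcd_recursive` and `gcd_loop` are assumptions chosen for illustration; the blog post may use a different function.

```c
#include <stdint.h>

/* Tail-recursive: the recursive call is the last thing the function does,
   so nothing in the current frame is needed after it returns. */
uint64_t gcd_recursive(uint64_t a, uint64_t b) {
    if (b == 0) {
        return a;
    }
    return gcd_recursive(b, a % b);
}

/* The equivalent loop: overwrite the arguments and jump back to the top. */
uint64_t gcd_loop(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```

Whether a given compiler performs this rewrite depends on the optimisation level; the C language does not guarantee it.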
Partial inlining - Matt Godbolt’s blog
Inlining doesn't have to be all-or-nothing

Day 18: Function with fast & slow paths. Inline = code bloat. Don't inline = slow fast path. Can't have both - or can you? The compiler finds a surprising way out of this dilemma.

xania.org/202512/18-pa...
youtu.be/STZb5K5sPDs
#AoCO2025

18.12.2025 13:05 👍 27 🔁 4 💬 0 📌 1

Actually, this die configuration is not new information; it was already mentioned on this removed slide:
(Although the CPU die's CBB name still seems new.)

18.12.2025 10:47 👍 1 🔁 2 💬 0 📌 1
Deferred Conflict (with Steve Klabnik) | Dead Code
A podcast about how the software industry got this way

Listen: shows.acast.com/dead-code/e...

17.12.2025 16:15 👍 5 🔁 3 💬 0 📌 0

It’s safe to assume that the HPC scheduling space is going to be in a state of Flux for quite some time to come…

(I see what I did there. With consummate apologies to @vsoch.bsky.social and @tgamblin.bsky.social in advance 🤣)

17.12.2025 17:56 👍 5 🔁 2 💬 0 📌 0
How have servers and the cloud evolved in the last 30 years, and what might be next? @bcantrill.bsky.social has been at the thick of the industry since the Dotcom Boom, and shares fascinating stories.

Bryan is one of my all-time favorite people to talk with - don't miss this one.

(cont'd)

17.12.2025 21:02 👍 59 🔁 7 💬 3 📌 1
Inlining - the ultimate optimisation - Matt Godbolt’s blog
Copy paste can sometimes be a good thing, at least if the compiler does it for you

Day 17: Inlining - the ultimate optimisation ✨

A function gets inlined, half vanishes. The assembly is cleaner than hand-written. How does copy-paste make code disappear?

xania.org/202512/17-in...
youtu.be/JFHfFTvMPp0

#AoCO2025

17.12.2025 12:27 👍 20 🔁 3 💬 0 📌 0
Calling all arguments - Matt Godbolt’s blog
Knowing how compilers call functions can help with design - and optimisation

Day 16: Calling conventions matter! Pass 8 chars as separate args: stack spillage. Pack them in a struct: single register. Sometimes structs are MORE efficient than separate parameters!

xania.org/202512/16-ca...
youtu.be/Yaw8AMoP4sI
#AoCO2025

16.12.2025 12:59 👍 43 🔁 5 💬 2 📌 0
Aliasing - Matt Godbolt’s blog
Knowing when the compiler can't optimise is important too

Day 15: Two nearly identical loops - one writes to memory every iteration, the other stays in registers. Same code, wildly different performance. The culprit? Aliasing!

xania.org/202512/15-al...
youtu.be/PPJtJzT2U04

#AoCO2025

15.12.2025 13:20 👍 28 🔁 4 💬 0 📌 0
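
One plausible shape for the two loops in the Day 15 post above, sketched in C. The function names and the exact loops are assumptions for illustration; the blog post's own example may differ.

```c
#include <stddef.h>

/* Accumulates through a pointer. Because *out might alias an element of
   data, the compiler generally has to store to memory on each iteration. */
void sum_via_pointer(const int *data, size_t n, int *out) {
    *out = 0;
    for (size_t i = 0; i < n; ++i) {
        *out += data[i];
    }
}

/* Accumulates in a local variable, which can live in a register;
   memory is written once, after the loop. */
void sum_via_local(const int *data, size_t n, int *out) {
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    *out = total;
}
```

When `out` does not point into `data`, both functions produce the same result. The difference is only in what the compiler is allowed to assume while optimising the loop.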

Does this mean no more dirt-cheap NRE from Slurm? Or will Slurm development no longer be coin-operated? Would love to see serious engineering effort go into modernizing Slurm, but this could go in many directions.

15.12.2025 17:40 👍 0 🔁 1 💬 2 📌 0
When LICM fails us - Matt Godbolt’s blog
When aliasing can prevent loop-invariant code motion

Day 14: Add ONE global counter to your loop and watch LICM vanish - strlen called every iteration! Why would incrementing an unrelated variable break the optimisation? 🤔

xania.org/202512/14-li...
youtu.be/OwFNblEEAXo
#AoCO2025

14.12.2025 13:10 👍 29 🔁 4 💬 1 📌 0
Loop-Invariant Code Motion - Matt Godbolt’s blog
The compiler can move code outside of loops to speed things up

Day 13 of Advent of Compiler Optimisations! 🔄

Loop calling a function whose result never changes? One compiler hoists it out automatically. The other… doesn't. Even with hints!

xania.org/202512/13-li...
youtu.be/dIwaqJG0WDo

#AoCO2025

13.12.2025 13:08 👍 16 🔁 3 💬 0 📌 0
Pointer Arith (Using the GNU Compiler Collection (GCC))

Cursed code:

void* f(void *p) {
    return p + 1;
}

Both gcc and clang support void* arithmetic as an extension in C:

gcc.gnu.org/onlinedocs/g...

-pedantic FTW!

Godbolt: godbolt.org/z/rcrqWvMGW

#Programming

12.12.2025 17:57 👍 4 🔁 2 💬 1 📌 0
Unswitching loops for fun and profit - Matt Godbolt’s blog
Duplicating loops around can yield some decent optimisations

Day 12 of Advent of Compiler Optimisations! A loop that checks the same thing every time. The compiler's solution? Make the code bigger to make it faster. Wait, what? xania.org/202512/12-lo... youtu.be/-VCrYshE7iQ #AoCO2025

12.12.2025 12:46 👍 25 🔁 3 💬 0 📌 1

Day 11: A clever bit-counting loop using the "clear bottom bit" trick. Change one compiler flag and... wait, what just happened to my loop?! Pattern recognition at its finest.

xania.org/202512/11-po...
youtu.be/Hu0vu1tpZnc
#AoCO2025

11.12.2025 13:11 👍 29 🔁 5 💬 2 📌 1
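
The "clear bottom bit" trick from the Day 11 post above can be sketched in C as follows. The function name `count_bits` is hypothetical, and the sketch makes no claim about which flag the post changes or what any particular compiler emits for it.

```c
#include <stdint.h>

/* Counts set bits. x & (x - 1) clears the lowest set bit,
   so the loop body runs once per set bit. */
unsigned count_bits(uint32_t x) {
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}
```

For example, `count_bits(0xFF)` returns 8. Compilers that recognise this idiom can replace the whole loop with a single population-count instruction when the target is known to have one.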
Kokkos 5.0 is officially out. ✨

Details:
- Moves the project to C++20
- Retires older interfaces, reducing complexity for future work
- Ideal time for teams to review workflows

Read the full update here: hpsf.io/blog/2025/ko...

11.12.2025 17:39 👍 2 🔁 1 💬 0 📌 0

Where did you guys get the info for the facility power draw and cooling limits? 👀 Was it publicly announced somewhere?

10.12.2025 22:44 👍 0 🔁 0 💬 0 📌 0