PMPP is a nicely structured introduction to general purpose GPU programming.
GPUs are fundamentally different from CPUs. This impacts the type of workloads that benefit from them.
If you do pick this up, I recommend working through at least some of the exercises.
20.05.2025 13:31
* My matmul optimization article
I spend more time on building the intuition for some of the optimizations.
lnkd.in/dRPZmZyM
13.05.2025 11:04
These are some great resources to learn more:
* How to optimize GEMM
Nice step-by-step introduction by BLIS contributors.
lnkd.in/df6FdX8S
* BLISlab
Goes into much more depth than most introductory sources. Source of the illustration below.
lnkd.in/dHG6akFN
13.05.2025 11:04
📦 Packing
Packing arranges sub-matrix data contiguously. This helps performance and can reduce cache conflicts with large matrices. I cover this more in my article.
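To make the idea concrete, here is a minimal sketch of packing, assuming a row-major source matrix with leading dimension `ld`. The name `pack_block` and its signature are made up for illustration. Production libraries such as BLIS pack into micro-panel layouts matched to their kernels; this is only the simplest contiguous copy.

```c
#include <stddef.h>

/* Hypothetical helper: copy a rows x cols block that starts at `src`
 * (row-major, leading dimension `ld`) into the contiguous buffer `dst`.
 * After packing, the block occupies rows * cols consecutive floats,
 * so the inner loops of a kernel can stream through it linearly. */
void pack_block(const float *src, size_t ld,
                size_t rows, size_t cols, float *dst) {
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            dst[r * cols + c] = src[r * ld + c];
        }
    }
}
```

To pack the 2x2 block at row 1, column 2 of a 4x4 matrix `M`, you would call `pack_block(M + 1 * 4 + 2, 4, 2, 2, buf)`.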
13.05.2025 11:04
🧱 Tiling
A fast micro-kernel doesn't help much if it needs to wait for data to be loaded from RAM. Tiling breaks down large matrices into smaller, cache-friendly blocks. The goal is to maximize data reuse from caches.
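A rough sketch of tiling in plain C, assuming square row-major matrices. The function name and the `tile` parameter are illustrative, and a good tile size depends on the cache sizes of the target CPU. Real implementations tile separately for each cache level and combine this with packing.

```c
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Illustrative sketch: C += A * B for row-major n x n matrices.
 * The three outer loops walk over tile x tile blocks; the three inner
 * loops do an ordinary multiply restricted to one block, so the data
 * they touch is small enough to be reused from cache. */
void matmul_tiled(const float *A, const float *B, float *C,
                  size_t n, size_t tile) {
    if (tile == 0) tile = n; /* degenerate case: one tile covers everything */
    for (size_t ii = 0; ii < n; ii += tile) {
        for (size_t kk = 0; kk < n; kk += tile) {
            for (size_t jj = 0; jj < n; jj += tile) {
                size_t i_end = min_size(ii + tile, n);
                size_t k_end = min_size(kk + tile, n);
                size_t j_end = min_size(jj + tile, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        float a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `C` first, since the function accumulates into it.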
13.05.2025 11:04
⚙️ Optimized micro-kernel
BLAS libs often use small kernels (e.g., 8x4) optimized for specific CPUs using SIMD intrinsics and software prefetching. Many use handwritten assembly for optimal register allocation.
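For a feel of the structure, here is a portable scalar sketch of a 4x4 micro-kernel. It is not SIMD and not taken from any BLAS library. The names, the 4x4 shape, and the packed layouts of `a` and `b` are assumptions chosen for illustration. A tuned kernel would keep the accumulators in vector registers and use intrinsics or assembly.

```c
#include <stddef.h>

enum { MR = 4, NR = 4 }; /* assumed micro-tile shape, chosen for illustration */

/* Illustrative scalar micro-kernel: C[MR x NR] += A_panel * B_panel.
 * Assumed layouts: a[p * MR + i] holds A(i, p) and b[p * NR + j] holds
 * B(p, j), i.e. both panels are already packed. `c` is row-major with
 * leading dimension `ldc`. The accumulators live in a small local array
 * so the compiler can keep them in registers across the k loop. */
void microkernel_4x4(size_t k, const float *a, const float *b,
                     float *c, size_t ldc) {
    float acc[MR][NR] = {{0}};
    for (size_t p = 0; p < k; p++) {
        for (int i = 0; i < MR; i++) {
            for (int j = 0; j < NR; j++) {
                acc[i][j] += a[p * MR + i] * b[p * NR + j];
            }
        }
    }
    /* Write the finished tile back to C once, after the k loop. */
    for (int i = 0; i < MR; i++) {
        for (int j = 0; j < NR; j++) {
            c[(size_t)i * ldc + (size_t)j] += acc[i][j];
        }
    }
}
```

The point of the shape is that each loaded element of `a` and `b` is reused several times before it leaves registers.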
13.05.2025 11:04
Matrix-matrix multiplication is a fundamental building block of many scientific and ML applications. What does it take to write an efficient one?
🧵
13.05.2025 11:04
I've published a new article on CPU-based matrix multiplication optimizations in C.
We'll learn a few things about compilers, read some assembly, and learn about the underlying hardware.
michalpitr.substack.com/p/optimizing...
15.02.2025 20:12
I'd probably skip KTHW until you are pretty comfortable with using K8s. But once you are confident and want to understand the cluster bootstrap process, this is a great place to start.
05.01.2025 16:39
Things can appear to work but be broken. I made a mistake setting up the pod subnet for one worker node. This meant that pods on node-0 were reachable, but pods on node-1 weren't.
05.01.2025 16:39
💻 It's easy to set up a few QEMU VMs on a local machine, connected by a virtual bridge. Each VM gets a static IP. This nicely mimics an on-prem setup.
💡 Linux networking is probably the toughest part, but it can be pretty rewarding to debug and understand.
05.01.2025 16:39
Worked my way through Kubernetes The Hard Way yesterday.
Here are a few impressions from someone familiar with K8s internals but not so much with cluster administration:
05.01.2025 16:39
Dev containers are real nice for C++ CUDA projects!
No chance of a mismatched GPU driver and CUDA toolkit, all dependencies auto-installed, a clean environment that's isolated from the host, and super easy to set up on a new machine.
Gonna be a mainstay for all of my future C++/CUDA projects.
03.01.2025 13:23
Oh wow, got my first pledged subscription to my Substack, From Scratch.
Not planning to enable payments anytime soon, but it's great to see that folks enjoy my pragmatic engineering deep dives.
#substack #softwareengineering #programming
14.12.2024 13:37
My recent article on creating a container from scratch in a Linux terminal popped off and doubled my subscriber count!
09.12.2024 15:28
Linux container from scratch
Let's build a minimal container step-by-step in a terminal
Do you understand how Linux containers work? Are cgroups and namespaces just magical blackboxes? Are containers just "light-weight VMs"?
In my latest post, I create a Linux container, step-by-step, using just terminal commands!
#softwareengineering #linux #cloud
open.substack.com/pub/michalpi...
07.12.2024 18:17
Sharing my code since talk is cheap: github.com/MichalPitr/a...
04.12.2024 22:35
Day 3 - Advent of Code 2024
Really liked day 3 #AdventOfCode problem! I approached it by writing a simple tokenizer that converts the input file into tokens - identifier, left parenthesis, comma, right paren, number, ..., then looked for legal sequences. Good fun!
adventofcode.com/2024/day/3
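A minimal sketch of the tokenizer approach, not the actual solution linked above. It assumes the legal sequence is an identifier ending in `mul`, followed by `(`, a number, `,`, a number, and `)`, with nothing in between. The token names and the function `sum_mul` are made up for illustration, and no limit on the number of digits is enforced.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    T_IDENT, T_LPAREN, T_COMMA, T_RPAREN, T_NUMBER, T_OTHER, T_END
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the token */
    size_t len;        /* token length in characters */
    long long value;   /* numeric value, only meaningful for T_NUMBER */
} Token;

/* Read one token at *p and advance *p past it. */
static Token next_token(const char **p) {
    const char *s = *p;
    Token t = { T_END, s, 0, 0 };
    if (*s == '\0') return t;
    if (isalpha((unsigned char)*s)) {
        t.kind = T_IDENT;
        while (isalpha((unsigned char)*s)) s++;
    } else if (isdigit((unsigned char)*s)) {
        t.kind = T_NUMBER;
        while (isdigit((unsigned char)*s)) {
            /* Saturate instead of overflowing on absurdly long digit runs. */
            if (t.value <= 99999999) t.value = t.value * 10 + (*s - '0');
            s++;
        }
    } else {
        t.kind = *s == '(' ? T_LPAREN
               : *s == ',' ? T_COMMA
               : *s == ')' ? T_RPAREN
               : T_OTHER;
        s++;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}

static int ends_with_mul(const Token *t) {
    return t->kind == T_IDENT && t->len >= 3 &&
           memcmp(t->start + t->len - 3, "mul", 3) == 0;
}

/* Slide a six-token window over the input and add up a * b for every
 * window that reads: ident-ending-in-mul ( number , number ). */
long long sum_mul(const char *input) {
    Token w[6];
    size_t n = 0;
    long long total = 0;
    for (;;) {
        Token t = next_token(&input);
        if (t.kind == T_END) break;
        if (n == 6) {
            memmove(w, w + 1, 5 * sizeof w[0]);
            n = 5;
        }
        w[n++] = t;
        if (n == 6 && ends_with_mul(&w[0]) &&
            w[1].kind == T_LPAREN && w[2].kind == T_NUMBER &&
            w[3].kind == T_COMMA && w[4].kind == T_NUMBER &&
            w[5].kind == T_RPAREN) {
            total += w[2].value * w[4].value;
        }
    }
    return total;
}
```

Because whitespace and other punctuation become `T_OTHER` tokens, something like `mul(5, 5)` or `mul[3,7]` simply never forms a legal window.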
03.12.2024 22:19
GPU Programming
Writing code for massively parallel processors
Ever wondered how to execute code on a GPU?
Wrote a short hands-on post on how to write a simple function that executes on a GPU in CUDA C.
open.substack.com/pub/michalpi...
#softwareengineering #machinelearning #ai #gpgpu
28.11.2024 09:25
Today the GKE team celebrated the launch of 65k-node clusters!
My favorite bit is that the team had to replace etcd with Spanner to overcome scaling issues. ⚡
Curious to see these massive TPU/GPU training clusters in action!
26.11.2024 20:53
Build Your Own Inference Engine: From Scratch to "7"
Building a C++ Inference Engine from scratch
Crossed 300 subs on Substack!
My publication focuses on intuitive hands-on explanations of complex software topics - usually after I build something like an ML inference engine from scratch.
Why not check it out?
open.substack.com/pub/michalpi...
19.11.2024 20:37
Primer on Linux container filesystems
Building a container filesystem by hand
Wrote a practical article summarizing what I learned about Linux container filesystems by building a Docker clone from scratch!
We'll reverse engineer how Docker handles it, then discuss overlayfs, and finally use it to set up an Alpine-based container filesystem.
open.substack.com/pub/michalpi...
16.11.2024 17:42
How does SQLite store data?
What I learned by implementing (parts of) SQLite from scratch.
Ever wondered how SQLite stores data?
I wrote a hands-on post where I explain how SQLite represents data on disk. I manually navigate the B-Tree with a hex editor to find a specific entry.
Come along!
open.substack.com/pub/michalpi...
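If you would rather decode the bytes in code than by eye, here is a small sketch of three pieces of the SQLite file format that come up when walking a B-tree by hand: the page size in the database header, the B-tree page type flag, and varints. The function names are made up for illustration; the offsets and flag values follow the published file format. Note that on page 1 the B-tree page header starts at byte offset 100, after the database header.

```c
#include <stdint.h>
#include <string.h>

/* Page size from the 100-byte database header: a big-endian 16-bit value
 * at offset 16, where the stored value 1 stands for 65536.
 * `hdr` must point to at least 18 readable bytes.
 * Returns 0 if the magic string does not match. */
uint32_t sqlite_page_size(const uint8_t *hdr) {
    /* The magic is 15 characters plus a terminating NUL, 16 bytes total. */
    if (memcmp(hdr, "SQLite format 3", 16) != 0) return 0;
    uint32_t size = ((uint32_t)hdr[16] << 8) | hdr[17];
    return size == 1 ? 65536u : size;
}

/* The first byte of a B-tree page header identifies the page type. */
const char *btree_page_kind(uint8_t flag) {
    switch (flag) {
    case 0x02: return "interior index";
    case 0x05: return "interior table";
    case 0x0a: return "leaf index";
    case 0x0d: return "leaf table";
    default:   return "unknown";
    }
}

/* Decode a SQLite varint: 1 to 9 bytes, big-endian, 7 payload bits per
 * byte with the high bit as a continuation flag; a ninth byte, if
 * present, contributes all 8 bits. Returns the number of bytes consumed.
 * `p` must point to at least 9 readable bytes or a terminated varint. */
size_t sqlite_varint(const uint8_t *p, uint64_t *out) {
    uint64_t v = 0;
    for (size_t i = 0; i < 8; i++) {
        v = (v << 7) | (uint64_t)(p[i] & 0x7f);
        if ((p[i] & 0x80) == 0) {
            *out = v;
            return i + 1;
        }
    }
    v = (v << 8) | p[8];
    *out = v;
    return 9;
}
```

Payload sizes and rowids inside table leaf cells are stored as varints, which is why a decoder like this is handy next to a hex editor.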
14.11.2024 18:53