LLMs on Strix Halo: Three Days Chasing the MES Firmware 0x83 Bug
Running llama.cpp on my k3s + AMD GPU cluster kept hitting memory access faults. The culprit: a bug in MES firmware 0x83 shipped with amdgpu-dkms-firmware.
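One way to tell whether a node is running the affected firmware is to check the MES version that the amdgpu driver reports. The following is a minimal sketch. It assumes the debugfs report at /sys/kernel/debug/dri/0/amdgpu_firmware_info, which is readable only by root, and the card index may be different on your node. It also assumes that report contains lines shaped like "MES feature version: 1, firmware version: 0x00000083". The helper names and the sample text are hypothetical, and the line format can vary between kernel versions. Treat this as a starting point, not a definitive check.

```python
import re
from typing import Optional

# Assumed line shape from amdgpu's debugfs firmware report, e.g.
#   "MES feature version: 1, firmware version: 0x00000083"
# The "^MES feature" anchor deliberately skips the separate "MES_KIQ" line.
MES_LINE = re.compile(
    r"^MES feature version:\s*\d+,\s*firmware version:\s*(0x[0-9a-fA-F]+)",
    re.MULTILINE,
)

# The MES firmware version this post identifies as buggy.
AFFECTED_MES_VERSIONS = {0x83}


def read_report(path: str = "/sys/kernel/debug/dri/0/amdgpu_firmware_info") -> str:
    """Hypothetical helper: read the firmware report (needs root; path is an assumption)."""
    with open(path) as f:
        return f.read()


def mes_firmware_version(report: str) -> Optional[int]:
    """Return the MES firmware version as an int, or None if no MES line is found."""
    match = MES_LINE.search(report)
    return int(match.group(1), 16) if match else None


def is_affected(report: str) -> bool:
    """True if the report shows an MES firmware version known to be affected."""
    return mes_firmware_version(report) in AFFECTED_MES_VERSIONS


if __name__ == "__main__":
    # Made-up sample text in the assumed format; use read_report() on a real node.
    sample = (
        "MES_KIQ feature version: 6, firmware version: 0x0000006c\n"
        "MES feature version: 1, firmware version: 0x00000083\n"
    )
    print(hex(mes_firmware_version(sample)))  # 0x83
    print(is_affected(sample))  # True
```

The regex is anchored per line so that the MES_KIQ entry, which has its own version number, is not mistaken for the main MES firmware.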