LLMs on Strix Halo: Three Days Chasing the MES Firmware 0x83 Bug
Running llama.cpp on my k3s + AMD GPU cluster kept hitting memory access faults. The culprit: a bug in MES firmware 0x83 shipped with amdgpu-dkms-firmware.
5 articles
Running llama.cpp on my k3s + AMD GPU cluster kept hitting memory access faults. The culprit: a bug in MES firmware 0x83 shipped with amdgpu-dkms-firmware.
I set up Strix Halo as a k3s worker via Incus VM + VFIO, then hit a wall: once the GPU enters a dirty state, recovery is impossible without bare metal.
device-plugin gives the GPU to one Pod at a time. Here's why I switched to DRA on k3s, and three Strix Halo-specific issues I had to patch around.
How I joined GMKtec EVO-X2 (Ryzen AI MAX+ 395) to my k3s cluster as a GPU node via Incus VFIO, covering APU-specific passthrough gotchas.
Master Stable Diffusion WebUI setup with Docker. This guide covers installation steps, usage tips, and a comparison of anime-style vs realistic models.