LLMs on Strix Halo: Three Days Chasing the MES Firmware 0x83 Bug
Running llama.cpp on my k3s + AMD GPU cluster kept hitting memory access faults. The culprit: a bug in MES firmware 0x83 shipped with amdgpu-dkms-firmware.
21 articles
I set up Strix Halo as a k3s worker via Incus VM + VFIO, then hit a wall: once the GPU enters a dirty state, recovery is impossible without bare metal.
The device plugin gives the GPU to one Pod at a time. Here's why I switched to DRA on k3s, and three Strix Halo-specific issues I had to patch around.
How I joined GMKtec EVO-X2 (Ryzen AI MAX+ 395) to my k3s cluster as a GPU node via Incus VFIO, covering APU-specific passthrough gotchas.
Migrating from MicroK8s to K3s. Real-world insights on rebuilding infrastructure, from an Ubuntu 26.04 twist to kubeconfig traps and safe TLS switching.
How I built a Discord bot in Go to securely interact with my private homelab server without exposing it to the internet.
A guide to building a disposable remote dev environment on homelab Kubernetes with Coder to keep your PC clean, and its relevance in the AI agent era.
Pragmatic homelab backups using TrueNAS and Google Drive. A 3-tier storage architecture using Cloud Sync PUSH, prioritizing cost over strict 3-2-1 rules.
Migrating a homelab from VMs to Incus (LXC) + K3s. Solve MicroK8s challenges and set up optimized container orchestration with this step-by-step Incus guide.
Migrated from Authentik to lightweight PocketID. Exploring passwordless Passkeys and OIDC integration patterns on Kubernetes.
Migration from Ingress Nginx to Traefik v3 and Gateway API. Covers Helm setup, the shift to HTTPRoute, and why Gateway API is preferred over IngressRoute.
A manifesto for a home server packed with my 'favorites'. Introducing a suite of applications for hacking life with code.