The last post ended with me bailing on VFIO and running k3s directly on bare-metal Ubuntu 26.04. The GPU dirty-state-after-crash problem was unrecoverable inside a VM, so I just moved everything to bare metal. With that out of the way, I could finally deal with the GPU management layer itself.
Starting Point: “Why Not Just Use device-plugin?”
The usual way to expose AMD GPUs in Kubernetes is the device-plugin. The configuration is minimal: request amd.com/gpu: "1" in your Pod spec and you’re done.

    resources:
      limits:
        amd.com/gpu: "1"

The problem is that the GPU becomes exclusive to that Pod. In a homelab with a single GPU, everything else stays Pending.
In my setup, lemonade — a llama.cpp-based inference server — and ComfyUI for image generation both need GPU access. An agent can trigger image generation mid-inference, so simultaneous GPU access does happen. With device-plugin, only one can run at a time.
Why DRA
DRA (Dynamic Resource Allocation) landed as alpha in Kubernetes 1.26 and hit beta in 1.32. Instead of integer resource counting, it models resources as ResourceClaim objects — and multiple Pods can reference the same claim.

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaim
    metadata:
      name: amd-gpu
    spec:
      devices:
        requests:
        - name: gpu
          exactly:
            deviceClassName: gpu.amd.com
            allocationMode: ExactCount
            count: 1

resource.k8s.io/v1 requires Kubernetes 1.34+. On older k3s releases, you’ll need v1beta1 or v1beta2 instead.
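As a rough guide for picking the apiVersion, the mapping can be sketched in Go. The function name draAPIVersion is made up for illustration, and the mapping ignores feature gates and whatever a particular k3s build actually serves, so treat it as a starting point and confirm with `kubectl api-versions` on the cluster.

```go
package main

import "fmt"

// draAPIVersion returns the resource.k8s.io API version to use in a
// ResourceClaim manifest for a given Kubernetes 1.x minor release.
// Simplification: it picks the newest version introduced by that release
// and does not account for feature gates or distro-specific settings.
func draAPIVersion(minor int) (string, error) {
	switch {
	case minor >= 34:
		return "resource.k8s.io/v1", nil
	case minor == 33:
		return "resource.k8s.io/v1beta2", nil
	case minor == 32:
		return "resource.k8s.io/v1beta1", nil
	default:
		return "", fmt.Errorf("version 1.%d predates beta DRA", minor)
	}
}

func main() {
	for _, minor := range []int{31, 32, 33, 34} {
		v, err := draAPIVersion(minor)
		if err != nil {
			fmt.Printf("1.%d: %v\n", minor, err)
			continue
		}
		fmt.Printf("1.%d: %s\n", minor, v)
	}
}
```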
Pods reference it via resourceClaimName:

    spec:
      resourceClaims:
      - name: gpu
        resourceClaimName: amd-gpu
      containers:
      - name: lemonade
        resources:
          claims:
          - name: gpu

Both lemonade and ComfyUI reference the same amd-gpu ResourceClaim. Kubernetes wires up device access; VRAM management is left entirely to each app. There’s no isolation; it works because 96 GB of unified memory gives enough headroom even when both are running simultaneously.
Switching in gpu-operator is two lines in values.yaml:

    deviceConfig:
      spec:
        devicePlugin:
          enableDevicePlugin: false
        draDriver:
          enable: true
          image: ghcr.io/tamara1031/k8s-gpu-dra-driver:v0.2.0

That image is my fork, not upstream. Here’s why.
Three Strix Halo Gotchas
NFD Doesn’t Detect the GPU
gpu-operator v1.5.0’s Node Feature Discovery rules don’t include the Strix Halo PCI device ID 0x1586. Without that label, the DRA driver DaemonSet never schedules. The GPU is physically present but Kubernetes treats it as nonexistent.
I added a NodeFeatureRule to the Helm chart:

    apiVersion: nfd.k8s-sigs.io/v1alpha1
    kind: NodeFeatureRule
    metadata:
      name: amd-gpu-strix-halo
      namespace: kube-amd-gpu
    spec:
      rules:
      - name: amd-gpu-strix-halo
        labels:
          feature.node.kubernetes.io/amd-gpu: "true"
        matchAny:
        - matchFeatures:
          - feature: pci.device
            matchExpressions:
              device:
                op: In
                value:
                - "1586"  # Strix Halo (gfx1151), not in gpu-operator v1.5.0
              vendor:
                op: In
                value:
                - "1002"

Check the latest release notes before adding this; the ID may have been included upstream since v1.5.0.
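Before blaming NFD, it can help to confirm that the node itself reports the vendor and device IDs the rule matches on. The sketch below reads the standard Linux sysfs PCI tree; the helper names (normalizeID, isStrixHalo, findStrixHalo) are invented for this example and are not part of NFD or gpu-operator.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

const (
	amdVendorID     = "1002" // AMD PCI vendor ID, as used in the NodeFeatureRule
	strixHaloDevice = "1586" // Strix Halo (gfx1151) device ID from this post
)

// normalizeID turns sysfs content such as "0x1586\n" into "1586".
func normalizeID(raw string) string {
	id := strings.ToLower(strings.TrimSpace(raw))
	return strings.TrimPrefix(id, "0x")
}

// isStrixHalo reports whether a vendor/device pair matches the rule.
func isStrixHalo(vendor, device string) bool {
	return normalizeID(vendor) == amdVendorID && normalizeID(device) == strixHaloDevice
}

// findStrixHalo scans a sysfs-style PCI tree and returns matching addresses.
func findStrixHalo(root string) ([]string, error) {
	entries, err := os.ReadDir(root)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, e := range entries {
		vendor, errV := os.ReadFile(filepath.Join(root, e.Name(), "vendor"))
		device, errD := os.ReadFile(filepath.Join(root, e.Name(), "device"))
		if errV != nil || errD != nil {
			continue // not a readable PCI device directory
		}
		if isStrixHalo(string(vendor), string(device)) {
			found = append(found, e.Name())
		}
	}
	return found, nil
}

func main() {
	found, err := findStrixHalo("/sys/bus/pci/devices")
	if err != nil {
		fmt.Println("cannot read PCI devices:", err)
		return
	}
	fmt.Println("Strix Halo GPUs:", found)
}
```

If this prints a PCI address but the node still lacks the label, the gap is in the NFD rules rather than in the hardware or kernel.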
driverVersion Fails Semver Validation
The DRA driver writes GPU metadata to a ResourceSlice and publishes it through the Kubernetes API. The driverVersion field is populated from the amdgpu kernel module version read via sysfs. On Strix Halo’s iGPU, sysfs returns either "1" or an empty string.
Kubernetes requires driverVersion to be valid semver (X.Y.Z). Both "1" and "" are rejected at the API level — zero ResourceSlices get registered and every ResourceClaim stays Pending indefinitely.
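To see why both values fail, here is a simplified semver check in Go. The regular expression is a stand-in written for this example (core MAJOR.MINOR.PATCH plus optional pre-release and build metadata); it is not the API server's validation code, and looksLikeSemver is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverPattern matches MAJOR.MINOR.PATCH without leading zeros, optionally
// followed by pre-release and build metadata. It is a simplified stand-in
// for full semver 2.0.0 validation.
var semverPattern = regexp.MustCompile(
	`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// looksLikeSemver reports whether v has the X.Y.Z shape described above.
func looksLikeSemver(v string) bool {
	return semverPattern.MatchString(v)
}

func main() {
	for _, v := range []string{"1", "", "1.0.0", "6.12.0-rc1"} {
		fmt.Printf("%q looks like semver: %t\n", v, looksLikeSemver(v))
	}
}
```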
My fork (tamara1031/k8s-gpu-dra-driver, Apache 2.0) fixes this in two files.
First, pkg/amdgpu/amdgpu.go gets a normalizeToSemver helper applied in GetDriverVersion():

    // pkg/amdgpu/amdgpu.go - Copyright (c) Advanced Micro Devices, Inc. (Apache 2.0)

    // normalizeToSemver pads a version string to semver 2.0.0 format (X.Y.Z).
    // The amdgpu kernel module on some hardware (e.g. Strix iGPU) reports "1" instead of "1.0.0",
    // which is rejected by the Kubernetes ResourceSlice API.
    func normalizeToSemver(v string) string {
        parts := strings.Split(v, ".")
        for len(parts) < 3 {
            parts = append(parts, "0")
        }
        return strings.Join(parts[:3], ".")
    }

Second, cmd/gpu-kubeletplugin/deviceinfo.go adds a fallback for the empty-string case:

    // cmd/gpu-kubeletplugin/deviceinfo.go - Copyright (c) Advanced Micro Devices, Inc. (Apache 2.0)
    func (d *AmdGpuInfo) GetDevice() resourceapi.Device {
        driverVersion := d.DriverVersion
        if driverVersion == "" {
            driverVersion = "0.0.0"
        }
        attributes := map[resourceapi.QualifiedName]resourceapi.DeviceAttribute{
            // ...
            "driverVersion": {VersionValue: ptr.To(driverVersion)},
        }
        // ...
    }

"1" → "1.0.0" via normalizeToSemver, "" → "0.0.0" via fallback. Both are needed. Included in fork release v0.2.0.
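A small standalone program makes the "both are needed" point concrete: padding alone turns an empty string into ".0.0", because strings.Split("", ".") returns a single empty element. The normalizeToSemver body mirrors the helper from the fork; driverVersionOrDefault is a hypothetical wrapper for this example, and checking for the empty string before padding is an assumption about ordering, not a description of the fork's call path.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeToSemver mirrors the fork's helper: pad to three parts.
func normalizeToSemver(v string) string {
	parts := strings.Split(v, ".")
	for len(parts) < 3 {
		parts = append(parts, "0")
	}
	return strings.Join(parts[:3], ".")
}

// driverVersionOrDefault is a hypothetical wrapper combining both fixes.
// Handling "" first is an assumption made for this sketch.
func driverVersionOrDefault(raw string) string {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return "0.0.0"
	}
	return normalizeToSemver(raw)
}

func main() {
	fmt.Println(normalizeToSemver("1"))       // 1.0.0
	fmt.Printf("%q\n", normalizeToSemver("")) // ".0.0", still not valid semver
	fmt.Println(driverVersionOrDefault(""))   // 0.0.0
	fmt.Println(driverVersionOrDefault("1"))  // 1.0.0
}
```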
ROCm Doesn’t Officially Support gfx1151
Both lemonade and ComfyUI need these two environment variables:

    env:
    - name: HSA_OVERRIDE_GFX_VERSION
      value: "11.5.1"
    - name: HSA_XNACK
      value: "1"

Without HSA_OVERRIDE_GFX_VERSION, the ROCm HIP runtime treats the GPU as an unsupported device and refuses to load.
HSA_XNACK=1 is required because of how Strix Halo works as an APU. There’s no dedicated VRAM — the CPU and GPU share the same 96 GB of LPDDR5X. On this kind of unified memory architecture, the GPU can hit page faults when accessing memory currently managed by the CPU side. XNACK lets the GPU retry those accesses gracefully instead of crashing. AMD’s MI300A has the same requirement for the same reason (it uses HBM, but the unified memory model is identical).
DRA gets the device into the container, but without these two variables it still won’t work.
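Since a missing variable tends to surface as a confusing runtime failure, a preflight check at container start can catch it earlier. This is only a sketch: the names requiredEnv and envProblems are invented here, and the expected values are simply the two from this post.

```go
package main

import (
	"fmt"
	"os"
)

// requiredEnv holds the two variables from this post and their values.
var requiredEnv = map[string]string{
	"HSA_OVERRIDE_GFX_VERSION": "11.5.1",
	"HSA_XNACK":                "1",
}

// envProblems describes every required variable that is unset or has an
// unexpected value. lookup has the same shape as os.LookupEnv, which
// keeps the function easy to exercise without touching the real environment.
func envProblems(lookup func(string) (string, bool)) []string {
	var problems []string
	for _, name := range []string{"HSA_OVERRIDE_GFX_VERSION", "HSA_XNACK"} {
		want := requiredEnv[name]
		got, ok := lookup(name)
		switch {
		case !ok:
			problems = append(problems, name+" is not set (want "+want+")")
		case got != want:
			problems = append(problems, name+"="+got+" (want "+want+")")
		}
	}
	return problems
}

func main() {
	problems := envProblems(os.LookupEnv)
	for _, p := range problems {
		fmt.Println("preflight:", p)
	}
	if len(problems) == 0 {
		fmt.Println("preflight: ROCm environment matches the gfx1151 settings")
	}
}
```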
The Final Stack

    graph TD
        Client["Client"]
        Client -->|"OpenAI API\ntext inference"| lemonade
        Client -->|"ComfyUI API\nimage generation"| ComfyUI

        lemonade["lemonade\nllama.cpp / ROCm"]
        ComfyUI["ComfyUI\nPyTorch / ROCm"]

        Claim["ResourceClaim: amd-gpu"]
        lemonade -->|"resourceClaimName"| Claim
        ComfyUI -->|"resourceClaimName"| Claim

        GPU["Strix Halo GPU\n96 GB unified memory\namdgpu driver"]
        Claim -->|"CDI inject"| GPU

Two Pods, one ResourceClaim. DRA injects device nodes like /dev/dri/renderD128 into each container via CDI (Container Device Interface) and stops there; it doesn’t manage VRAM. Concurrent access control is handled by the amdgpu driver, which natively supports multiple processes opening the same render node simultaneously.
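One way to confirm the injection worked is to list the render nodes visible inside each container. The sketch below assumes the nodes appear under /dev/dri with the usual renderD<number> naming; renderNodes is a name made up for this example, and which other device nodes the driver injects is not covered here.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// renderNodes filters directory entry names down to DRM render nodes
// such as renderD128 (the prefix "renderD" followed only by digits).
func renderNodes(names []string) []string {
	var nodes []string
	for _, n := range names {
		if !strings.HasPrefix(n, "renderD") {
			continue
		}
		suffix := strings.TrimPrefix(n, "renderD")
		if suffix != "" && strings.Trim(suffix, "0123456789") == "" {
			nodes = append(nodes, n)
		}
	}
	return nodes
}

func main() {
	entries, err := os.ReadDir("/dev/dri")
	if err != nil {
		fmt.Println("no /dev/dri visible here:", err)
		return
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	fmt.Println("render nodes:", renderNodes(names))
}
```

Running it in both the lemonade and ComfyUI containers should show the same node if both Pods reference the shared claim.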
There’s no VRAM isolation. If lemonade and ComfyUI both demand large allocations at the same time, one of them will OOM. Simultaneous access does happen — an agent can call image generation while inference is running. The reason this hasn’t caused crashes is simply the 96 GB of unified memory. About a week in, no VRAM conflict crash yet. lemonade keeps up to three models loaded (LEMONADE_MAX_LOADED_MODELS: 3), and there’s enough headroom for ComfyUI to run alongside. If your hardware has tighter VRAM, you’d need stronger controls than DRA currently provides.
Summary
| | device-plugin | DRA |
|---|---|---|
| GPU sharing | No — one Pod holds the GPU | Yes — multiple Pods via ResourceClaim |
| Kubernetes version | Works on old clusters | 1.32+ (beta) |
| Config complexity | Minimal | More moving parts |
| Single-GPU homelab | Basically unusable | Works well |
Strix Halo-specific patches needed:
- Custom NFD rule: manually register PCI ID 0x1586 (as of v1.5.0)
- driverVersion normalization: "1" → "1.0.0" (in fork v0.2.0)
- Env vars: HSA_OVERRIDE_GFX_VERSION=11.5.1, HSA_XNACK=1
If you’re running Strix Halo and want this working now, ghcr.io/tamara1031/k8s-gpu-dra-driver:v0.2.0 is the fork. I’m looking at upstreaming the semver patch, but I’m not sure how many people are running Strix Halo in Kubernetes clusters — so it might be a while.









