Sharing One AMD GPU Across Pods with DRA: Strix Halo Patches

The last post ended with me bailing on VFIO and running k3s directly on bare-metal Ubuntu 26.04. The GPU's dirty state after a crash could not be recovered from inside a VM, so bare metal was the only way forward. With that settled, I could finally deal with the GPU management layer itself.

Starting Point: “Why Not Just Use device-plugin?”

The usual way to expose AMD GPUs in Kubernetes is the device-plugin. The configuration is minimal: request amd.com/gpu: "1" in your Pod spec and you’re done.

resources:
  limits:
    amd.com/gpu: "1"

The problem is that the GPU becomes exclusive to that Pod. In a homelab with a single GPU, every other Pod that requests it stays Pending.

In my setup, lemonade — a llama.cpp-based inference server — and ComfyUI for image generation both need GPU access. An agent can trigger image generation mid-inference, so simultaneous GPU access does happen. With device-plugin, only one can run at a time.

Why DRA

DRA (Dynamic Resource Allocation) landed as alpha in Kubernetes 1.26 and hit beta in 1.32. Instead of integer resource counting, it models resources as ResourceClaim objects — and multiple Pods can reference the same claim.

apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: amd-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.amd.com
        allocationMode: ExactCount
        count: 1

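resource.k8s.io/v1 requires Kubernetes 1.34+. On older k3s releases, you’ll need v1beta1 (1.32) or v1beta2 (1.33) instead. Note that v1beta1 puts deviceClassName, allocationMode, and count directly on the request rather than nesting them under exactly.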

Pods reference it via resourceClaimName:

spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: amd-gpu
  containers:
  - name: lemonade
    resources:
      claims:
      - name: gpu

Both lemonade and ComfyUI reference the same amd-gpu ResourceClaim. Kubernetes wires up device access; VRAM management is left entirely to each app. There’s no isolation — it works because 96 GB of unified memory gives enough headroom even when both are running simultaneously.

Switching in gpu-operator takes a few lines in values.yaml: disable the device plugin, enable the DRA driver, and point it at the driver image:

deviceConfig:
  spec:
    devicePlugin:
      enableDevicePlugin: false
    draDriver:
      enable: true
      image: ghcr.io/tamara1031/k8s-gpu-dra-driver:v0.2.0

That image is my fork, not upstream. Here’s why.

Three Strix Halo Gotchas

NFD Doesn’t Detect the GPU

gpu-operator v1.5.0’s Node Feature Discovery rules don’t include the Strix Halo PCI device ID 0x1586. Without that label, the DRA driver DaemonSet never schedules. The GPU is physically present but Kubernetes treats it as nonexistent.

I added a NodeFeatureRule to the Helm chart:

apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: amd-gpu-strix-halo
  namespace: kube-amd-gpu
spec:
  rules:
  - name: amd-gpu-strix-halo
    labels:
      feature.node.kubernetes.io/amd-gpu: "true"
    matchAny:
    - matchFeatures:
      - feature: pci.device
        matchExpressions:
          device:
            op: In
            value:
            - "1586"  # Strix Halo (gfx1151) — not in gpu-operator v1.5.0
          vendor:
            op: In
            value:
            - "1002"

Check the latest release notes before adding this — it may have been included upstream since v1.5.0.
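Before adding the rule, it helps to confirm that the node really reports 1002:1586. Here is a minimal Go sketch, assuming the standard Linux sysfs layout where each PCI function has vendor and device files containing values like 0x1002. The helper names normalizeID and isStrixHalo are mine for illustration and are not part of NFD or gpu-operator.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Vendor and device IDs taken from the NodeFeatureRule in this post.
const (
	amdVendorID       = "1002"
	strixHaloDeviceID = "1586"
)

// normalizeID turns sysfs content such as "0x1586\n" into "1586".
func normalizeID(raw string) string {
	id := strings.ToLower(strings.TrimSpace(raw))
	return strings.TrimPrefix(id, "0x")
}

// isStrixHalo reports whether a vendor/device pair matches 1002:1586.
func isStrixHalo(vendor, device string) bool {
	return normalizeID(vendor) == amdVendorID && normalizeID(device) == strixHaloDeviceID
}

func main() {
	// Assumption: PCI functions are listed under /sys/bus/pci/devices.
	dirs, _ := filepath.Glob("/sys/bus/pci/devices/*")
	for _, dir := range dirs {
		vendor, errV := os.ReadFile(filepath.Join(dir, "vendor"))
		device, errD := os.ReadFile(filepath.Join(dir, "device"))
		if errV != nil || errD != nil {
			continue // skip entries that cannot be read
		}
		if isStrixHalo(string(vendor), string(device)) {
			fmt.Println("found Strix Halo GPU at", filepath.Base(dir))
		}
	}
}
```

If this prints a PCI address but the node still lacks the feature.node.kubernetes.io/amd-gpu label, the missing NFD rule is the likely cause.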

driverVersion Fails Semver Validation

The DRA driver writes GPU metadata to a ResourceSlice and publishes it to the API server. The driverVersion field is populated from the amdgpu kernel module version read via sysfs. On Strix Halo’s iGPU, sysfs returns either "1" or an empty string.

Kubernetes requires driverVersion to be valid semver 2.0.0 (X.Y.Z). Both "1" and "" are rejected by the API server, so zero ResourceSlices get registered and every ResourceClaim stays Pending indefinitely.
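To see the failure outside the cluster, here is a rough Go sketch that reads a module version file and checks it against a simplified semver pattern. Two assumptions: the path /sys/module/amdgpu/version is my guess at where the version lives, and the regular expression only approximates semver 2.0.0. It is not the validation code Kubernetes actually runs.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// semverPattern is a simplified approximation of semver 2.0.0:
// MAJOR.MINOR.PATCH with optional pre-release and build suffixes.
var semverPattern = regexp.MustCompile(
	`^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// isSemver reports whether v matches the simplified pattern.
func isSemver(v string) bool {
	return semverPattern.MatchString(v)
}

// readModuleVersion returns the trimmed file contents,
// or "" when the file is missing or unreadable.
func readModuleVersion(path string) string {
	data, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(data))
}

func main() {
	// Assumed path; adjust for your kernel and driver packaging.
	v := readModuleVersion("/sys/module/amdgpu/version")
	fmt.Printf("module reports %q, valid semver: %t\n", v, isSemver(v))

	// The two values seen on Strix Halo, plus a well-formed one.
	for _, s := range []string{"1", "", "1.0.0"} {
		fmt.Printf("%q valid: %t\n", s, isSemver(s))
	}
}
```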

My fork (tamara1031/k8s-gpu-dra-driver, Apache 2.0) fixes this in two files.

First, pkg/amdgpu/amdgpu.go gets a normalizeToSemver helper applied in GetDriverVersion():

// pkg/amdgpu/amdgpu.go — Copyright (c) Advanced Micro Devices, Inc. (Apache 2.0)
// normalizeToSemver pads a version string to semver 2.0.0 format (X.Y.Z).
// The amdgpu kernel module on some hardware (e.g. Strix iGPU) reports "1" instead of "1.0.0",
// which is rejected by the Kubernetes ResourceSlice API.
func normalizeToSemver(v string) string {
    parts := strings.Split(v, ".")
    for len(parts) < 3 {
        parts = append(parts, "0")
    }
    return strings.Join(parts[:3], ".")
}

Second, cmd/gpu-kubeletplugin/deviceinfo.go adds a fallback for the empty-string case:

// cmd/gpu-kubeletplugin/deviceinfo.go — Copyright (c) Advanced Micro Devices, Inc. (Apache 2.0)
func (d *AmdGpuInfo) GetDevice() resourceapi.Device {
    driverVersion := d.DriverVersion
    if driverVersion == "" {
        driverVersion = "0.0.0"
    }
    attributes := map[resourceapi.QualifiedName]resourceapi.DeviceAttribute{
        // ...
        "driverVersion": {VersionValue: ptr.To(driverVersion)},
    }
    // ...
}
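The two changes can also be read as one normalization step. The sketch below is not the fork's code: padToThree and toSemver are hypothetical names, and trimming whitespace is an extra I added on the assumption that sysfs values may carry a trailing newline. It also shows one reason an explicit empty-string case is useful: in Go, strings.Split("", ".") returns a single empty element, so padding alone would produce ".0.0".

```go
package main

import (
	"fmt"
	"strings"
)

// padToThree splits on ".", pads with "0" up to three parts,
// and keeps only the first three parts.
func padToThree(v string) string {
	parts := strings.Split(v, ".")
	for len(parts) < 3 {
		parts = append(parts, "0")
	}
	return strings.Join(parts[:3], ".")
}

// toSemver (hypothetical) trims whitespace, maps an empty value
// to "0.0.0", and pads everything else to X.Y.Z.
func toSemver(raw string) string {
	v := strings.TrimSpace(raw)
	if v == "" {
		return "0.0.0"
	}
	return padToThree(v)
}

func main() {
	// Padding alone does not rescue the empty string.
	fmt.Printf("padToThree(%q) = %q\n", "", padToThree(""))

	for _, raw := range []string{"1", "", "1\n", "6.12", "6.12.0.1"} {
		fmt.Printf("toSemver(%q) = %q\n", raw, toSemver(raw))
	}
}
```

Keeping only the first three parts drops any fourth component, which is acceptable here because the attribute is informational rather than something the workloads parse.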

"1" → "1.0.0" via normalizeToSemver, "" → "0.0.0" via fallback. Both are needed. Included in fork release v0.2.0.

ROCm Doesn’t Officially Support gfx1151

Both lemonade and ComfyUI need these two environment variables:

env:
- name: HSA_OVERRIDE_GFX_VERSION
  value: "11.5.1"
- name: HSA_XNACK
  value: "1"

Without HSA_OVERRIDE_GFX_VERSION, the ROCm HIP runtime treats the GPU as an unsupported device and refuses to load.

HSA_XNACK=1 is required because of how Strix Halo works as an APU. There’s no dedicated VRAM — the CPU and GPU share the same 96 GB of LPDDR5X. On this kind of unified memory architecture, the GPU can hit page faults when accessing memory currently managed by the CPU side. XNACK lets the GPU retry those accesses gracefully instead of crashing. AMD’s MI300A has the same requirement for the same reason (it uses HBM, but the unified memory model is identical).

Reference: “MI300A - Exploring the APU advantage” (rocm.blogs.amd.com)

DRA gets the device into the container, but without these two variables it still won’t work.
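Because a missing variable only shows up later as a runtime failure, a small preflight check at container start can save some debugging. This is a sketch of my own, not something lemonade or ComfyUI ship: checkEnv and the envVar type are hypothetical, and the expected values are simply the ones from the env block in this post.

```go
package main

import (
	"fmt"
	"os"
)

// envVar pairs a variable name with the value this post uses.
type envVar struct {
	name string
	want string
}

// required lists the two ROCm-related variables in a fixed order.
var required = []envVar{
	{"HSA_OVERRIDE_GFX_VERSION", "11.5.1"},
	{"HSA_XNACK", "1"},
}

// checkEnv returns one message per variable that is unset or
// holds a different value. getenv is injected so it can be faked.
func checkEnv(getenv func(string) string) []string {
	var problems []string
	for _, v := range required {
		if got := getenv(v.name); got != v.want {
			problems = append(problems,
				fmt.Sprintf("%s=%q, want %q", v.name, got, v.want))
		}
	}
	return problems
}

func main() {
	problems := checkEnv(os.Getenv)
	if len(problems) == 0 {
		fmt.Println("ROCm override variables look correct")
		return
	}
	for _, p := range problems {
		fmt.Println("env check:", p)
	}
}
```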

The Final Stack

Two Pods, one ResourceClaim. DRA injects device nodes like /dev/dri/renderD128 into each container via CDI (Container Device Interface) and stops there — it doesn’t manage VRAM. Concurrent access control is handled by the amdgpu driver, which natively supports multiple processes opening the same render node simultaneously.

There’s no VRAM isolation. If lemonade and ComfyUI both demand large allocations at the same time, one of them can run out of memory. Simultaneous access does happen, since an agent can call image generation while inference is running, but about a week in there has been no memory-conflict crash. The reason is simply the 96 GB of unified memory: lemonade keeps up to three models loaded (LEMONADE_MAX_LOADED_MODELS: 3), and there’s still enough headroom for ComfyUI to run alongside. If your hardware has tighter VRAM, you’d need stronger controls than DRA currently provides.

Summary

Aspect	device-plugin	DRA
GPU sharing	No: one Pod holds the GPU	Yes: multiple Pods via one ResourceClaim
Kubernetes version	Works on old clusters	1.32+ (beta); 1.34+ for resource.k8s.io/v1
Config complexity	Minimal	More moving parts
Single-GPU homelab	Basically unusable	Works well

Strix Halo-specific patches needed:

Custom NFD rule: manually register PCI ID 0x1586 (as of v1.5.0)
driverVersion normalization: "1" → "1.0.0" and "" → "0.0.0" (in fork v0.2.0)
Env vars: HSA_OVERRIDE_GFX_VERSION=11.5.1, HSA_XNACK=1

If you’re running Strix Halo and want this working now, ghcr.io/tamara1031/k8s-gpu-dra-driver:v0.2.0 is the fork. I’m looking at upstreaming the semver patch, but I’m not sure how many people are running Strix Halo in Kubernetes clusters — so it might be a while.