NVIDIA GPU + Docker on Arch

1. Verify NVIDIA driver (host level)

Run:

nvidia-smi

Expected: GPU info (RTX 3060, driver version, CUDA version)

Why this matters

Docker does not provide GPU drivers—it only passes them through. If this fails, nothing else will work.

2. Fix package mirror issues (Arch-specific)

You encountered:

404 error from stable-mirror.omarchy.org

Fix used:

sudo pacman -Syy

What this does

Forces a full refresh of package databases
Syncs your system with updated mirrors
Fixes "package not found / 404" issues caused by stale mirrors

Optional (better long-term fix)

sudo pacman -S reflector
sudo reflector --latest 10 --sort rate --save /etc/pacman.d/mirrorlist

3. Install NVIDIA Container Toolkit

sudo pacman -S nvidia-container-toolkit

What this installs

libnvidia-container → low-level GPU integration
nvidia-container-runtime → Docker runtime for GPUs
nvidia-ctk → configuration tool

Why this is required

Docker does not natively understand GPUs. This toolkit bridges:

Docker → NVIDIA driver → GPU

4. Configure Docker to use NVIDIA runtime

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

What this does

Updates Docker config (/etc/docker/daemon.json)
Registers NVIDIA as a valid runtime
Enables --gpus all flag to work

5. (Modern Docker fix) CDI configuration

You initially hit:

failed to discover GPU vendor from CDI

This happens because newer Docker uses CDI (Container Device Interface).

Fix:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
sudo systemctl restart docker

What this does

Creates a device mapping spec for GPUs
Lets Docker discover GPUs in a standardized way

6. Test GPU inside Docker

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

Expected result

You should see the same GPU output as on host.

7. Run your original container

sudo docker run --privileged --net=host --ipc=host \
  --ulimit memlock=-1:-1 --ulimit stack=-1:-1 \
  --gpus all --rm -it \
  nvcr.io/nvidia/pytorch:25.12-py3

8. Key concepts

GPU passthrough chain

PyTorch (container)
   ↓
CUDA (container)
   ↓
NVIDIA Container Toolkit
   ↓
Host NVIDIA Driver
   ↓
RTX 3060

Why CDI matters (new Docker behavior)

Old way: --runtime=nvidia

New way: --gpus all (uses CDI internally)

If CDI config is missing → you get the error you saw.

9. Minimal checklist (future quick setup)

# 1. Fix mirrors
sudo pacman -Syy

# 2. Install toolkit
sudo pacman -S nvidia-container-toolkit

# 3. Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker

# 4. Enable CDI
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# 5. Restart Docker
sudo systemctl restart docker

# 6. Test
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

10. Common failure points

nvidia-smi fails → driver issue
Docker shows no runtimes → toolkit not configured
CDI error → missing /etc/cdi/nvidia.yaml
Permission issues → Docker not restarted

1. Verify NVIDIA driver (host level)​

Why this matters​

2. Fix package mirror issues (Arch-specific)​

Fix used:​

What this does​

Optional (better long-term fix)​

3. Install NVIDIA Container Toolkit​

What this installs​

Why this is required​

4. Configure Docker to use NVIDIA runtime​

What this does​

5. (Modern Docker fix) CDI configuration​

Fix:​

What this does​

6. Test GPU inside Docker​

Expected result​

7. Run your original container​

8. Key concepts​

GPU passthrough chain​

Why CDI matters (new Docker behavior)​

9. Minimal checklist (future quick setup)​

10. Common failure points​

1. Verify NVIDIA driver (host level)

Why this matters

2. Fix package mirror issues (Arch-specific)

Fix used:

What this does

Optional (better long-term fix)

3. Install NVIDIA Container Toolkit

What this installs

Why this is required

4. Configure Docker to use NVIDIA runtime

What this does

5. (Modern Docker fix) CDI configuration

Fix:

What this does

6. Test GPU inside Docker

Expected result

7. Run your original container

8. Key concepts

GPU passthrough chain

Why CDI matters (new Docker behavior)

9. Minimal checklist (future quick setup)

10. Common failure points