Tuesday, September 16, 2025

Fixing an NVIDIA driver and kernel module mismatch on a Debian 12 virtual machine under Proxmox

System:

  • OS: Debian 12 (Bookworm) / Proxmox VE 8.x (kernel series 6.1.0-xx-amd64) 
  • GPU: NVIDIA (GH100, A100, etc.)
  • Driver: NVIDIA 580.82.07 (from CUDA repo)

1. Description of the Issue

On Linux systems with NVIDIA GPUs, it is common to hit a driver mismatch problem after kernel updates.

  • The Linux kernel is updated (e.g., from 6.1.0-37 → 6.1.0-39).
  • The NVIDIA kernel module (nvidia.ko) is not rebuilt for the new kernel.
  • The user-space components (nvidia-smi, CUDA libraries, NVML) remain at the latest installed version.
  • Result: the system cannot load the NVIDIA driver, producing errors like “Driver/library version mismatch” or “Module not found”.

This guide shows how to:

  • Detect when such a mismatch occurs.
  • Rebuild the NVIDIA driver kernel module with DKMS for the currently running kernel.
  • Verify that both the kernel module and userspace libraries are aligned.

2. Symptom Examples

After a kernel upgrade and reboot (or after an upgrade when the reboot was skipped):

root@james:~# nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

root@james:~# modinfo nvidia | awk '/^version:/ {print $2}'

modinfo: ERROR: Module nvidia not found.


root@james:~# cat /proc/driver/nvidia/version

cat: /proc/driver/nvidia/version: No such file or directory

dkms status only shows modules for older kernels:

nvidia/580.82.07, 6.1.0-32-amd64, x86_64: installed

nvidia/580.82.07, 6.1.0-37-amd64, x86_64: installed

But the system is running:

root@james:~# uname -r

6.1.0-39-amd64
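The comparison between the dkms status output and uname -r can be scripted. The following is a minimal sketch, assuming dkms status prints lines in the "module/version, kernel, arch: state" format shown above (older DKMS releases use a different layout). The helper name dkms_has_kernel is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed if `dkms status` text on stdin lists an installed nvidia
# module for the given kernel release. dkms_has_kernel is a hypothetical name.
dkms_has_kernel() {
    # $1 = kernel release, e.g. 6.1.0-39-amd64
    # Expected line: nvidia/580.82.07, 6.1.0-37-amd64, x86_64: installed
    awk -F', *' -v k="$1" '
        $1 ~ /^nvidia\// && $2 == k && /: installed/ { found = 1 }
        END { exit !found }
    '
}
```

On a real system it could be used as: dkms status | dkms_has_kernel "$(uname -r)" || echo "no nvidia module for running kernel"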


3. Root Cause

The NVIDIA kernel driver was not rebuilt for the new kernel (6.1.0-39-amd64), so the OS has no compatible module to load. Userspace is at 580.82.07, but no nvidia.ko exists under /lib/modules for the running kernel.

4. Solution

Step 1. Install kernel headers and DKMS tools

apt-get update

apt-get install -y linux-headers-$(uname -r) build-essential dkms

If apt-get cannot find the linux-headers package for the running kernel, enable Debian’s standard repos (main contrib non-free-firmware), run apt-get update again, and retry.
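Before rebuilding, it can help to confirm that the headers really are in place. DKMS compiles against /lib/modules/<release>/build, which on Debian is a symlink provided by the headers package. A minimal sketch follows; headers_present is a hypothetical helper, and its optional second argument exists only so the check can be pointed at a different directory.

```shell
#!/bin/sh
# Sketch: succeed if the kernel build directory for a release exists.
# headers_present is a hypothetical helper name.
headers_present() {
    release=$1
    root=${2:-/lib/modules}
    # -d follows symlinks, so a dangling "build" link counts as missing.
    [ -d "$root/$release/build" ]
}
```

Example: headers_present "$(uname -r)" || echo "headers missing for $(uname -r)"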

Step 2. Ensure NVIDIA DKMS package is present
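The original commands for this step are not shown. One way to check, sketched under assumptions: list installed packages and keep those whose names look like NVIDIA DKMS packages. The name pattern is a guess (the CUDA repo commonly ships names such as nvidia-kernel-dkms or nvidia-kernel-open-dkms, but verify on your own system), and nvidia_dkms_packages is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: filter "status package" lines on stdin down to installed packages
# whose names start with "nvidia" and end with "dkms" (assumed pattern).
nvidia_dkms_packages() {
    awk '$1 == "ii" && $2 ~ /^nvidia.*dkms$/ { print $2 }'
}

# On a real system (not executed here):
#   dpkg-query -W -f='${db:Status-Abbrev}${Package}\n' | nvidia_dkms_packages
#   dkms status
```

If the list comes back empty, the DKMS package has to be installed from the same repository as the rest of the driver so that the versions stay aligned.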

Step 3. Rebuild for the current kernel
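The original commands for this step are not shown. A minimal sketch, assuming the NVIDIA module is already registered with DKMS: dkms autoinstall builds and installs every registered module for the given kernel. The run and rebuild_nvidia helpers and the DRY_RUN variable are hypothetical; DRY_RUN=1 only prints the command.

```shell
#!/bin/sh
# Sketch: rebuild all DKMS-registered modules for one kernel release.
# run, rebuild_nvidia and DRY_RUN are hypothetical names for illustration.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"      # show the command instead of executing it
    else
        "$@"
    fi
}

rebuild_nvidia() {
    kernel=${1:-$(uname -r)}
    run dkms autoinstall -k "$kernel"
}
```

Run rebuild_nvidia as root. To rebuild only the NVIDIA module, dkms install nvidia/580.82.07 -k "$(uname -r)" is an alternative.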

If build fails, inspect logs:
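The original log command is not shown. DKMS normally writes the compiler output to /var/lib/dkms/<module>/<version>/build/make.log, so a sketch along these lines can surface the failing lines. Both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: locate the DKMS build log and print the lines that mention errors.
# dkms_make_log and show_build_errors are hypothetical helper names.
dkms_make_log() {
    # $1 = module, $2 = version, $3 = optional DKMS tree (default /var/lib/dkms)
    printf '%s/%s/%s/build/make.log\n' "${3:-/var/lib/dkms}" "$1" "$2"
}

show_build_errors() {
    # Print the last 20 numbered log lines containing "error" (any case).
    grep -n -i 'error' "$1" | tail -n 20
}
```

Example: show_build_errors "$(dkms_make_log nvidia 580.82.07)"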

Step 4. Load the driver

depmod -a

modprobe nvidia

modprobe nvidia_uvm

modprobe nvidia_modeset

modprobe nvidia_drm

Before the rebuild, modprobe fails with:

modprobe: FATAL: Module nvidia not found in directory /lib/modules/6.1.0-39-amd64
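After the modprobe calls, a quick way to confirm that all four modules actually loaded is to look them up in /proc/modules. A minimal sketch; missing_nvidia_modules is a hypothetical helper, and the optional file argument exists only so the check can read a saved copy.

```shell
#!/bin/sh
# Sketch: print each expected NVIDIA module that is not currently loaded.
# missing_nvidia_modules is a hypothetical helper name.
missing_nvidia_modules() {
    modfile=${1:-/proc/modules}
    for m in nvidia nvidia_uvm nvidia_modeset nvidia_drm; do
        # /proc/modules has one loaded module per line, name in field 1.
        awk -v m="$m" '$1 == m { f = 1 } END { exit !f }' "$modfile" || echo "$m"
    done
}
```

No output means every module in the list is loaded.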


Step 5. Verify

modinfo nvidia | awk '/^version:/ {print $2}'

# Expected: 580.82.07

 

cat /proc/driver/nvidia/version

# Expected: shows NVRM version 580.82.07

 

nvidia-smi

# Expected: normal GPU list, no mismatch errors
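The first two checks can be compared automatically: modinfo reports the module on disk, while /proc/driver/nvidia/version reports the module that is loaded. A minimal sketch, assuming the NVRM line contains the version as a standalone dotted number; nvrm_version and versions_match are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: extract the driver version from /proc/driver/nvidia/version text
# and compare it with another version string. Helper names are hypothetical.
nvrm_version() {
    # Reads the file content on stdin; prints the first dotted number
    # found on the "NVRM version:" line.
    awk '/^NVRM version:/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
    }'
}

versions_match() {
    # Succeeds only if both strings are non-empty and identical.
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# On a real system (not executed here):
#   loaded=$(nvrm_version < /proc/driver/nvidia/version)
#   ondisk=$(modinfo -F version nvidia)
#   versions_match "$loaded" "$ondisk" || echo "mismatch: $loaded vs $ondisk"
```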


5. Expected Outcome

root@james:~# modinfo nvidia | awk '/^version:/ {print $2}'

580.82.07

 

root@james:~# cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 580.82.07  Release Build

GCC version: gcc version 12.2.0 (Debian 12.2.0-14+deb12u1)

 

root@james:~# nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.82.07    Driver Version: 580.82.07    CUDA Version: 12.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
...


Following these steps after each kernel update keeps the NVIDIA kernel module rebuilt and aligned with the userspace libraries, which avoids GPU downtime on Proxmox/Debian systems.

#NVIDIA #Driver #Kernel #Module #Mismatch #nvidiaSmi #DriverLibraryMismatch #Linux #Debian12 #ProxmoxVE8 #DKMS #LinuxHeaders #CUDA #nvidiaPersistenced #GPU #KernelUpdate #nvidiaKoNotFound #NVML #nouveau #Blacklist #CUDARepository



