vGPU Proxmox VE

Install vGPU unlock Packages

Update and upgrade

apt update
apt dist-upgrade

We need to install a few more packages like git, a compiler and some other tools.

apt install -y git build-essential dkms pve-headers mdevctl

Git repos and Rust compiler

First, clone this repo to your home folder (in this case root)

git clone https://gitlab.com/polloloco/vgpu-proxmox.git

You also need the vgpu_unlock-rs repo; clone it into /opt

cd /opt
git clone https://github.com/mbilker/vgpu_unlock-rs.git

After that, install the rust compiler

curl https://sh.rustup.rs -sSf | sh -s -- -y --profile minimal

Now make the rust binaries available in your $PATH (you only have to do it the first time after installing rust)

source $HOME/.cargo/env
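
If you want to verify that the toolchain is now on your PATH, a quick version check should work (the exact version printed will differ):

cargo --version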

Enter the vgpu_unlock-rs directory and compile the library. Depending on your hardware and internet connection, this may take a while.

cd vgpu_unlock-rs/
cargo build --release
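
When the build finishes, the compiled library should exist at the path that the systemd overrides below point to; you can confirm it is there like this:

ls -l /opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so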

Create files for vGPU unlock

The vgpu_unlock-rs library requires a few files and folders in order to work properly, so let's create those. First, create the folder for your vGPU unlock config along with an empty config file

mkdir /etc/vgpu_unlock
touch /etc/vgpu_unlock/profile_override.toml

Then, create folders and files for systemd to load the vgpu_unlock-rs library when starting the nvidia vgpu services

mkdir /etc/systemd/system/{nvidia-vgpud.service.d,nvidia-vgpu-mgr.service.d}
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf

Enabling IOMMU

To enable IOMMU, you first have to enable it in your BIOS/UEFI. Because that is vendor specific, I can't give exact instructions, but on Intel systems the option is usually called something like "VT-d", while AMD systems tend to call it "IOMMU". After enabling it in your BIOS/UEFI, you also have to enable it in your kernel. Depending on how your system boots, there are two ways to do that: if you installed your system with ZFS-on-root in UEFI mode, you are using systemd-boot; everything else uses GRUB. GRUB is far more common, so if you are unsure, you are probably using that. Depending on which bootloader you use, pick one of the following two options:

GRUB

Open the file /etc/default/grub in your favorite editor

vi /etc/default/grub

The kernel parameters have to be appended to the variable GRUB_CMDLINE_LINUX_DEFAULT. On a clean installation that line should look like this

GRUB_CMDLINE_LINUX_DEFAULT="quiet"

If you are using an Intel system, append this after quiet:

intel_iommu=on iommu=pt

On AMD systems, append this after quiet:

amd_iommu=on iommu=pt

The result should look like this (for Intel systems):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

Now, save and exit from the editor then apply your changes:

update-grub

systemd-boot
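
With systemd-boot, Proxmox takes the kernel parameters from /etc/kernel/cmdline instead of /etc/default/grub. Append the same IOMMU options as above (intel_iommu=on iommu=pt for Intel, or amd_iommu=on iommu=pt for AMD) to the end of the single line in that file, then apply the change:

vi /etc/kernel/cmdline
proxmox-boot-tool refresh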

Loading required kernel modules and blacklisting the open source nvidia driver

We have to load the vfio, vfio_iommu_type1, vfio_pci and vfio_virqfd kernel modules to get vGPU working

echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules

Proxmox comes with the open source nouveau driver for NVIDIA GPUs; however, we have to use our patched NVIDIA driver to enable vGPU. The next line prevents the nouveau driver from loading

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Applying our kernel configuration

I’m not sure if this is needed, but it doesn’t hurt :)

update-initramfs -u -k all

…and reboot

reboot

Check if IOMMU is enabled

Note: this step is optional (see the section "Enabling IOMMU" above). Wait for your server to restart, then type this into a root shell

dmesg | grep -e DMAR -e IOMMU

Depending on your mainboard and CPU, the output will differ; in my case the important line was the third one: DMAR: IOMMU enabled. If you see something like that, IOMMU is enabled.
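
Another quick sanity check, if you want one, is to look at the IOMMU groups the kernel created; if the directory below is populated, remapping is active:

find /sys/kernel/iommu_groups/ -type l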

NVIDIA vGPU Driver

This repo contains patches that allow you to use vGPU on cards that are not vGPU qualified (consumer GPUs). The patches are binary patches, which means each patch works ONLY with one specific driver version. I've created patches for the following driver versions:

16.2 (535.129.03) - use this if you are on PVE 8.0 (kernel 6.2; 6.5 should work too)
16.1 (535.104.06)
16.0 (535.54.06)
15.1 (525.85.07)
15.0 (525.60.12)
14.4 (510.108.03)
14.3 (510.108.03)
14.2 (510.85.03)

You can choose which of those you want to use, but generally it's recommended to use the latest, most up-to-date version (16.2 in this case). If you have a vGPU qualified GPU, you can use other versions too, because you don't need to patch the driver; however, you still have to make sure they are compatible with your Proxmox version and kernel. I would also not recommend using any older versions unless you have a very specific requirement.

Obtaining the driver

NVIDIA doesn't let you freely download vGPU drivers like they do with GeForce or regular Quadro drivers; instead, you have to download them through the NVIDIA Licensing Portal (see: https://www.nvidia.com/en-us/drivers/vgpu-software-driver/). You can sign up for a free evaluation to get access to the download page. NB: when applying for an eval license, do NOT use your personal email or another address from a free email provider like gmail.com; you will probably have to go through manual review if you use such addresses. I have had very good experience using a custom domain for my email address; that way the automatic verification usually lets me in after about five minutes.

In the Software Downloads section, filter by:

Product Family: vGPU
Platform: Linux KVM
Platform Version: All Supported
Product Version: 16.2

After downloading, extract the zip file (the placeholder below stands for whatever the downloaded archive is called)

unzip <downloaded-vgpu-archive>.zip

Then copy the file called NVIDIA-Linux-x86_64-DRIVERVERSION-vgpu-kvm.run (where DRIVERVERSION is a string like 535.129.03) from the Host_Drivers folder into root's home folder on your Proxmox host, using a tool like FileZilla, WinSCP, scp or rsync.
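
For example, with scp from a Linux machine it could look like this (the host address is just a placeholder):

scp Host_Drivers/NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run root@<proxmox-host>:/root/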

Patching the driver

Now, on the proxmox host, make the driver executable

chmod +x NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run

And then patch it

./NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run --apply-patch ~/vgpu-proxmox/535.129.03.patch

That should output a lot of lines ending with

Self-extractible archive "NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm-custom.run" successfully created.

You should now have a file called NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm-custom.run, that is your patched driver.

Installing the driver

Now that the required patch is applied, you can install the driver

./NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm-custom.run --dkms

The installer will ask "Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later." Answer with Yes.

Depending on your hardware, the installation could take a minute or two. If everything went right, you will be presented with this message.

Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 535.129.03) is now complete.

Click Ok to exit the installer. To finish the installation, reboot.

reboot

Finishing touches

Wait for your server to reboot, then type this into the shell to check if the driver install worked

root@server4:~# nvidia-smi
Tue Dec 19 15:34:40 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: N/A      |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2070        On  | 00000000:01:00.0 Off |                  N/A |
| 31%   33C    P8              18W / 175W |     60MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
 
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

To verify if the vGPU unlock worked, type this command

root@server4:~# mdevctl types
0000:01:00.0
  nvidia-256
    Available instances: 24
    Device API: vfio-pci
    Name: GRID RTX6000-1Q
    Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-257
    Available instances: 12
    Device API: vfio-pci
    Name: GRID RTX6000-2Q
    Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
  nvidia-258
    Available instances: 8
    Device API: vfio-pci
    Name: GRID RTX6000-3Q
    Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
  nvidia-259
    Available instances: 6
    Device API: vfio-pci
    Name: GRID RTX6000-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
  nvidia-260
    Available instances: 4
    Device API: vfio-pci
    Name: GRID RTX6000-6Q
    Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
  nvidia-261
    Available instances: 3
    Device API: vfio-pci
    Name: GRID RTX6000-8Q
    Description: num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=3
  nvidia-262
    Available instances: 2
    Device API: vfio-pci
    Name: GRID RTX6000-12Q
    Description: num_heads=4, frl_config=60, framebuffer=12288M, max_resolution=7680x4320, max_instance=2
  nvidia-263
    Available instances: 1
    Device API: vfio-pci
    Name: GRID RTX6000-24Q
    Description: num_heads=4, frl_config=60, framebuffer=24576M, max_resolution=7680x4320, max_instance=1
  nvidia-435
    Available instances: 24
    Device API: vfio-pci
    Name: GRID RTX6000-1B
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-436
    Available instances: 12
    Device API: vfio-pci
    Name: GRID RTX6000-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-437
    Available instances: 24
    Device API: vfio-pci
    Name: GRID RTX6000-1A
    Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24
  nvidia-438
    Available instances: 12
    Device API: vfio-pci
    Name: GRID RTX6000-2A
    Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12
  nvidia-439
    Available instances: 8
    Device API: vfio-pci
    Name: GRID RTX6000-3A
    Description: num_heads=1, frl_config=60, framebuffer=3072M, max_resolution=1280x1024, max_instance=8
  nvidia-440
    Available instances: 6
    Device API: vfio-pci
    Name: GRID RTX6000-4A
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=6
  nvidia-441
    Available instances: 4
    Device API: vfio-pci
    Name: GRID RTX6000-6A
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=1280x1024, max_instance=4
  nvidia-442
    Available instances: 3
    Device API: vfio-pci
    Name: GRID RTX6000-8A
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=3
  nvidia-443
    Available instances: 2
    Device API: vfio-pci
    Name: GRID RTX6000-12A
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2
  nvidia-444
    Available instances: 1
    Device API: vfio-pci
    Name: GRID RTX6000-24A
    Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=1280x1024, max_instance=1
 

If this command doesn't return any output, vGPU unlock isn't working. Another command you can use to check whether your card is recognized as vGPU enabled is nvidia-smi vgpu; if everything worked right with the unlock, the output should be similar to this:

root@server4:~# nvidia-smi vgpu
Tue Dec 19 15:39:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03                |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  NVIDIA GeForce RTX 2070    | 00000000:01:00.0             |   0%       |
+---------------------------------+------------------------------+------------+

However, if you get this output, then something went wrong

No supported devices in vGPU mode
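
If that happens, a good first step is to check the logs of the two vGPU services we configured earlier (the same unit names as in the systemd overrides above):

journalctl -b -u nvidia-vgpud.service -u nvidia-vgpu-mgr.service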

vGPU overrides/settings

Further up we created the file /etc/vgpu_unlock/profile_override.toml without explaining what it is for. Using that file, you can override lots of parameters for your vGPU instances: for example, you can change the maximum resolution, enable or disable the frame rate limiter, enable or disable CUDA support, or change the VRAM size of your virtual GPUs.

If we take a look at the output of mdevctl types, we see lots of different types to choose from. However, if we pick, for example, GRID RTX6000-4Q, which gives us 4GB of VRAM in a VM, we are locked to that type for all of our VMs. That means we can only have 4GB VMs; it is not possible to mix different types, such as one 4GB VM and two 2GB VMs.

Important notes:
- Q profiles can give you horrible performance in OpenGL applications/games. To fix that, switch to an equivalent A or B profile (for example GRID RTX6000-4B).
- C profiles (for example GRID RTX6000-4C) only work on Linux; don't try using those on Windows, it will not work at all.
- A profiles (for example GRID RTX6000-4A) will NOT work on Linux; they only work on Windows.

All of that changes with the override config file. Technically we are still locked to one profile, but now it is possible to change the VRAM of the profile on a per-VM basis: even though we have three GRID RTX6000-4Q instances, one VM can have 4GB of VRAM while we override the VRAM size of the other two VMs down to 2GB. Let's take a look at this example override config file (it's in TOML format)

[profile.nvidia-259]
num_displays = 1          # Max number of virtual displays. Usually 1 if you want a simple remote gaming VM
display_width = 1920      # Maximum display width in the VM
display_height = 1080     # Maximum display height in the VM
max_pixels = 2073600      # This is the product of display_width and display_height so 1920 * 1080 = 2073600
cuda_enabled = 1          # Enables CUDA support. Either 1 or 0 for enabled/disabled
frl_enabled = 1           # This controls the frame rate limiter, if you enable it your fps in the VM get locked to 60fps. Either 1 or 0 for enabled/disabled
framebuffer = 0x74000000
framebuffer_reservation = 0xC000000   # In combination with the framebuffer size
                                      # above, these two lines will give you a VM
                                      # with 2GB of VRAM (framebuffer + framebuffer_reservation = VRAM size in bytes).
                                      # See below for some other sizes
 
[vm.100]
frl_enabled = 0
# You can override all the options from above here too. If you want to add more overrides for a new VM, just copy this block and change the VM ID
 

There are two blocks here, the first being [profile.nvidia-259] and the second [vm.100]. The first one applies its overrides to all VM instances of the nvidia-259 type (that's GRID RTX6000-4Q), while the second one applies its overrides only to one specific VM, the one with the Proxmox VM ID 100. The Proxmox VM ID is the same number you see in the Proxmox web interface, next to the VM name. You don't have to specify all parameters, only the ones you need/want. There are some more parameters that I didn't mention here; you can find them by going through the source code of the vgpu_unlock-rs repo. For a simple 1080p remote gaming VM I recommend going with something like this

[profile.nvidia-259] # choose the profile you want here
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
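
And for the mixed-VRAM scenario described earlier (one 4GB VM plus two 2GB VMs on GRID RTX6000-4Q), a sketch of the extra per-VM blocks could look like this; the VM IDs 101 and 102 are placeholders, and the 2GB values are taken from the table below:

[vm.101]
framebuffer = 0x74000000              # 2GB instead of the 4GB profile default
framebuffer_reservation = 0xC000000

[vm.102]
framebuffer = 0x74000000
framebuffer_reservation = 0xC000000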

Common VRAM sizes

Here are some common framebuffer sizes that you might want to use:

512MB

framebuffer = 0x1A000000
framebuffer_reservation = 0x6000000

1GB

framebuffer = 0x38000000
framebuffer_reservation = 0x8000000

2GB

framebuffer = 0x74000000
framebuffer_reservation = 0xC000000

3GB

framebuffer = 0xB0000000
framebuffer_reservation = 0x10000000

4GB

framebuffer = 0xEC000000
framebuffer_reservation = 0x14000000

5GB

framebuffer = 0x128000000
framebuffer_reservation = 0x18000000

6GB

framebuffer = 0x164000000
framebuffer_reservation = 0x1C000000

8GB

framebuffer = 0x1DC000000
framebuffer_reservation = 0x24000000

10GB

framebuffer = 0x254000000
framebuffer_reservation = 0x2C000000

12GB

framebuffer = 0x2CC000000
framebuffer_reservation = 0x34000000

16GB

framebuffer = 0x3BC000000
framebuffer_reservation = 0x44000000

20GB

framebuffer = 0x4AC000000
framebuffer_reservation = 0x54000000

24GB

framebuffer = 0x59C000000
framebuffer_reservation = 0x64000000

32GB

framebuffer = 0x77C000000
framebuffer_reservation = 0x84000000

48GB

framebuffer = 0xB2D200000
framebuffer_reservation = 0xD2E00000

framebuffer and framebuffer_reservation will always equal the VRAM size in bytes when added together.
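
If you want to double-check one of the entries, the arithmetic is easy to verify in a shell; for the 2GB pair above, this prints 2147483648, which is exactly 2 GiB:

printf '%d\n' $((0x74000000 + 0xC000000))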

Adding a vGPU to a Proxmox VM

Go to the Proxmox web interface, open your VM, go to Hardware, click Add and select PCI Device. You should be able to choose from a list of PCI devices; choose your GPU there, its entry should say Yes in the Mediated Devices column. Now you should also be able to select the MDev Type. Choose whatever profile you want; if you don't remember which one you want, you can list all available types with mdevctl types.
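
If you prefer the command line, the same assignment can also be done with qm; the VM ID, PCI address and mdev type below are examples and have to match your system:

qm set 100 -hostpci0 01:00.0,mdev=nvidia-259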

Finish by clicking Add (or by applying the qm command above), start the VM, and install the required guest drivers inside it. The guest driver files and the signing key from the downloaded archive can be copied to the VM with scp:

# On the machine holding the downloaded vGPU archive: copy the guest driver and signing key to the VM
cd /home/yanboyang713/Downloads/vGPU/Guest_Drivers
scp NVIDIA-Linux-x86_64-535.129.03-grid.run yanboyang713@192.168.88.234:/home/yanboyang713
scp nvidia-linux-grid-535_535.129.03_amd64.deb yanboyang713@192.168.88.255:/home/yanboyang713
cd /home/yanboyang713/Downloads/vGPU/Signing_Keys
scp vgpu_debian_publickey.pub yanboyang713@192.168.88.234:/home/yanboyang713

# Inside the VM: install the guest driver from the .deb package
sudo apt install ./nvidia-linux-grid-535_535.129.03_amd64.deb
sudo apt --fix-broken install

# Inside the VM: alternatively, install via the .run installer (needs a compiler)
sudo apt install build-essential
sudo bash NVIDIA-Linux-x86_64-535.129.03-grid.run

# Verify the driver inside the VM
nvidia-smi

After installing the drivers, you can shut the VM down and remove the virtual display adapter by selecting Display in the Hardware section and choosing none (none). ONLY do that if you have some other way to access the virtual machine, like Parsec or Remote Desktop, because the Proxmox console won't work anymore. Enjoy your new vGPU VM :)

The best compatibility is reached when using q35 as the machine type, SeaBIOS as the BIOS, Secure Boot disabled, and PCIe instead of PCI for the passed-through device.

See also this reference on installing GPU drivers inside guests: https://cloud.google.com/compute/docs/gpus/install-drivers-gpu?hl=zh-cn

vGPU Licensing

Usually a license is required to use vGPU

The recommended way to get around the license is to set up your own license server. Follow the instructions here (or here if the other link is down).

[Taobao] https://m.tb.cn/h.5MjTOzd?tk=SuTkWgWXZdn CZ3457 "NVIDIA GRID vGPU driver download/installation service, licensing, professional vGPU technical services". Open the link directly, or search for it on Taobao.

Installing CUDA and cuDNN

Reference List

  1. https://docs.google.com/document/d/1pzrWJ9h-zANCtyqRgS7Vzla0Y8Ea2-5z2HEi4X75d2Q/edit
  2. https://github.com/DualCoder/vgpu_unlock
  3. https://wvthoog.nl/proxmox-7-vgpu-v2/#The_easy_way
  4. https://github.com/mbilker/vgpu_unlock-rs
  5. https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_20_series
  6. https://github.com/VGPU-Community-Drivers/vGPU-Unlock-patcher
  7. https://gitlab.com/polloloco/vgpu-proxmox