Install vGPU unlock Packages
Update and upgrade
apt update
apt dist-upgrade
We need to install a few more packages like git, a compiler and some other tools.
apt install -y git build-essential dkms pve-headers mdevctl
Git repos and Rust compiler
First, clone the vgpu-proxmox repo to your home folder (in this case /root)
git clone https://gitlab.com/polloloco/vgpu-proxmox.git
You also need the vgpu_unlock-rs repo
cd /opt
git clone https://github.com/mbilker/vgpu_unlock-rs.git
After that, install the rust compiler
curl https://sh.rustup.rs -sSf | sh -s -- -y --profile minimal
Now make the rust binaries available in your $PATH (you only have to do it the first time after installing rust)
source $HOME/.cargo/env
Enter the vgpu_unlock-rs directory and compile the library. Depending on your hardware and internet connection that may take a while
cd vgpu_unlock-rs/
cargo build --release
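The compiled library ends up in target/release; that is the path the systemd drop-in files created below will point to. You can confirm it exists after the build (assuming the repo was cloned to /opt as above):
ls -l /opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so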
Create files for vGPU unlock
The vgpu_unlock-rs library requires a few files and folders in order to work properly, so let's create those. First, create the folder for your vGPU unlock config and an empty config file
mkdir /etc/vgpu_unlock
touch /etc/vgpu_unlock/profile_override.toml
Then, create folders and files for systemd to load the vgpu_unlock-rs library when starting the nvidia vgpu services
mkdir /etc/systemd/system/{nvidia-vgpud.service.d,nvidia-vgpu-mgr.service.d}
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf
Enabling IOMMU
To enable IOMMU you have to enable it in your BIOS/UEFI first. Because that is vendor specific, I can't give exact instructions, but on Intel systems the option is usually called something like "VT-d", while AMD systems tend to call it "IOMMU". After enabling it in your BIOS/UEFI, you also have to enable it in your kernel. Depending on how your system boots, there are two ways to do that: if you installed Proxmox with ZFS-on-root and in UEFI mode, you are using systemd-boot; everything else uses GRUB. GRUB is far more common, so if you are unsure, you are probably using that. Depending on which bootloader you are using, choose one of the following two options:
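If you are unsure which bootloader your host actually uses, an optional quick check is Proxmox's boot tool; on installations it manages, the status output shows whether the ESPs are set up for uefi (systemd-boot) or grub
proxmox-boot-tool status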
GRUB
Open the file /etc/default/grub in your favorite editor
vi /etc/default/grub
The kernel parameters have to be appended to the variable GRUB_CMDLINE_LINUX_DEFAULT. On a clean installation that line should look like this
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
If you are using an Intel system, append this after quiet:
intel_iommu=on iommu=pt
On AMD systems, append this after quiet:
amd_iommu=on iommu=pt
The result should look like this (for Intel systems):
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
Now, save and exit from the editor then apply your changes:
update-grub
systemd-boot
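If your host boots via systemd-boot, the kernel parameters don't live in the GRUB config but on the single command line in /etc/kernel/cmdline. The following is a sketch of the usual Proxmox procedure (double-check against the Proxmox documentation for your version): append intel_iommu=on iommu=pt (or amd_iommu=on iommu=pt on AMD) to the end of that line, then refresh the boot entries
vi /etc/kernel/cmdline
proxmox-boot-tool refresh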
Loading required kernel modules and blacklisting the open source nvidia driver
We have to load the vfio, vfio_iommu_type1, vfio_pci and vfio_virqfd kernel modules to get vGPU working
echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules
Proxmox ships with the open source nouveau driver for NVIDIA GPUs; however, we have to use the patched NVIDIA driver to enable vGPU. The next line prevents the nouveau driver from loading
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
Applying our kernel configuration
I’m not sure if this is needed, but it doesn’t hurt :)
update-initramfs -u -k all
…and reboot
reboot
Check if IOMMU is enabled
Note: see the section "Enabling IOMMU"; this check is optional. Wait for your server to restart, then type this into a root shell
dmesg | grep -e DMAR -e IOMMU
Depending on your mainboard and CPU, the output will be different; in my output the important line is the third one: DMAR: IOMMU enabled. If you see something like that, IOMMU is enabled.
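As an additional optional sanity check, you can list the IOMMU groups the kernel created; if IOMMU is active, this prints one line per device
find /sys/kernel/iommu_groups/ -type l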
NVIDIA vGPU Driver
The vgpu-proxmox repo cloned earlier contains patches that allow you to use vGPU on cards that are not vGPU qualified (consumer GPUs). Those patches are binary patches, which means that each patch works ONLY for a specific driver version. I've created patches for the following driver versions:
16.2 (535.129.03) - Use this if you are on pve 8.0 (kernel 6.2; 6.5 should work too)
16.1 (535.104.06)
16.0 (535.54.06)
15.1 (525.85.07)
15.0 (525.60.12)
14.4 (510.108.03)
14.3 (510.108.03)
14.2 (510.85.03)
You can choose which of those you want to use, but generally it's recommended to use the latest, most up-to-date version (16.2 in this case). If you have a vGPU qualified GPU, you can use other versions too, because you don't need to patch the driver; however, you still have to make sure they are compatible with your Proxmox version and kernel. I would not recommend using any older version unless you have a very specific requirement.
Obtaining the driver
NVIDIA doesn't let you freely download vGPU drivers like they do with GeForce or regular Quadro drivers; instead you have to download them through the NVIDIA Licensing Portal (see: https://www.nvidia.com/en-us/drivers/vgpu-software-driver/). You can sign up for a free evaluation to get access to the download page. NB: when applying for an eval license, do NOT use your personal email or an address at a free email provider like gmail.com, as you will probably have to go through manual review. I have had very good experience using a custom domain for my email address; that way the automatic verification usually lets me in after about five minutes.
Software Download → Product Family: vGPU → Platform (Linux KVM) → Platform Version (All Supported) → Product Version (16.2)
After downloading, extract the zip file and copy the file called NVIDIA-Linux-x86_64-DRIVERVERSION-vgpu-kvm.run (where DRIVERVERSION is a string like 535.129.03) from the Host_Drivers folder into the root folder of your Proxmox host, using a tool like FileZilla, WinSCP, scp or rsync.
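For example, using scp from the machine where you downloaded the bundle (the archive name and host address below are placeholders, adjust them to your download and your host):
unzip NVIDIA-GRID-Linux-KVM-*.zip
scp Host_Drivers/NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run root@your-proxmox-host:/root/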
Patching the driver
Now, on the Proxmox host, make the driver executable
chmod +x NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run
And then patch it
./NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm.run --apply-patch ~/vgpu-proxmox/535.129.03.patch
That should output a lot of lines ending with
Self-extractible archive "NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm-custom.run" successfully created.
You should now have a file called NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm-custom.run, that is your patched driver.
Installing the driver
Now that the required patch is applied, you can install the driver
./NVIDIA-Linux-x86_64-535.129.03-vgpu-kvm-custom.run --dkms
The installer will ask "Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later." Answer with Yes.
Depending on your hardware, the installation could take a minute or two. If everything went right, you will be presented with this message.
Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version: 535.129.03) is now complete.
Click Ok to exit the installer. To finish the installation, reboot.
reboot
Finishing touches
Wait for your server to reboot, then type this into the shell to check if the driver install worked
root@server4:~# nvidia-smi
Tue Dec 19 15:34:40 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: N/A |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2070 On | 00000000:01:00.0 Off | N/A |
| 31% 33C P8 18W / 175W | 60MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
To verify if the vGPU unlock worked, type this command
root@server4:~# mdevctl types
0000:01:00.0
nvidia-256
Available instances: 24
Device API: vfio-pci
Name: GRID RTX6000-1Q
Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-257
Available instances: 12
Device API: vfio-pci
Name: GRID RTX6000-2Q
Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
nvidia-258
Available instances: 8
Device API: vfio-pci
Name: GRID RTX6000-3Q
Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
nvidia-259
Available instances: 6
Device API: vfio-pci
Name: GRID RTX6000-4Q
Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
nvidia-260
Available instances: 4
Device API: vfio-pci
Name: GRID RTX6000-6Q
Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
nvidia-261
Available instances: 3
Device API: vfio-pci
Name: GRID RTX6000-8Q
Description: num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=3
nvidia-262
Available instances: 2
Device API: vfio-pci
Name: GRID RTX6000-12Q
Description: num_heads=4, frl_config=60, framebuffer=12288M, max_resolution=7680x4320, max_instance=2
nvidia-263
Available instances: 1
Device API: vfio-pci
Name: GRID RTX6000-24Q
Description: num_heads=4, frl_config=60, framebuffer=24576M, max_resolution=7680x4320, max_instance=1
nvidia-435
Available instances: 24
Device API: vfio-pci
Name: GRID RTX6000-1B
Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-436
Available instances: 12
Device API: vfio-pci
Name: GRID RTX6000-2B
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-437
Available instances: 24
Device API: vfio-pci
Name: GRID RTX6000-1A
Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24
nvidia-438
Available instances: 12
Device API: vfio-pci
Name: GRID RTX6000-2A
Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12
nvidia-439
Available instances: 8
Device API: vfio-pci
Name: GRID RTX6000-3A
Description: num_heads=1, frl_config=60, framebuffer=3072M, max_resolution=1280x1024, max_instance=8
nvidia-440
Available instances: 6
Device API: vfio-pci
Name: GRID RTX6000-4A
Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=6
nvidia-441
Available instances: 4
Device API: vfio-pci
Name: GRID RTX6000-6A
Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=1280x1024, max_instance=4
nvidia-442
Available instances: 3
Device API: vfio-pci
Name: GRID RTX6000-8A
Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=3
nvidia-443
Available instances: 2
Device API: vfio-pci
Name: GRID RTX6000-12A
Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2
nvidia-444
Available instances: 1
Device API: vfio-pci
Name: GRID RTX6000-24A
Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=1280x1024, max_instance=1
If this command doesn't return any output, vGPU unlock isn't working. Another command you can try, to see whether your card is recognized as vGPU enabled, is nvidia-smi vgpu. If the unlock worked, the output should be similar to this:
root@server4:~# nvidia-smi vgpu
Tue Dec 19 15:39:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 NVIDIA GeForce RTX 2070 | 00000000:01:00.0 | 0% |
+---------------------------------+------------------------------+------------+
However, if you get this output, then something went wrong
No supported devices in vGPU mode
vGPU overrides/settings
Further up we created the file /etc/vgpu_unlock/profile_override.toml without explaining what it is for. Using that file you can override lots of parameters for your vGPU instances: for example you can change the maximum resolution, enable or disable the frame rate limiter, enable or disable CUDA support, or change the VRAM size of your virtual GPUs.
If we take a look at the output of mdevctl types, we see lots of different types to choose from. However, if we pick for example GRID RTX6000-4Q, which gives us 4GB of VRAM in a VM, we are locked to that type for all of our VMs. That means we can only have 4GB VMs; it's not possible to mix different types to have one 4GB VM and two 2GB VMs.
Important notes:
- Q profiles can give you horrible performance in OpenGL applications/games. To fix that, switch to an equivalent A or B profile (for example GRID RTX6000-4B).
- C profiles (for example GRID RTX6000-4C) only work on Linux; don't try using those on Windows, it will not work at all.
- A profiles (for example GRID RTX6000-4A) will NOT work on Linux, they only work on Windows.
All of that changes with the override config file. Technically we are still locked to one profile, but now it's possible to change the VRAM per VM, so even though we have three GRID RTX6000-4Q instances, one VM can have 4GB of VRAM while the other two are overridden down to 2GB each. Let's take a look at this example override config file (it's in TOML format)
[profile.nvidia-259]
num_displays = 1 # Max number of virtual displays. Usually 1 if you want a simple remote gaming VM
display_width = 1920 # Maximum display width in the VM
display_height = 1080 # Maximum display height in the VM
max_pixels = 2073600 # This is the product of display_width and display_height so 1920 * 1080 = 2073600
cuda_enabled = 1 # Enables CUDA support. Either 1 or 0 for enabled/disabled
frl_enabled = 1 # This controls the frame rate limiter, if you enable it your fps in the VM get locked to 60fps. Either 1 or 0 for enabled/disabled
framebuffer = 0x74000000
framebuffer_reservation = 0xC000000 # In combination with the framebuffer size
# above, these two lines will give you a VM
# with 2GB of VRAM (framebuffer + framebuffer_reservation = VRAM size in bytes).
# See below for some other sizes
[vm.100]
frl_enabled = 0
# You can override all the options from above here too. If you want to add more overrides for a new VM, just copy this block and change the VM ID
There are two blocks here: the first is [profile.nvidia-259] and the second is [vm.100]. The first one applies its overrides to all VM instances of the nvidia-259 type (that's GRID RTX6000-4Q), and the second one applies its overrides only to one specific VM, the one with the Proxmox VM ID 100. The Proxmox VM ID is the same number you see in the Proxmox web interface next to the VM name. You don't have to specify all parameters, only the ones you need or want. There are some more that I didn't mention here; you can find them by going through the source code of the vgpu_unlock-rs repo. For a simple 1080p remote gaming VM I recommend going with something like this
[profile.nvidia-259] # choose the profile you want here
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
Common VRAM sizes
Here are some common framebuffer sizes that you might want to use:
512MB
framebuffer = 0x1A000000
framebuffer_reservation = 0x6000000
1GB
framebuffer = 0x38000000
framebuffer_reservation = 0x8000000
2GB
framebuffer = 0x74000000
framebuffer_reservation = 0xC000000
3GB
framebuffer = 0xB0000000
framebuffer_reservation = 0x10000000
4GB
framebuffer = 0xEC000000
framebuffer_reservation = 0x14000000
5GB
framebuffer = 0x128000000
framebuffer_reservation = 0x18000000
6GB
framebuffer = 0x164000000
framebuffer_reservation = 0x1C000000
8GB
framebuffer = 0x1DC000000
framebuffer_reservation = 0x24000000
10GB
framebuffer = 0x254000000
framebuffer_reservation = 0x2C000000
12GB
framebuffer = 0x2CC000000
framebuffer_reservation = 0x34000000
16GB
framebuffer = 0x3BC000000
framebuffer_reservation = 0x44000000
20GB
framebuffer = 0x4AC000000
framebuffer_reservation = 0x54000000
24GB
framebuffer = 0x59C000000
framebuffer_reservation = 0x64000000
32GB
framebuffer = 0x77C000000
framebuffer_reservation = 0x84000000
48GB
framebuffer = 0xB2D200000
framebuffer_reservation = 0xD2E00000
When added together, framebuffer and framebuffer_reservation always equal the VRAM size in bytes.
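As a quick check of that rule, the 2GB pair from above adds up to exactly 2 GiB; you can verify it in a shell:
echo $(( 0x74000000 + 0xC000000 ))   # 2147483648
echo $(( 2 * 1024 * 1024 * 1024 ))   # 2147483648, i.e. 2GB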
Adding a vGPU to a Proxmox VM
Go to the Proxmox web interface, select your VM, go to Hardware, then Add and select PCI Device. You should be able to choose from a list of PCI devices; choose your GPU there, its entry should say Yes in the Mediated Devices column. Now you should also be able to select the MDev Type. Choose whatever profile you want; if you don't remember which one you want, you can see the list of all available types with mdevctl types.
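If you prefer the host shell over the web interface, the same mapping can also be applied with qm set; the VM ID (100), PCI address (01:00.0) and mdev type (nvidia-259) below are placeholders, substitute your own:
qm set 100 --hostpci0 0000:01:00.0,mdev=nvidia-259,pcie=1   # pcie=1 requires the q35 machine type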
Finish by clicking Add (or apply the qm command), then start the VM. To install the required drivers inside the guest, first copy the guest driver and the Debian signing key from the downloaded vGPU bundle into the VM, for example with scp:
cd /home/yanboyang713/Downloads/vGPU/Guest_Drivers
scp NVIDIA-Linux-x86_64-535.129.03-grid.run yanboyang713@192.168.88.234:/home/yanboyang713
scp nvidia-linux-grid-535_535.129.03_amd64.deb yanboyang713@192.168.88.255:/home/yanboyang713
cd /home/yanboyang713/Downloads/vGPU/Signing_Keys
scp vgpu_debian_publickey.pub yanboyang713@192.168.88.234:/home/yanboyang713
Then, inside the VM, install the driver:
sudo apt install ./nvidia-linux-grid-535_535.129.03_amd64.deb
sudo apt --fix-broken install
sudo apt install build-essential
sudo bash NVIDIA-Linux-x86_64-535.129.03-grid.run
nvidia-smi
After installing the drivers you can shut the VM down and remove the virtual display adapter by selecting Display in the Hardware section and setting it to none (none). ONLY do that if you have some other way to access the virtual machine, like Parsec or Remote Desktop, because the Proxmox console won't work anymore. Enjoy your new vGPU VM :)
The best compatibility is reached when using q35 as the machine type, with SeaBIOS, without secure boot, and with PCI-Express instead of PCI for the passed-through device.
For general reference on installing GPU drivers inside a VM, see: https://cloud.google.com/compute/docs/gpus/install-drivers-gpu?hl=zh-cn
vGPU Licensing
Usually a license is required to use vGPU. The recommended way to get around the license is to set up your own license server; follow the instructions here (or here if the other link is down).
[Taobao] https://m.tb.cn/h.5MjTOzd?tk=SuTkWgWXZdn CZ3457 "NVIDIA GRID vGPU driver download and installation service, licensing, professional vGPU technical service": open the link directly, or search for it on Taobao.
Installing CUDA and cuDNN
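This section is only a sketch: the repository path, versions and package names below are assumptions based on NVIDIA's network repository for an Ubuntu 22.04 guest, so check NVIDIA's CUDA/cuDNN installation guides for your distribution. Important: install a cuda-toolkit package rather than the cuda metapackage, so apt does not pull in a driver that replaces the GRID guest driver installed above.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-2   # CUDA 12.2 matches the 535 driver branch
sudo apt install -y libcudnn8 libcudnn8-dev   # cuDNN 8 for CUDA 12.x (package names may differ on your distro)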
Reference List
- https://docs.google.com/document/d/1pzrWJ9h-zANCtyqRgS7Vzla0Y8Ea2-5z2HEi4X75d2Q/edit
- https://github.com/DualCoder/vgpu_unlock
- https://wvthoog.nl/proxmox-7-vgpu-v2/#The_easy_way
- https://github.com/mbilker/vgpu_unlock-rs
- https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_20_series
- https://github.com/VGPU-Community-Drivers/vGPU-Unlock-patcher
- https://gitlab.com/polloloco/vgpu-proxmox