Use NVIDIA GPU VRAM as Swap Space on Linux with nbd-vram

nbd-vram is a new open-source project that lets Linux users, particularly those with laptops whose memory is soldered and cannot be upgraded, use NVIDIA GPU VRAM as system swap space. It combines the Network Block Device (NBD) protocol with the CUDA driver API to work around the driver-level restriction that keeps consumer GeForce GPUs from using direct Peer-to-Peer (P2P) memory access. In testing on an RTX 3070 laptop, the tool allocated 7GB of VRAM to the swap pool, contributing to a total addressable memory of approximately 46GB. The approach is presented as a faster alternative to SSD swap because transfers go over the PCIe interface into VRAM, and it is designed to survive kernel and driver updates without a rebuild.

Introduction

Many modern laptops ship with soldered RAM, leaving users with no upgrade path when memory demands exceed physical capacity. nbd-vram addresses this limitation on Linux systems by repurposing the Video RAM (VRAM) on NVIDIA GPUs as system swap space. VRAM becomes one tier in the swap hierarchy, so otherwise idle GPU memory can absorb overflow when physical RAM is exhausted.

Key Takeaways

Hardware Optimization: Specifically designed for laptops with soldered memory and no upgrade path, allowing an RTX card's VRAM to act as a buffer before the system resorts to slower SSD swap.
Technical Workaround: Bypasses the 'EINVAL' errors and RM-level gating that NVIDIA imposes on consumer GeForce GPUs regarding direct P2P memory access.
Tiered Memory Hierarchy: Establishes a clear overflow order: Physical RAM fills first, followed by VRAM (via PCIe), then zram (CPU compression), and finally SSD storage.
Stability and Maintenance: Requires no custom kernel modules or NVIDIA kernel symbols, ensuring the tool survives kernel and driver updates without the need for rebuilding.
Proven Performance: Testing on an RTX 3070 Laptop with 16GB of physical RAM nearly tripled addressable memory, reaching approximately 46GB of combined RAM and swap.

In-Depth Analysis

The NBD Architecture: Sidestepping Driver Restrictions

The core innovation of nbd-vram lies in its use of the Network Block Device (NBD) protocol to circumvent the limitations of consumer-grade NVIDIA hardware. Traditionally, developers have attempted to use the nvidia_p2p_get_pages_persistent API to pin VRAM pages so the CPU could access them directly. However, NVIDIA restricts this functionality to professional Quadro and datacenter SKUs. On GeForce cards, this approach consistently returns an EINVAL error, regardless of the driver version used.

To solve this, nbd-vram employs a small daemon that allocates VRAM through the standard CUDA driver API. This daemon then serves the allocated memory as a block device over a Unix socket using the NBD protocol. The Linux kernel's built-in NBD driver connects to this socket and exposes the memory as a standard device at /dev/nbdX. This allows the system to treat the GPU's VRAM as a normal swap device without requiring the specialized P2P features reserved for enterprise hardware.
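To make the block-device idea concrete, here is a minimal sketch of how a daemon might answer one NBD request. It is not nbd-vram's code: the function name `serve_one` is hypothetical, a `bytearray` stands in for the VRAM allocation, and only the read and write commands are handled. The magic numbers and header layouts are the ones defined by the NBD protocol's transmission phase.

```python
import struct

# Constants from the NBD protocol's transmission phase.
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1
NBD_EINVAL = 22

# Big-endian layouts: request is 28 bytes, simple reply is 16 bytes.
REQUEST_HEADER = struct.Struct(">IHHQQI")  # magic, flags, type, cookie, offset, length
REPLY_HEADER = struct.Struct(">IIQ")       # magic, error, cookie


def serve_one(backing: bytearray, header: bytes, payload: bytes = b"") -> bytes:
    """Handle a single NBD request against an in-memory backing store.

    Hypothetical helper: `backing` plays the role of the VRAM allocation.
    """
    magic, _flags, cmd, cookie, offset, length = REQUEST_HEADER.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ValueError("not an NBD request header")

    def reply(error: int, data: bytes = b"") -> bytes:
        return REPLY_HEADER.pack(NBD_SIMPLE_REPLY_MAGIC, error, cookie) + data

    # Simplification: any out-of-range request is answered with EINVAL.
    if offset + length > len(backing):
        return reply(NBD_EINVAL)
    if cmd == NBD_CMD_WRITE:
        if len(payload) != length:
            return reply(NBD_EINVAL)
        # A real daemon would copy the payload into VRAM at this point.
        backing[offset:offset + length] = payload
        return reply(0)
    if cmd == NBD_CMD_READ:
        # A real daemon would copy from VRAM into a host buffer at this point.
        return reply(0, bytes(backing[offset:offset + length]))
    return reply(NBD_EINVAL)
```

A full server would also perform the NBD handshake and loop over the socket. The sketch only shows why a userspace process is enough: the kernel's NBD driver sends fixed-size request headers and expects a matching reply for each.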

Data Path and Memory Tiering

The data path for nbd-vram is short and ends on the PCIe bus. When the system swaps a page out to the GPU, the kernel swap subsystem writes it to /dev/nbdX, the NBD kernel driver forwards the request over a Unix socket to the nbd-vram daemon, and the daemon copies the data into VRAM with cuMemcpyHtoD (Host to Device). Swapping a page back in follows the same route in reverse, with the daemon reading from VRAM using cuMemcpyDtoH (Device to Host).
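The last hop of that path can be pictured as two copy operations on a single device allocation, addressed by byte offset. The sketch below is an illustration under stated assumptions, not the project's implementation: the class `HostBackedVram` and its method names are hypothetical, and a host-side `bytearray` replaces the CUDA allocation so the example runs without a GPU. In a real daemon the two methods would wrap `cuMemcpyHtoD` and `cuMemcpyDtoH` on a `CUdeviceptr` plus offset.

```python
class HostBackedVram:
    """Stand-in for a CUDA device allocation, backed by host memory."""

    def __init__(self, size: int) -> None:
        self.size = size
        self._mem = bytearray(size)  # assumption: replaces the device buffer

    def _check(self, offset: int, length: int) -> None:
        # Reject any access that falls outside the allocation.
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("access outside the allocated region")

    def copy_host_to_device(self, offset: int, data: bytes) -> None:
        """Swap-out direction; corresponds to cuMemcpyHtoD in a real daemon."""
        self._check(offset, len(data))
        self._mem[offset:offset + len(data)] = data

    def copy_device_to_host(self, offset: int, length: int) -> bytes:
        """Swap-in direction; corresponds to cuMemcpyDtoH in a real daemon."""
        self._check(offset, length)
        return bytes(self._mem[offset:offset + length])


# Usage: write one 4 KiB page at page index 3, then read it back.
PAGE = 4096
vram = HostBackedVram(16 * PAGE)
vram.copy_host_to_device(3 * PAGE, b"\xab" * PAGE)
page = vram.copy_device_to_host(3 * PAGE, PAGE)
```

Because both directions go through explicit copies, the CPU never needs a direct mapping of VRAM, which is the property the NBD design relies on.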

This setup enables an overflow order that prioritizes speed. Once the physical RAM is full, VRAM absorbs the initial spillover. Because this transfer happens over the PCIe bus into memory, the project describes it as significantly faster than traditional disk-based swap. Only after the VRAM is saturated does the system move to zram for CPU-based compression, and finally to the SSD as a last resort. In a documented test case using an RTX 3070 Laptop with 8GB of VRAM and 16GB of physical RAM, the user allocated 7GB for VRAM swap, resulting in a total addressable memory pool of approximately 46GB when combined with other swap methods.
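One plausible way to obtain this order is Linux swap priorities, where areas with a higher priority are filled before areas with a lower one. The article does not say how nbd-vram configures them, so the following is a sketch under that assumption. The `SwapTier` type, the `spill` function, the priority values, and the zram and SSD sizes are all hypothetical; only the 7GB VRAM figure comes from the test case above.

```python
from typing import Dict, List, NamedTuple


class SwapTier(NamedTuple):
    name: str
    priority: int  # higher value is used first, as with swapon priorities
    size_gb: int


def spill(overflow_gb: int, tiers: List[SwapTier]) -> Dict[str, int]:
    """Distribute memory that no longer fits in RAM across swap tiers."""
    placed: Dict[str, int] = {}
    remaining = overflow_gb
    # Fill tiers from highest to lowest priority.
    for tier in sorted(tiers, key=lambda t: t.priority, reverse=True):
        used = min(remaining, tier.size_gb)
        placed[tier.name] = used
        remaining -= used
    if remaining > 0:
        raise MemoryError("overflow exceeds total swap capacity")
    return placed


tiers = [
    SwapTier("ssd", priority=10, size_gb=8),     # hypothetical size
    SwapTier("zram", priority=50, size_gb=4),    # hypothetical size
    SwapTier("vram", priority=100, size_gb=7),   # 7GB from the test case
]

# 9GB of overflow: VRAM takes the first 7GB, zram the next 2GB.
result = spill(9, tiers)
```

The model ignores that zram stores pages compressed, so its effective capacity depends on the data. It is only meant to show the fill order.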

Overcoming BAR1 Mapping Issues

Another technical hurdle addressed by this project is the limited BAR1 (Base Address Register 1) mapping on consumer GPUs. Previous attempts to map the BAR1 physical address directly, without the P2P API, failed because a consumer GPU's internal page tables typically have only about 16 MiB of BAR1 mapped, just enough for the display framebuffer. Reads from the rest of the address space return only zeros. As a result, mkswap may appear to succeed in such environments, but the subsequent swapon command fails because the swap header was never actually written to the hardware. By using the CUDA API to handle memory copies, nbd-vram sidesteps these mapping restrictions and can use the full extent of the allocated VRAM.
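The swapon failure is easy to reason about once the swap header layout is known: a Linux swap area carries the signature `SWAPSPACE2` in the last 10 bytes of its first page. The check below is a small illustration of that rule, with a hypothetical helper name and an assumed 4096-byte page size. It shows why a region that reads back as zeros cannot be accepted as swap, regardless of what mkswap reported.

```python
SWAP_SIGNATURE = b"SWAPSPACE2"


def has_swap_signature(first_page: bytes, page_size: int = 4096) -> bool:
    """Return True if the first page ends with the Linux swap signature."""
    if len(first_page) < page_size:
        raise ValueError("need at least one full page")
    start = page_size - len(SWAP_SIGNATURE)
    return first_page[start:page_size] == SWAP_SIGNATURE


# A region that reads back as zeros, as described for unmapped BAR1 space.
zero_page = bytes(4096)

# A page where the signature was actually stored.
good_page = bytes(4096 - len(SWAP_SIGNATURE)) + SWAP_SIGNATURE

checks = (has_swap_signature(zero_page), has_swap_signature(good_page))
```

Running the same check against the first page of /dev/nbdX after mkswap would be one way to confirm that writes really reach the backing memory.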

Industry Impact

The release of nbd-vram highlights a growing trend in the open-source community to reclaim hardware functionality that is software-locked by manufacturers. For the Linux ecosystem, this provides a vital lifeline for users of modern laptops where hardware modularity has been sacrificed for thinness or cost. By turning a "static" resource like GPU VRAM into a dynamic system resource, this project demonstrates how standard APIs (like CUDA and NBD) can be creatively combined to solve hardware limitations. Furthermore, the decision to avoid kernel modules ensures that this solution remains accessible to a wide range of users without the technical debt of maintaining custom code across frequent Linux kernel updates.

Frequently Asked Questions

Question: Why can't GeForce users use the standard NVIDIA P2P API for this purpose?

NVIDIA has gated the nvidia_p2p_get_pages_persistent API at the Resource Manager (RM) level, making it exclusive to Quadro and datacenter GPUs. On consumer GeForce cards, the driver is programmed to return an EINVAL error, preventing the CPU from directly pinning and accessing VRAM pages in this manner.

Question: How does the performance of VRAM swap compare to traditional SSD swap?

VRAM swap is described as generally faster because pages travel over the GPU's high-bandwidth PCIe link into memory instead of being written to disk. In the nbd-vram hierarchy, VRAM is positioned as the first overflow point after physical RAM is exhausted, specifically to take advantage of this speed before the system resorts to CPU-intensive compression (zram) or slower disk writes (SSD).

Question: Will I need to rebuild the tool every time I update my Linux kernel or NVIDIA driver?

No. One of the primary advantages of the NBD-based approach is that it does not use a custom kernel module or rely on internal NVIDIA kernel symbols. Because it operates as a daemon using the standard CUDA API and the kernel's built-in NBD driver, it is designed to survive system updates without requiring a rebuild.
