New nbd-vram Tool Enables Linux Users to Utilize NVIDIA GPU VRAM as High-Speed Swap Space
Open Source, Linux, NVIDIA, GPU

A new open-source project, nbd-vram, lets Linux users put NVIDIA GPU VRAM to work as system swap space. It is aimed particularly at laptops with soldered, non-upgradeable memory. By combining the Network Block Device (NBD) protocol with the CUDA driver API, the tool sidesteps the driver-level restrictions that keep consumer-grade GeForce GPUs from exposing VRAM through NVIDIA's Peer-to-Peer (P2P) memory API. In a reported test on an RTX 3070 laptop, 7GB of VRAM was allocated as swap, contributing to a total addressable memory pool of approximately 46GB. Because transfers run over the PCIe interface, the approach is positioned as a faster tier than traditional SSD swap, and because it needs no custom kernel module, it is designed to survive kernel and driver updates.

Hacker News

Introduction

Many modern laptops ship with soldered RAM, leaving users with no upgrade path when memory demands exceed physical capacity. nbd-vram, a new tool for Linux systems, addresses this limitation by repurposing the Video RAM (VRAM) on NVIDIA GPUs as system swap space. The result is a tiered memory arrangement in which otherwise idle GPU memory absorbs overflow once system RAM is exhausted.

Key Takeaways

  • Hardware Optimization: Specifically designed for laptops with soldered memory and no upgrade path, allowing an RTX card's VRAM to act as a buffer before the system resorts to slower SSD swap.
  • Technical Workaround: Bypasses the 'EINVAL' errors and RM-level gating that NVIDIA imposes on consumer GeForce GPUs regarding direct P2P memory access.
  • Tiered Memory Hierarchy: Establishes a clear overflow order: Physical RAM fills first, followed by VRAM (via PCIe), then zram (CPU compression), and finally SSD storage.
  • Stability and Maintenance: Requires no custom kernel modules or NVIDIA kernel symbols, ensuring the tool survives kernel and driver updates without the need for rebuilding.
  • Reported Result: In a test on an RTX 3070 Laptop configuration with 16GB of physical RAM, addressable memory nearly tripled, reaching approximately 46GB of combined RAM and swap capacity.

In-Depth Analysis

The NBD Architecture: Sidestepping Driver Restrictions

The core innovation of nbd-vram lies in its use of the Network Block Device (NBD) protocol to circumvent the limitations of consumer-grade NVIDIA hardware. Traditionally, developers have attempted to use the nvidia_p2p_get_pages_persistent API to pin VRAM pages so the CPU could access them directly. However, NVIDIA restricts this functionality to professional Quadro and datacenter SKUs. On GeForce cards, this approach consistently returns an EINVAL error, regardless of the driver version used.

To solve this, nbd-vram employs a small daemon that allocates VRAM through the standard CUDA driver API. This daemon then serves the allocated memory as a block device over a Unix socket using the NBD protocol. The Linux kernel's built-in NBD driver connects to this socket and exposes the memory as a standard device at /dev/nbdX. This allows the system to treat the GPU's VRAM as a normal swap device without requiring the specialized P2P features reserved for enterprise hardware.

Data Path and Memory Tiering

The efficiency of nbd-vram is rooted in its specific data path, which minimizes overhead while maximizing the speed of the PCIe bus. When the system needs to swap data to the GPU, the path follows a structured sequence: the kernel swap subsystem sends data to /dev/nbdX, which passes through the NBD kernel driver and a Unix socket to the nbd-vram daemon. Finally, the daemon utilizes cuMemcpyHtoD (Host to Device) and cuMemcpyDtoH (Device to Host) calls to move data in and out of the GPU VRAM.
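
The daemon's side of this path can be sketched in a few lines. The Python below is not nbd-vram's code; it only illustrates the NBD transmission phase (the request and simple-reply framing from the NBD protocol) against an in-memory buffer. The names FakeVram, recv_exact, and serve are hypothetical, the handshake that precedes this phase is omitted, and a real daemon would replace the bytearray slices with cuMemcpyHtoD and cuMemcpyDtoH calls.

```python
import socket
import struct

# NBD transmission-phase constants, as defined by the NBD protocol.
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2
EINVAL = 22

REQUEST = struct.Struct(">IHHQQI")  # magic, flags, type, handle, offset, length
REPLY = struct.Struct(">IIQ")       # magic, error, handle


class FakeVram:
    """Hypothetical stand-in for a VRAM allocation (assumption, not the tool's API)."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def write(self, offset, data):
        # Host -> device; a real daemon would call cuMemcpyHtoD here.
        self.buf[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Device -> host; a real daemon would call cuMemcpyDtoH here.
        return bytes(self.buf[offset:offset + length])


def recv_exact(sock, n):
    """Read exactly n bytes, raising if the peer closes early."""
    data = bytearray()
    while len(data) < n:
        part = sock.recv(n - len(data))
        if not part:
            raise ConnectionError("peer closed the socket")
        data += part
    return bytes(data)


def serve(sock, store):
    """Answer NBD read/write requests until the client sends NBD_CMD_DISC."""
    while True:
        header = recv_exact(sock, REQUEST.size)
        magic, _flags, cmd, handle, offset, length = REQUEST.unpack(header)
        if magic != NBD_REQUEST_MAGIC:
            raise ValueError("bad request magic")
        if cmd == NBD_CMD_DISC:
            return
        # A write request carries its payload right after the header.
        payload = recv_exact(sock, length) if cmd == NBD_CMD_WRITE else b""
        in_range = offset + length <= len(store.buf)
        if cmd == NBD_CMD_WRITE and in_range:
            store.write(offset, payload)
            sock.sendall(REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, handle))
        elif cmd == NBD_CMD_READ and in_range:
            reply = REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, handle)
            sock.sendall(reply + store.read(offset, length))
        else:
            # Unsupported command or out-of-range access.
            sock.sendall(REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, EINVAL, handle))
```

Keeping the storage behind two small methods means the framing logic does not care whether the bytes live in host memory or on the GPU, which is roughly why a user-space daemon can stand in for the gated P2P path.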

This setup enables a sophisticated overflow order that prioritizes speed. Once the physical RAM is full, the VRAM absorbs the initial spillover. Because this transfer happens over the PCIe bus, it is significantly faster than traditional disk-based swap. Only after the VRAM is saturated does the system move to zram for CPU-based compression, and finally to the SSD as a last resort. In a documented test case using an RTX 3070 Laptop with 8GB of VRAM and 16GB of physical RAM, the user allocated 7GB for VRAM swap, resulting in a total addressable memory pool of 46GB when combined with other swap methods.
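
On Linux, an overflow order like this is normally expressed through swap priorities, where higher-priority swap areas are filled first. Whether nbd-vram sets priorities in exactly this way is an assumption, and the priority values below are hypothetical; only their relative order matters. The sketch also works through the arithmetic of the reported test figures.

```python
from dataclasses import dataclass


@dataclass
class SwapDevice:
    name: str
    priority: int  # Linux fills higher-priority swap areas first


# Hypothetical priority values chosen only to reproduce the described order.
devices = [
    SwapDevice("ssd swap", 10),
    SwapDevice("/dev/nbdX (VRAM)", 100),
    SwapDevice("zram", 50),
]

# Sort from highest to lowest priority to get the order swap is consumed in.
fill_order = [d.name for d in sorted(devices, key=lambda d: d.priority, reverse=True)]
print(fill_order)  # ['/dev/nbdX (VRAM)', 'zram', 'ssd swap']

# Figures from the reported RTX 3070 Laptop test, in GB.
ram_gb, vram_swap_gb, total_gb = 16, 7, 46

# Whatever is not RAM or VRAM swap must come from zram and SSD combined.
other_swap_gb = total_gb - ram_gb - vram_swap_gb
print(other_swap_gb)      # 23
print(total_gb / ram_gb)  # 2.875
```

In other words, VRAM accounts for 7GB of the 30GB of swap in that configuration, with the remaining 23GB coming from zram and SSD swap together; the source does not break that remainder down further.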

Overcoming BAR1 Mapping Issues

Another technical hurdle addressed by this project is the limitation of BAR1 (Base Address Register) mapping on consumer GPUs. Previous attempts to directly map the BAR1 physical address without the P2P API failed because the internal page tables of consumer GPUs typically have only about 16 MiB of BAR1 mapped, just enough for the display framebuffer. Reads from the rest of the address space return only zeros. Tools like mkswap might appear to succeed in such an environment, but the subsequent swapon command fails because the swap header is never actually written to the hardware. By using the CUDA API to handle memory copies, nbd-vram sidesteps these mapping restrictions and provides a reliable way to use the full extent of the allocated VRAM.
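
The mkswap and swapon mismatch is easier to see with the swap header in hand. mkswap writes the signature SWAPSPACE2 into the last 10 bytes of the device's first page, and swapon refuses a device that lacks it. The sketch below is a minimal illustration with hypothetical helper names and an assumed 4096-byte page size; it checks for that signature in a page of bytes and shows why a region that reads back as zeros cannot pass.

```python
PAGE_SIZE = 4096            # assumed; os.sysconf("SC_PAGE_SIZE") reports the real value
SWAP_MAGIC = b"SWAPSPACE2"  # signature stored in the last 10 bytes of the first page


def has_swap_signature(first_page: bytes) -> bool:
    """Return True if the first page ends with the swap signature."""
    if len(first_page) < PAGE_SIZE:
        return False
    start = PAGE_SIZE - len(SWAP_MAGIC)
    return first_page[start:PAGE_SIZE] == SWAP_MAGIC


def read_first_page(path: str) -> bytes:
    """Read the first page of a block device or file (hypothetical helper)."""
    with open(path, "rb") as dev:
        return dev.read(PAGE_SIZE)


# A region whose writes are silently dropped reads back as all zeros.
unmapped = bytes(PAGE_SIZE)
# A working device returns what mkswap wrote, including the signature.
working = bytes(PAGE_SIZE - len(SWAP_MAGIC)) + SWAP_MAGIC

print(has_swap_signature(unmapped))  # False
print(has_swap_signature(working))   # True
```

Running the same check on read_first_page("/dev/nbdX") after mkswap is one way to confirm that writes actually reach the backing memory before attempting swapon.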

Industry Impact

The release of nbd-vram reflects a broader trend in the open-source community of reclaiming hardware functionality that manufacturers lock in software. For Linux users on modern laptops, where modularity has often been traded for thinness or cost, it offers a practical way to stretch fixed memory. By turning otherwise idle GPU VRAM into a general system resource, the project shows how standard interfaces such as CUDA and NBD can be combined to work around such limits. Avoiding kernel modules also spares users the burden of maintaining and rebuilding custom code across frequent Linux kernel updates.

Frequently Asked Questions

Question: Why can't GeForce users use the standard NVIDIA P2P API for this purpose?

NVIDIA has gated the nvidia_p2p_get_pages_persistent API at the Resource Manager (RM) level, making it exclusive to Quadro and datacenter GPUs. On consumer GeForce cards, the driver is programmed to return an EINVAL error, preventing the CPU from directly pinning and accessing VRAM pages in this manner.

Question: How does the performance of VRAM swap compare to traditional SSD swap?

VRAM swap is generally faster because it utilizes the high-bandwidth PCIe interface for data transfers. In the nbd-vram hierarchy, VRAM is positioned as the first overflow point after physical RAM is exhausted, specifically to take advantage of this speed before the system resorts to CPU-intensive compression (zram) or slower disk writes (SSD).

Question: Will I need to rebuild the tool every time I update my Linux kernel or NVIDIA driver?

No. One of the primary advantages of the NBD-based approach is that it does not use a custom kernel module or rely on internal NVIDIA kernel symbols. Because it operates as a daemon using the standard CUDA API and the kernel's built-in NBD driver, it is designed to survive system updates without requiring a rebuild.
