New nbd-vram Tool Enables Linux Users to Utilize NVIDIA GPU VRAM as High-Speed Swap Space
Open Source, Linux, NVIDIA, GPU

A new open-source project, nbd-vram, lets Linux users put NVIDIA GPU VRAM to work as system swap space, which is particularly useful on laptops with soldered, non-upgradeable memory. It combines the Network Block Device (NBD) protocol with the CUDA driver API to sidestep the driver-level restrictions that block direct Peer-to-Peer (P2P) memory access on consumer-grade GeForce GPUs. In testing on an RTX 3070 laptop, 7GB of VRAM was allocated as swap, contributing to a total addressable memory of approximately 46GB. The approach is presented as a faster alternative to traditional SSD swap because transfers go over the PCIe interface, and because it needs no custom kernel module it survives kernel and driver updates without a rebuild.

Hacker News

Introduction

Many modern laptops ship with soldered RAM, leaving users no upgrade path when memory demands exceed physical capacity. nbd-vram, a new tool for Linux systems, addresses this by letting users repurpose the Video RAM (VRAM) on NVIDIA GPUs as system swap space. The result is a tiered memory arrangement in which otherwise idle VRAM absorbs memory pressure once physical RAM is exhausted.

Key Takeaways

  • Hardware Optimization: Specifically designed for laptops with soldered memory and no upgrade path, allowing an RTX card's VRAM to act as a buffer before the system resorts to slower SSD swap.
  • Technical Workaround: Sidesteps the EINVAL errors and the Resource Manager (RM) level gating that NVIDIA imposes on direct P2P memory access for consumer GeForce GPUs.
  • Tiered Memory Hierarchy: Establishes a clear overflow order: Physical RAM fills first, followed by VRAM (via PCIe), then zram (CPU compression), and finally SSD storage.
  • Stability and Maintenance: Requires no custom kernel modules or NVIDIA kernel symbols, ensuring the tool survives kernel and driver updates without the need for rebuilding.
  • Proven Performance: Testing on an RTX 3070 Laptop configuration nearly tripled addressable memory, from 16GB of physical RAM to approximately 46GB of combined RAM and swap.

In-Depth Analysis

The NBD Architecture: Sidestepping Driver Restrictions

The core innovation of nbd-vram lies in its use of the Network Block Device (NBD) protocol to circumvent the limitations of consumer-grade NVIDIA hardware. Traditionally, developers have attempted to use the nvidia_p2p_get_pages_persistent API to pin VRAM pages so the CPU could access them directly. However, NVIDIA restricts this functionality to professional Quadro and datacenter SKUs. On GeForce cards, this approach consistently returns an EINVAL error, regardless of the driver version used.

To solve this, nbd-vram employs a small daemon that allocates VRAM through the standard CUDA driver API. This daemon then serves the allocated memory as a block device over a Unix socket using the NBD protocol. The Linux kernel's built-in NBD driver connects to this socket and exposes the memory as a standard device at /dev/nbdX. This allows the system to treat the GPU's VRAM as a normal swap device without requiring the specialized P2P features reserved for enterprise hardware.
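The daemon's role can be illustrated with a minimal sketch of the NBD transmission phase. This is not the project's code: it skips the NBD handshake entirely, uses a plain bytearray as a stand-in for the CUDA allocation, and the helper names (recv_exact, serve, demo) are hypothetical. Only the wire constants and the request and reply layouts come from the NBD protocol itself.

```python
import socket
import struct
import threading

# Wire constants from the NBD protocol's transmission phase.
NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ, NBD_CMD_WRITE, NBD_CMD_DISC = 0, 1, 2
NBD_EINVAL = 22

REQUEST = struct.Struct(">IHHQQI")  # magic, flags, type, cookie, offset, length
REPLY = struct.Struct(">IIQ")       # magic, error, cookie


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return bytes(buf)


def serve(sock, store):
    """Answer read/write requests against `store` until the client disconnects."""
    while True:
        magic, _flags, cmd, cookie, offset, length = REQUEST.unpack(
            recv_exact(sock, REQUEST.size))
        if magic != NBD_REQUEST_MAGIC or cmd == NBD_CMD_DISC:
            return
        in_range = offset + length <= len(store)
        # A write request carries its payload right after the header.
        data = recv_exact(sock, length) if cmd == NBD_CMD_WRITE else b""
        if cmd == NBD_CMD_WRITE and in_range:
            store[offset:offset + length] = data  # a real daemon would copy to VRAM here
            sock.sendall(REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie))
        elif cmd == NBD_CMD_READ and in_range:
            payload = bytes(store[offset:offset + length])
            sock.sendall(REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, 0, cookie) + payload)
        else:
            # Simplification: one error code for everything this sketch rejects.
            sock.sendall(REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, NBD_EINVAL, cookie))


def demo():
    """Write one 4 KiB page and read it back through the sketch server."""
    client, server = socket.socketpair()
    store = bytearray(1 << 20)  # 1 MiB of host memory standing in for VRAM
    worker = threading.Thread(target=serve, args=(server, store))
    worker.start()
    page = b"\xa5" * 4096
    client.sendall(REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_WRITE, 1, 8192, len(page)) + page)
    write_reply = REPLY.unpack(recv_exact(client, REPLY.size))
    client.sendall(REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_READ, 2, 8192, len(page)))
    read_reply = REPLY.unpack(recv_exact(client, REPLY.size))
    data = recv_exact(client, len(page))
    client.sendall(REQUEST.pack(NBD_REQUEST_MAGIC, 0, NBD_CMD_DISC, 3, 0, 0))
    worker.join()
    client.close()
    server.close()
    return write_reply, read_reply, data


if __name__ == "__main__":
    w, r, d = demo()
    print(w[1] == 0, r[1] == 0, d == b"\xa5" * 4096)
```

In the real setup the client side of this exchange is the kernel's NBD driver rather than a Python socket, which is why no custom kernel module is needed.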

Data Path and Memory Tiering

The data path is short and uses only standard interfaces. On swap-out, the kernel swap subsystem writes to /dev/nbdX; the NBD kernel driver forwards the request over a Unix socket to the nbd-vram daemon; and the daemon copies the data into GPU VRAM with cuMemcpyHtoD (host to device). Swap-in follows the same path in reverse, with the daemon reading the data back using cuMemcpyDtoH (device to host).
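The final hop of that path, the copy into and out of VRAM, can be sketched by calling the CUDA driver API through ctypes. This is an illustration under assumptions rather than the daemon's implementation: the helper names (load_driver, check, vram_roundtrip) are hypothetical, the code uses device 0, and it relies on the `_v2` symbol names under which the driver library exports these entry points. On a machine without a usable CUDA driver the function simply returns None.

```python
import ctypes
import ctypes.util
from ctypes import POINTER, byref, c_int, c_size_t, c_uint, c_uint64, c_void_p


def load_driver():
    """Return a handle to the CUDA driver library, or None if it is unavailable."""
    name = ctypes.util.find_library("cuda")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
        # Declare signatures so 64-bit device pointers are not truncated.
        lib.cuInit.argtypes = [c_uint]
        lib.cuDeviceGet.argtypes = [POINTER(c_int), c_int]
        lib.cuCtxCreate_v2.argtypes = [POINTER(c_void_p), c_uint, c_int]
        lib.cuCtxDestroy_v2.argtypes = [c_void_p]
        lib.cuMemAlloc_v2.argtypes = [POINTER(c_uint64), c_size_t]
        lib.cuMemFree_v2.argtypes = [c_uint64]
        lib.cuMemcpyHtoD_v2.argtypes = [c_uint64, c_void_p, c_size_t]
        lib.cuMemcpyDtoH_v2.argtypes = [c_void_p, c_uint64, c_size_t]
    except (OSError, AttributeError):
        return None
    return lib


def check(status, call):
    """Raise if a driver call did not return CUDA_SUCCESS (0)."""
    if status != 0:
        raise RuntimeError(f"{call} returned CUresult {status}")


def vram_roundtrip(payload):
    """Copy payload into VRAM and back; return None when no GPU is usable."""
    lib = load_driver()
    if lib is None or lib.cuInit(0) != 0:
        return None
    device, context, dptr = c_int(), c_void_p(), c_uint64()
    check(lib.cuDeviceGet(byref(device), 0), "cuDeviceGet")
    check(lib.cuCtxCreate_v2(byref(context), 0, device), "cuCtxCreate")
    try:
        check(lib.cuMemAlloc_v2(byref(dptr), len(payload)), "cuMemAlloc")
        try:
            # Swap-out direction: host memory to device memory.
            check(lib.cuMemcpyHtoD_v2(dptr, payload, len(payload)), "cuMemcpyHtoD")
            # Swap-in direction: device memory back to a host buffer.
            out = ctypes.create_string_buffer(len(payload))
            check(lib.cuMemcpyDtoH_v2(out, dptr, len(payload)), "cuMemcpyDtoH")
            return out.raw
        finally:
            lib.cuMemFree_v2(dptr)
    finally:
        lib.cuCtxDestroy_v2(context)


if __name__ == "__main__":
    page = b"\xab" * 4096
    result = vram_roundtrip(page)
    print("no CUDA device" if result is None else result == page)
```

A long-running daemon would allocate its VRAM once at startup and keep the context alive, rather than creating and destroying both per copy as this round-trip test does.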

This setup enables a sophisticated overflow order that prioritizes speed. Once the physical RAM is full, the VRAM absorbs the initial spillover. Because this transfer happens over the PCIe bus, it is significantly faster than traditional disk-based swap. Only after the VRAM is saturated does the system move to zram for CPU-based compression, and finally to the SSD as a last resort. In a documented test case using an RTX 3070 Laptop with 8GB of VRAM and 16GB of physical RAM, the user allocated 7GB for VRAM swap, resulting in a total addressable memory pool of 46GB when combined with other swap methods.
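On Linux, an overflow order like this is normally expressed through swap priorities: among active swap devices, the kernel fills the one with the highest priority first. The sketch below ranks devices from text in the /proc/swaps format. The device names, sizes, and priority values in the sample are made up for illustration, since the source does not state the values used in the test.

```python
# Illustrative /proc/swaps content; names, sizes (KiB), and priorities are assumptions.
SAMPLE_PROC_SWAPS = """\
Filename                Type        Size      Used  Priority
/dev/nvme0n1p3          partition   1048572   0     10
/dev/nbd0               partition   7340028   0     300
/dev/zram0              partition   1048572   0     200
"""


def overflow_order(proc_swaps_text):
    """Return swap device names in the order the kernel would fill them."""
    rows = [line.split() for line in proc_swaps_text.splitlines()[1:] if line.strip()]
    # Higher priority is used first, so sort in descending order of column 5.
    ranked = sorted(rows, key=lambda fields: int(fields[4]), reverse=True)
    return [fields[0] for fields in ranked]


if __name__ == "__main__":
    print(overflow_order(SAMPLE_PROC_SWAPS))
    # ['/dev/nbd0', '/dev/zram0', '/dev/nvme0n1p3']
```

With priorities arranged this way, the NBD device backed by VRAM is used before zram, and the SSD partition is touched only after both are full.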

Overcoming BAR1 Mapping Issues

Another technical hurdle the project addresses is the limited BAR1 (Base Address Register 1) mapping on consumer GPUs. Earlier attempts to map the BAR1 physical address directly, without the P2P API, failed because the GPU's internal page tables typically map only about 16 MiB of BAR1, just enough for the display framebuffer. Reads from the rest of the address space return only zeros. mkswap may appear to succeed in such an environment, but the subsequent swapon command fails because the swap header never actually reaches the hardware. Because nbd-vram moves data with CUDA memory copies instead, it sidesteps these mapping restrictions and can use the full extent of the allocated VRAM.
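The mkswap and swapon mismatch described above can be reproduced in miniature. mkswap writes the signature SWAPSPACE2 into the last 10 bytes of the device's first page, and a device whose writes never land reads back as zeros. In this sketch the 4096-byte page size is an assumption, and DroppedWriteDevice and fake_mkswap are hypothetical stand-ins for the unmapped BAR1 region and the real tool.

```python
import io

PAGE_SIZE = 4096           # assumed page size for this illustration
SIGNATURE = b"SWAPSPACE2"  # Linux swap signature at the end of the first page


class DroppedWriteDevice(io.BytesIO):
    """Stand-in for unmapped BAR1 space: writes report success but store nothing."""

    def write(self, data):
        return len(data)


def fake_mkswap(dev, page_size=PAGE_SIZE):
    """Write only the signature, as a stand-in for the header mkswap creates."""
    dev.seek(page_size - len(SIGNATURE))
    return dev.write(SIGNATURE)


def has_swap_signature(dev, page_size=PAGE_SIZE):
    """Check the signature location that must be valid before swap can be enabled."""
    dev.seek(page_size - len(SIGNATURE))
    return dev.read(len(SIGNATURE)) == SIGNATURE


if __name__ == "__main__":
    working = io.BytesIO(bytes(PAGE_SIZE))
    broken = DroppedWriteDevice(bytes(PAGE_SIZE))
    # Both writes appear to succeed, but only one device holds the header.
    print(fake_mkswap(working), has_swap_signature(working))  # 10 True
    print(fake_mkswap(broken), has_swap_signature(broken))    # 10 False
```

The write step reports the same result for both devices, which is why the failure only shows up later when the header is read back.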

Industry Impact

The release of nbd-vram highlights a growing trend in the open-source community to reclaim hardware functionality that is software-locked by manufacturers. For the Linux ecosystem, this provides a vital lifeline for users of modern laptops where hardware modularity has been sacrificed for thinness or cost. By turning a "static" resource like GPU VRAM into a dynamic system resource, this project demonstrates how standard APIs (like CUDA and NBD) can be creatively combined to solve hardware limitations. Furthermore, the decision to avoid kernel modules ensures that this solution remains accessible to a wide range of users without the technical debt of maintaining custom code across frequent Linux kernel updates.

Frequently Asked Questions

Question: Why can't GeForce users use the standard NVIDIA P2P API for this purpose?

NVIDIA has gated the nvidia_p2p_get_pages_persistent API at the Resource Manager (RM) level, making it exclusive to Quadro and datacenter GPUs. On consumer GeForce cards, the driver is programmed to return an EINVAL error, preventing the CPU from directly pinning and accessing VRAM pages in this manner.

Question: How does the performance of VRAM swap compare to traditional SSD swap?

VRAM swap is generally faster because it utilizes the high-bandwidth PCIe interface for data transfers. In the nbd-vram hierarchy, VRAM is positioned as the first overflow point after physical RAM is exhausted, specifically to take advantage of this speed before the system resorts to CPU-intensive compression (zram) or slower disk writes (SSD).

Question: Will I need to rebuild the tool every time I update my Linux kernel or NVIDIA driver?

No. One of the primary advantages of the NBD-based approach is that it does not use a custom kernel module or rely on internal NVIDIA kernel symbols. Because it operates as a daemon using the standard CUDA API and the kernel's built-in NBD driver, it is designed to survive system updates without requiring a rebuild.
