
Norway's National Library Leverages 2 Petabytes of Huawei Flash Storage for Sovereign Norwegian LLM Development
Norway’s National Library (Nasjonalbiblioteket) is developing a sovereign Large Language Model (LLM) designed specifically to understand the Norwegian language, culture, and history. To support the AI training data pipeline behind it, the library has deployed 2 petabytes of Huawei OceanStor Dorado flash storage. Speaking at the Huawei ID Forum 2026, Marius Husnes, Head of IT Platform, explained why the project is needed: global, English-centric LLMs lack the local context a country requires to maintain its linguistic sovereignty. The library holds a digital archive of 20 petabytes of unique data, including books, newspapers, and web content. Through exclusive agreements, it is in a unique position to train a model on copyrighted material. The initiative reflects a growing trend of nations seeking to preserve their cultural heritage through localized AI infrastructure.
Key Takeaways
- Sovereign AI Development: Norway is building its own Large Language Model (LLM) to ensure national history, news, and culture are accurately represented, addressing the gaps left by global English-speaking models.
- High-Performance Infrastructure: The project utilizes 2 petabytes of Huawei OceanStor Dorado flash storage to manage the intensive AI training data pipeline.
- Massive Data Repository: The National Library possesses a 20 PB unique digital archive (60 PB total under a 3-2-1 storage strategy), including books, broadcasts, and web content digitized since 2005.
- Exclusive Data Access: Through a legal deposit mandate and specific agreements with newspapers, the library has access to copyrighted content for AI training that private companies do not possess.
In-Depth Analysis
The Case for Linguistic and Cultural Sovereignty
At the Huawei ID Forum 2026 in Paris, Marius Husnes, the Head of IT Platform at Norway’s National Library, articulated a critical challenge facing non-English speaking nations: the lack of localized Large Language Models. Husnes argued that any country possessing its own language is at a distinct disadvantage if it relies solely on globally trained, English-centric LLMs. These commercial models often lack the depth of knowledge regarding a specific country’s history, contemporary news, and cultural nuances that are primarily documented in the local tongue.
To bridge this gap, Norway’s Ministry of Culture tasked the National Library with the creation of a sovereign LLM. The library is the ideal candidate for this task as it houses the single largest digital collection of Norwegian-language materials in existence. By developing a model in-house, Norway aims to ensure that its AI tools are deeply rooted in the nation's specific linguistic and cultural context, rather than being filtered through the lens of a foreign-trained algorithm.
Data Infrastructure and the 60 Petabyte Archive
The scale of the data involved in this project is significant. Since 2005, the National Library has been digitizing its vast collection, amassing 20 petabytes of unique data. To ensure the safety and longevity of this cultural heritage, the library employs a 3-2-1 storage strategy—maintaining three copies of the data across two different media types, with one copy stored off-site. This results in a total storage footprint of approximately 60 petabytes.
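The 3-2-1 arithmetic can be made concrete with a small sketch. It assumes a hypothetical layout of two on-site disk copies and one off-site tape copy, because the article does not say which media or locations the library actually uses. The names `StorageCopy`, `is_3_2_1_compliant`, and `total_footprint_pb` are illustrative only. The code checks the three conditions of the rule and multiplies the unique data by the number of copies, which is how 20 PB becomes roughly 60 PB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCopy:
    """One full copy of the archive (hypothetical model for illustration)."""
    medium: str    # e.g. "disk" or "tape"; assumed, not named in the article
    offsite: bool  # True if this copy is stored at a separate location


def is_3_2_1_compliant(copies: list[StorageCopy]) -> bool:
    """Check the 3-2-1 rule: at least 3 copies, 2 media types, 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


def total_footprint_pb(unique_pb: float, copies: list[StorageCopy]) -> float:
    """Raw footprint when every copy holds the full unique dataset."""
    return unique_pb * len(copies)


# Hypothetical layout: media types and locations are assumptions.
layout = [
    StorageCopy(medium="disk", offsite=False),
    StorageCopy(medium="disk", offsite=False),
    StorageCopy(medium="tape", offsite=True),
]

print(is_3_2_1_compliant(layout))        # True
print(total_footprint_pb(20.0, layout))  # 60.0
```

The footprint figure is the idealized product of unique data and copy count. The article's "approximately 60 petabytes" leaves room for real-world overheads that this sketch ignores.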
For the AI training data pipeline specifically, the library has added 2 petabytes of Huawei OceanStor Dorado flash storage. This high-performance storage provides the fast data access that LLM training and large-scale data preparation depend on. The pipeline handles raw text, sound, moving pictures, and still images, and it uses OCR (Optical Character Recognition) to convert scanned pages into machine-readable text. It also generates extensive metadata and supports APIs for online access. Together, these steps turn a preservation archive into a working training set for artificial intelligence.
Legal Mandates and Competitive Advantages
One of the most significant advantages the National Library holds over private AI developers is its legal standing and existing agreements. As a state library, it operates under a legal deposit mandate, which entitles it to receive copies of every book published and every broadcast aired in Norway. This mandate was specifically extended to cover the preservation of all Norwegian cultural heritage.
Furthermore, the library has secured a unique agreement with Norwegian newspapers that permits the use of copyrighted content for LLM training. As Husnes noted, no private company currently possesses this level of access to high-quality, copyrighted Norwegian text. This legal framework allows the library to train its AI on a more comprehensive and authoritative dataset than any commercial provider could legally acquire, further cementing the model's status as a sovereign national asset.
Industry Impact
The Rise of National AI Initiatives
Norway's move to build a sovereign LLM reflects a broader global trend where nations are beginning to view AI as a critical component of cultural and linguistic preservation. By investing in localized models, countries can protect their digital sovereignty and ensure that their citizens have access to AI tools that understand their specific societal context. This shift may lead to a more fragmented but culturally diverse AI landscape, moving away from the dominance of a few global models.
Storage Requirements for Modern AI Pipelines
Dedicating 2 PB of flash storage to the AI pipeline shows how the industry's infrastructure needs are evolving. As LLMs grow more complex and their training datasets expand, demand for fast, reliable storage such as Huawei OceanStor Dorado is likely to increase. The project also shows that compute power is often not the only bottleneck in large-scale AI training. The storage system must also feed data into the training pipeline fast enough to keep that compute busy.
Frequently Asked Questions
Question: Why is Norway building its own LLM instead of using existing commercial models?
Existing commercial LLMs are primarily trained on English-language data and often lack a deep understanding of Norwegian history, culture, and local news. By building a sovereign LLM, Norway ensures that its AI tools are culturally and linguistically accurate for its citizens.
Question: What kind of data is being used to train the Norwegian LLM?
The training data comes from the National Library’s 20 PB unique digital archive, which includes books, newspapers, web pages, sound recordings, moving pictures, and still images. This includes copyrighted newspaper content made available through special agreements.
Question: What role does Huawei storage play in this project?
The library uses 2 petabytes of Huawei OceanStor Dorado flash storage specifically for the AI training data pipeline. This high-performance storage supports the intensive data processing, including OCR, needed to prepare the library's large archive for LLM training.

