Back to List
System Prompt Leaks: Comprehensive Repository Reveals Internal Instructions for GPT-5.4, Claude 4.6, and Gemini 3.1
Industry NewsAI SecurityLarge Language ModelsGitHub Trending

System Prompt Leaks: Comprehensive Repository Reveals Internal Instructions for GPT-5.4, Claude 4.6, and Gemini 3.1

A significant repository hosted on GitHub by user asgeirtj has surfaced, documenting the leaked system prompts for the industry's most advanced AI models. The collection includes internal instructions for OpenAI's GPT-5.4 and GPT-5.3, Anthropic's Claude Opus 4.6 and Sonnet 4.6, and Google's Gemini 3.1 Pro and 3 Flash. Additionally, the leak covers system prompts for Grok 4.2 and Perplexity. These system prompts serve as the foundational behavioral guidelines for Large Language Models (LLMs), dictating how they interact with users and maintain safety protocols. The repository is reportedly updated on a regular basis, providing a rare look into the backend configurations of next-generation AI systems.

GitHub Trending

Key Takeaways

  • Extensive Model Coverage: The leak includes system prompts for high-profile models including GPT-5.4, Claude 4.6, Gemini 3.1, and Grok 4.2.
  • Centralized Repository: The data is hosted and regularly updated on GitHub under the project 'system_prompts_leaks'.
  • Diverse AI Ecosystem: The collection spans multiple developers, including OpenAI, Anthropic, Google, xAI, and Perplexity.
  • Technical Insight: These prompts reveal the underlying instructions and constraints placed on AI agents and coding tools like Claude Code and Gemini CLI.

In-Depth Analysis

Unveiling the Architecture of AI Behavior

The 'system_prompts_leaks' repository provides a detailed look at the internal directives that govern the behavior of leading AI models. By extracting prompts from versions such as GPT-5.4 and Claude Opus 4.6, the repository highlights the specific personas and operational boundaries set by AI developers. These system prompts are critical because they define the model's identity, its tone of voice, and the safety guardrails it must follow before a user even enters a query.

Comparative Directives Across Platforms

The inclusion of prompts from Gemini 3.1 Pro, Grok 4.2, and Perplexity allows for a comparative study of how different organizations approach AI alignment. For instance, the repository contains specific prompts for specialized tools like 'Claude Code' and 'Gemini CLI,' suggesting that system instructions are becoming increasingly modular and task-specific. The ongoing updates to this repository indicate a persistent effort to track how these instructions evolve as models are patched or upgraded.

Industry Impact

The disclosure of system prompts for flagship models like GPT-5.4 and Claude 4.6 has significant implications for the AI industry. For researchers, it provides transparency into the 'black box' of AI alignment and safety engineering. However, for developers, such leaks represent a potential security challenge, as understanding the system prompt is often the first step in developing 'jailbreak' techniques to bypass model restrictions. This repository underscores the ongoing tension between open-source transparency and the proprietary safety measures of major AI labs.

Frequently Asked Questions

Question: Which specific AI models are included in the leak?

The repository contains system prompts for OpenAI (GPT-5.4, GPT-5.3, Codex), Anthropic (Claude Opus 4.6, Sonnet 4.6, Claude Code), Google (Gemini 3.1 Pro, 3 Flash, CLI), xAI (Grok 4.2, 4), and Perplexity.

Question: What is the purpose of a system prompt?

A system prompt is a set of foundational instructions that tells an AI model how to behave, what rules to follow, and what its specific role or persona should be during a conversation.

Question: Where can this information be found?

The information is maintained in a GitHub repository titled 'system_prompts_leaks' by the author asgeirtj.

Related News

Meituan LongCat Releases General 365: A Challenging New Benchmark for AI Reasoning Evaluation
Industry News

Meituan LongCat Releases General 365: A Challenging New Benchmark for AI Reasoning Evaluation

Meituan's LongCat team has officially open-sourced General 365, a new evaluation benchmark designed to measure the reasoning capabilities of large language models (LLMs). In a comprehensive test involving 26 mainstream models, the results revealed a significant gap in current AI reasoning performance. Even the top-performing model, Gemini 3 Pro, achieved an accuracy of only 62.8%, while the vast majority of tested models failed to reach the 60% passing mark. This release aims to establish a more rigorous standard for the industry, highlighting the current limitations of even the most advanced AI systems in complex reasoning tasks. By providing a transparent and difficult metric, Meituan seeks to drive the development of more logically capable artificial intelligence.

Managing AI Coding with Agent Evaluation Thinking: Meituan's Practice in Refactoring 310,000 Lines of Code
Industry News

Managing AI Coding with Agent Evaluation Thinking: Meituan's Practice in Refactoring 310,000 Lines of Code

As AI-generated code now accounts for over 90% of development in certain environments, the primary challenge has shifted from generation speed to the effective management and constraint of AI capabilities. Meituan's technical team recently shared their experience refactoring 310,000 lines of code using a strategy centered on "Agent evaluation thinking." By implementing technical debt assessment, standardized rules, a specialized Refactoring SOP, and a Pre-PR (Pull Request) mechanism, they have successfully transformed large-scale refactoring from a high-cost, periodic project into a continuous, daily operational task. This approach ensures that AI-driven development does not amplify systemic chaos but instead adheres to unified technical standards, maintaining long-term code quality and system stability in an AI-dominated coding era.

Meituan Technical Team Releases LARYBench: A New Benchmark for Universal Latent Action Representation in Embodied AI
Industry News

Meituan Technical Team Releases LARYBench: A New Benchmark for Universal Latent Action Representation in Embodied AI

The Meituan Technical Team has officially introduced LARYBench (Latent Action Representation Yielding Benchmark), a systematic evaluation framework designed to guide the learning of universal latent action representations from large-scale visual data. This benchmark marks a significant milestone in embodied AI by providing a standardized way to measure how models learn actions from visual inputs. Experimental results from the benchmark reveal that general vision models significantly outperform specialized embodied action expert models in both action generalization and control precision. Furthermore, the research demonstrates that embodied action representations can naturally emerge from large-scale human video data, suggesting that broad visual training is a viable path toward achieving more sophisticated and adaptable robotic control systems.