Anthropic Addresses Claude Code Quality Degradation Reports and Implements Fixes for Sonnet and Opus Models
Industry News | Anthropic | Claude | AI Engineering

Anthropic has released a postmortem addressing recent user reports of degraded Claude performance in specific tools: Claude Code, the Claude Agent SDK, and Claude Cowork. The investigation identified three distinct issues between March and April 2026: an intentional but poorly received reduction in default reasoning effort to manage latency, a session-clearing bug that made the models repetitive and forgetful, and a system prompt change aimed at reducing verbosity that inadvertently harmed coding quality. The API was unaffected, but these issues impacted Sonnet 4.6, Opus 4.6, and Opus 4.7. As of April 20 (v2.1.116), Anthropic has reverted the problematic changes and fixed the bug, emphasizing its commitment to prioritizing model intelligence over speed.

Hacker News

Key Takeaways

  • Three Distinct Issues Identified: The perceived degradation was caused by a change in reasoning effort, a session-clearing bug, and a system prompt instruction to reduce verbosity.
  • Specific Tools Affected: Issues were limited to Claude Code, the Claude Agent SDK, and Claude Cowork; the core API and inference layer were not impacted.
  • Models Impacted: The performance dips affected Sonnet 4.6, Opus 4.6, and Opus 4.7 across different timeframes.
  • Full Resolution: All identified issues were resolved as of April 20 with the release of version 2.1.116.

In-Depth Analysis

Reasoning Effort and Latency Trade-offs

On March 4, to address UI latency that made the interface appear frozen, Anthropic changed the default reasoning effort from "high" to "medium." The change was meant to improve the user experience by shortening wait times, but it caused a noticeable drop in intelligence for Sonnet 4.6 and Opus 4.6. After user feedback showed a preference for higher intelligence over speed, Anthropic reverted the change on April 7, acknowledging that prioritizing lower latency at the expense of reasoning quality was the "wrong tradeoff."

Technical Bugs and Prompting Side Effects

Two further technical factors contributed to the degradation. On March 26, Anthropic shipped a feature meant to clear old thinking from idle sessions so they would resume faster, but a bug caused it to clear thinking on every turn, making the models appear forgetful and repetitive. Separately, an April 16 system prompt update intended to reduce verbosity hurt coding quality when combined with other prompt adjustments; this issue affected the latest models, including Opus 4.7. By April 20, the bug had been fixed and the prompt changes reverted.
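To make the difference between "clear old thinking from idle sessions" and "clear thinking every turn" concrete, here is a minimal, hypothetical Python sketch. It is not Anthropic's code, and the postmortem as summarized here does not describe the implementation: the one-hour idle threshold, the `Session` and `Turn` types, and the specific bug (measuring session age instead of idle time) are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical cutoff: a session counts as "idle" if no turn arrived for an hour.
IDLE_THRESHOLD_S = 3600.0


@dataclass
class Turn:
    text: str
    thinking: Optional[str]  # reasoning carried forward into later turns


@dataclass
class Session:
    created_at: float
    last_active: float
    turns: List[Turn] = field(default_factory=list)


def clear_old_thinking(session: Session) -> None:
    """Drop reasoning from all earlier turns (the visible text is kept)."""
    for turn in session.turns:
        turn.thinking = None


def on_turn_intended(session: Session, now: float, text: str, thinking: str) -> None:
    # Intended rule: prune only when resuming after an idle gap.
    if now - session.last_active > IDLE_THRESHOLD_S:
        clear_old_thinking(session)
    session.turns.append(Turn(text, thinking))
    session.last_active = now


def on_turn_buggy(session: Session, now: float, text: str, thinking: str) -> None:
    # Hypothetical bug: compares against creation time instead of last activity,
    # so once the session is older than the threshold, every turn prunes.
    if now - session.created_at > IDLE_THRESHOLD_S:
        clear_old_thinking(session)
    session.turns.append(Turn(text, thinking))
    session.last_active = now


def simulate(handler, times: List[float]) -> int:
    """Run one session through `handler`; return how many turns still carry thinking."""
    session = Session(created_at=times[0], last_active=times[0])
    for i, t in enumerate(times):
        handler(session, t, f"turn {i}", f"reasoning {i}")
    return sum(turn.thinking is not None for turn in session.turns)


# Three quick turns, a long idle gap, then three more quick turns.
times = [0.0, 60.0, 120.0, 5000.0, 5060.0, 5120.0]
print(simulate(on_turn_intended, times))  # 3: only the pre-idle reasoning was dropped
print(simulate(on_turn_buggy, times))     # 1: each new turn wipes everything before it
```

With the intended rule, the session keeps its reasoning across the three post-idle turns; with the buggy rule, only the current turn's reasoning ever survives, which is one way the "forgetful and repetitive" behavior described above could arise.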

Investigation Challenges and Aggregate Effects

Anthropic noted that because the three changes rolled out on different schedules and affected different segments of traffic, the combined feedback looked like broad, inconsistent degradation. The investigation began in early March but was complicated by the difficulty of separating these specific failures from the normal variation in user feedback. The company reaffirmed that it never intentionally degrades its models and said it is implementing changes to prevent similar regressions in the future.

Industry Impact

This incident highlights the balance AI providers must strike between model "intelligence" (reasoning effort) and operational performance (latency). For the AI industry, it is a case study in how seemingly minor optimizations, such as reducing verbosity or clearing old session context, can have significant, unintended effects on complex tasks like coding. Anthropic's transparent postmortem also underscores the value of user feedback loops for catching non-obvious regressions that automated testing might miss, particularly when those regressions live in product-level tooling such as Claude Code rather than in the underlying API.

Frequently Asked Questions

Question: Was the Claude API affected by these quality issues?

No. Anthropic confirmed that the API and inference layer remained unaffected throughout this period; the issues were isolated to Claude Code, the Claude Agent SDK, and Claude Cowork.

Question: Which Claude models were impacted by the performance degradation?

The issues affected Sonnet 4.6, Opus 4.6, and Opus 4.7, depending on the specific technical change and the timeframe.

Question: How has Anthropic resolved these issues?

As of April 20 (v2.1.116), Anthropic has restored the default reasoning effort to "high," fixed the session-clearing bug, and removed the system prompt instructions that were harming coding quality.

Related News

Meituan LongCat Team Unveils WBench: The First Systematic Multi-Round Benchmark for Interactive Video World Models
Industry News

The Meituan LongCat team has announced the release and open-sourcing of WBench, a pioneering systematic multi-round evaluation benchmark specifically designed for interactive video world models. Positioned as a diagnostic "CT scanner" for AI, WBench aims to provide precise insights into the technical bottlenecks that occur during the transition from passive video generation to active user interaction. By evaluating models across diverse scenarios—ranging from lunar walks to futuristic cyber cities—WBench addresses the critical need for standardized metrics in the evolving field of world models. This benchmark represents a significant step in identifying where current AI systems struggle to maintain consistency and logic during complex, multi-stage interactive sequences, offering a roadmap for future development in the industry.

Meituan at ACL 2026: Advancing Generative AI Through Evaluation, Reasoning, and Optimization
Industry News

The Meituan Technical Team has announced that six of its research papers have been accepted for ACL 2026, a premier international conference in computational linguistics and natural language processing (NLP). The papers cover a diverse range of topics, including large language model (LLM) evaluation, complex process reasoning, and the optimization of competition-level mathematical reasoning. The research also explores advances in reinforcement learning and the emerging field of generative recommendation systems. By focusing on these areas, Meituan aims to establish a new paradigm for generative AI that bridges theoretical research and practical industry applications. The acceptances underscore Meituan's growing influence in the global AI research community and its commitment to solving complex technical challenges in NLP.

Meituan LongCat Open Sources General 365: A New Benchmark Revealing AI Reasoning Challenges
Industry News

Meituan's LongCat team has officially released General 365, an open-source benchmark designed to evaluate the reasoning capabilities of modern AI models. In an assessment of 26 mainstream models, the team found a wide performance gap. Gemini 3 Pro was the top performer at 62.8% accuracy, one of only a few models to exceed 60%; most of the models tested fell below that basic competency threshold, highlighting the ongoing challenges of building advanced reasoning into AI systems. The benchmark gives the AI community a new tool for measuring and improving logical reasoning and sets a high bar for future model development.