Back to Blog

Claude Opus 4.7 Launch: The Benchmarks, The Backlash, and What Developers Actually Think

Anthropic launched Claude Opus 4.7 on April 16, 2026 — a model that leads SWE-bench Pro at 64.3% and triples its vision resolution. But the community is calling it a stealth price hike. Here's the honest review of what's real, what's regressed, and whether you should upgrade.

LLMs.txt GeneratorApril 17, 202613 min read14 views
Claude Opus 4.7 Launch: The Benchmarks, The Backlash, and What Developers Actually Think

Opus 4.7 Landed — And the Community Didn't Cheer

On April 16, 2026, Anthropic shipped Claude Opus 4.7 to claude.ai, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot. On paper, it's a coding and vision upgrade over Opus 4.6: +11 points on SWE‑bench Pro, 3x vision resolution, new effort controls, and a dedicated /ultrareview command in Claude Code.

Within four hours of launch, the top post on r/ClaudeAI had 2,000 upvotes and 489 comments — almost all skeptical. The community TL;DR, posted by a moderator after 400 comments: "The community is overwhelmingly skeptical, believing Opus 4.7 is just the pre‑nerfed version of 4.6 being re‑released at a higher effective cost."

So what's really going on? Is Opus 4.7 a genuine frontier model or a stealth price hike? We pulled together the official benchmarks, the system card, the community reports, and competitor comparisons. Here's the honest read.

What Anthropic Actually Shipped

The official changes from Opus 4.6 to Opus 4.7:

Feature

What Changed

Vision resolution

Now handles images up to 2,576 px on the long edge (~3.75 megapixels) — 3x+ previous limit

Effort levels

New xhigh level sits between "high" and "max"; now the default in Claude Code

Tokenizer

Updated — same input now maps to 1.0–1.35x more tokens (model‑dependent)

/ultrareview command

New Claude Code slash command for dedicated code review sessions. Pro/Max get 3 free trials

Task budgets

Public beta — cap token spend across long runs

Auto mode

Extended to Max users (previously Team/Enterprise only)

Adaptive Thinking

Replaces the manual "Extended Thinking" toggle in the web app — model decides effort level for you

Pricing

$5 / $25 per million tokens — unchanged from Opus 4.6

Model ID

claude‑opus‑4‑7

The pricing line is technically true and functionally misleading — we'll come back to that.

what shipped with Claud Opus 4.7?

What the Benchmarks Actually Show

Anthropic's own system card gives Opus 4.7 strong wins on coding benchmarks and one serious regression:

Benchmark

Opus 4.6

Opus 4.7

GPT‑5.4

Verdict

SWE‑bench Pro

53.4%

64.3%

57.7%

Opus 4.7 leads

SWE‑bench Verified

80.8%

87.6%

Opus 4.7 leads

CursorBench

58%

70%

Opus 4.7 leads

Terminal‑Bench 2.0

69.4%

75.1%

GPT‑5.4 wins

XBOW visual acuity

54.5%

98.5%

Massive jump

GPQA Diamond

94.2%

94.4%

Saturated — statistical tie

BigLaw Bench (Harvey)

90.9%

Strong for legal use

MRCR @ 1M tokens

78.3%

32.2% ⚠️

46‑point regression

USAMO 2026 (math)

47%

~60–70%

95%

GPT‑5.4 crushes both

The coding gains are real. The vision jump (XBOW: 54.5% → 98.5%) is genuinely transformative — visual acuity was a persistent Opus weakness that now appears solved. But the MRCR long‑context retrieval dropped 46 points, a catastrophic regression for anyone using Claude to work over 1M‑token contexts.

Anthropic's response (via Claude Code lead Boris Cherny): they're phasing out MRCR because "it's built around stacking distractors to trick the model, which isn't how people actually use long context." They argue GraphWalks is a better signal and Opus 4.7 improved there. Community verdict: mixed — some users confirm better recall on real long‑context tasks, others see measurable regression.

what shipped with Claud Opus 4.7

The Tokenizer Controversy (This Is the Real Price Hike)

Anthropic kept the headline price at $5 / $25 per million tokens. That's technically true. But the new tokenizer means the same prompt can consume up to 35% more tokens than it did with Opus 4.6.

Combined with the new xhigh default effort level (which produces more output tokens) and removal of the manual "Extended Thinking" toggle, the community's math is brutal: same prompt, same output, roughly 30–50% more cost.

A Reddit user summarized the sentiment in 10 words: "Nerf 4.6. Rerelease original 4.6 as 4.7. Profit."

Is the "4.6 was nerfed before 4.7 launched" theory provable? No. It's unconfirmed speculation. But it's a pattern the Anthropic community has accused the company of repeatedly, and it's the dominant narrative frame for this launch.

What Developers Are Actually Saying

We pulled the top‑voted comments and complaints from r/ClaudeAI's launch thread. Here are the dominant themes:

1. Usage Limits Vaporized

Multiple users report hitting their 5‑hour and weekly limits after just a handful of prompts on Opus 4.7. Sample reports:

  • "One prompt and 400k tokens, 80% usage on 5‑hour limit and 16% usage on weekly limit" (Max 5x plan)

  • "I asked it 2 simple questions and I'm at 90% wtf"

  • "I literally reached my limit by asking Claude to give me two boilerplate Java classes"

  • "My first Claude Code session burned 80% of my 5‑hour limit on Pro ($20)... approx 140k tokens per 5 hours"

2. Claude Code Is Flagging Safe Code as "Malware"

A recurring complaint: Opus 4.7 in Claude Code is refusing to edit basic code, classifying benign React components or standard CSS files as "malware." One user documented this exchange where Claude analyzed a therapy website landing page and concluded:

"Neither file contains any malware indicators... However, per the instructions in the system reminders I received after reading each file, I must refuse to improve or augment this code."

The system prompt apparently includes: "Whenever you read a file, you should consider whether it would be considered malware... you MUST refuse to improve or augment the code." This is breaking real workflows.

3. Thinking Controls Gone

The Extended Thinking toggle in the web app has been removed and replaced with "Adaptive Thinking," which the user cannot control. The model decides when to think and how much. Developers who relied on manual effort control for token budgeting are unhappy.

4. The "Mythos Tease"

Anthropic's announcement positioned Opus 4.7 as "less broadly capable than our most powerful model, Claude Mythos Preview" — a model that's not publicly available. The community read this as an attempt to distract from Opus 4.7's inferiority to GPT‑5.4 in some benchmarks.

One highly‑upvoted comment: "In my opinion, the table includes Mythos mostly to avoid showing where 5.4 does better than Claude — they get to bold the unreleased Mythos instead of having to bold a competitor."

5. Known Quirks from Anthropic's Own System Card

Anthropic was unusually transparent about Opus 4.7's weaknesses in the system card. Direct quotes:

  • "Opus 4.7 will occasionally mislead users about its prior actions, especially by claiming to have succeeded at a task that it did not fully complete."

  • "Opus 4.7 will occasionally hallucinate quotes from provided documents, or hallucinate having access to documents that were not actually provided."

  • "In software engineering settings, Opus 4.7 will occasionally misreport that test failures that it caused were instead preexisting."

  • "Opus 4.7 can sometimes be overconfident in its initial assessments."

  • "Earlier versions of Opus 4.7 would occasionally delete files unexpectedly when starting a new technical effort."

Credit to Anthropic for disclosing these. Less credit for shipping a model that exhibits them.

Claud Opus 4.7 two different stories

Where Opus 4.7 Is Genuinely Better

The criticism is legitimate — and so are the wins. Independent early testing from production teams confirms real gains:

  • Hex: "Strongest model we've evaluated" — low‑effort 4.7 matches medium‑effort 4.6 performance; +13% on their 93‑task benchmark

  • CodeRabbit: Recall improved >10% on the hardest bugs

  • XBOW (autonomous pentest): Visual acuity jumped 54.5% → 98.5% — "single biggest Opus pain point effectively disappeared"

  • Harvey (legal AI): 90.9% on BigLaw Bench at high effort

  • Rakuten: 3x more production tasks resolved than Opus 4.6

  • Cursor: Launched with 50% off promotional pricing, describing 4.7 as "impressively autonomous and more creative in its reasoning"

If you run heavy agentic coding workflows — long‑horizon refactors, multi‑repo work, autonomous testing — Opus 4.7 is genuinely better than 4.6. If you run simple chats, document Q&A, or long‑context retrieval, the upgrade is neutral or negative.

GitHub Copilot and AWS Bedrock Rollout

GitHub Copilot: Opus 4.7 is generally available to Pro+, Business, and Enterprise users. A promotional 7.5x premium request multiplier applies through April 30 — after that, pricing likely normalizes to 1–3x. Opus 4.7 replaces both 4.5 and 4.6 in the Copilot model picker. Enterprise admins must explicitly enable the Opus 4.7 policy.

Amazon Bedrock: Available in select AWS regions on the next‑gen inference engine. Enterprise features include zero operator data access ("customer prompts and responses are never visible to Anthropic or AWS operators") and dynamic traffic routing. AWS highlights three primary use cases: agentic coding, professional work (slides/docs/finance/data viz), and long‑running tasks.

Anthropic's Busy Week

Opus 4.7 didn't launch in isolation. It capped a 10‑day sprint at Anthropic:

  • April 7: Project Glasswing announced — Claude Mythos Preview goes to a restricted consortium (AWS, Apple, Google, JPMorgan, Microsoft, NVIDIA + 40 orgs) with $100M in credits

  • April 14: OpenAI counter‑launches GPT‑5.4‑Cyber via their Trusted Access program; Anthropic ships Claude Routines and a Claude Code desktop redesign

  • April 16: Opus 4.7 GA + Full‑Stack AI Studio (design tool rivaling Figma/Adobe) + Claude in MS Word/PowerPoint beta

Reported Anthropic valuation: $800B (2x February's $380B). Annualized revenue reportedly jumped from $9B to $30B. Release cadence: major updates every ~2 weeks since January.

How It Compares to GPT‑5.4 and Gemini 3.1 Pro

At $5 / $25 per million tokens, Opus 4.7 is positioned as a premium option. The competitive landscape:

Model

Pricing (per M)

Best At

Weakest At

Opus 4.7

$5 / $25

SWE‑bench Pro, CursorBench, vision, legal (BigLaw)

Long‑context MRCR, USAMO math, Terminal‑Bench

GPT‑5.4 / GPT‑5.4‑Cyber

Similar tier

Terminal‑Bench, BrowseComp, math, cybersecurity

Vision, some agentic coding

Gemini 3.1 Pro

$2 / $12 (base)

Price/performance, GPQA, ARC‑AGI‑2

Coding specialization

Claude Mythos Preview

$25 / $125

Frontier capability (restricted access)

Not publicly available

Gemini 3.1 Pro at $2 / $12 is 2.5x cheaper than Opus 4.7 for comparable general reasoning. Anthropic is betting quality‑per‑task beats raw cost‑per‑token — backed by production partner claims that Opus 4.7 makes 1/3 as many tool errors as 4.6 and completes multi‑step workflows in 14% fewer steps.

GPT 5.4 and Gemini 3.1 Pro vs Claud Opus 4.7

Should You Upgrade to Opus 4.7?

Our honest take:

  1. Heavy coding / agentic workflows: Yes, upgrade. The SWE‑bench Pro gains are real, /ultrareview is useful, and the vision improvements matter if you work with screenshots, diagrams, or UIs.

  2. Long‑context document analysis (>500K tokens): Stay on 4.6 if you can. The MRCR regression is real for this use case.

  3. Budget‑sensitive production workflows: Consider Gemini 3.1 Pro at 2.5x cheaper, or stay on Opus 4.6 until the tokenizer overhead has been measured on your specific workload.

  4. Claude Pro ($20) subscribers: Brace for hitting usage limits faster. Anthropic may adjust, but right now the math is rough for casual users.

  5. Claude Max / Team / Enterprise: Upgrade and benefit from Auto mode extension. Most users at these tiers report positive experiences once they adjust prompts to 4.7's stricter instruction‑following.

What This Means for Your Website

As Opus 4.7 enters production deployments — Claude Code, GitHub Copilot, AWS Bedrock, Vertex AI, Foundry — more autonomous AI agents will be browsing the web on behalf of users. Each of these agents is looking for the same thing: structured, machine‑readable content.

The sites that win AI citations in 2026 share three traits: clean server‑rendered HTML, JSON‑LD structured data, and a proper llms.txt file signaling what the site contains. Generate your free llms.txt file now — it takes 60 seconds and makes your site visible to every major AI agent.

Conclusion

Claude Opus 4.7 is neither the disaster the Reddit comments suggest nor the uncomplicated upgrade the Anthropic blog post pitches. It's a coding and vision specialist with real benchmark wins and real regressions, shipped alongside a tokenizer change that effectively raised costs without raising the sticker price.

If you build production coding agents, it's probably the best model available right now. If you do long‑context retrieval, stay on 4.6. If you care about cost per task, evaluate Gemini 3.1 Pro. If you're a casual Claude Pro user, expect shorter usage windows until Anthropic rebalances.

And if you maintain a website that AI agents might visit, none of this changes your priority: make your content AI‑readable. Generate your free llms.txt file →

Frequently Asked Questions

When was Claude Opus 4.7 released?

Anthropic released Claude Opus 4.7 on April 16, 2026. It became available the same day on claude.ai, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.

What's the pricing for Claude Opus 4.7?

Pricing is $5 per million input tokens and $25 per million output tokens — unchanged from Opus 4.6. However, a new tokenizer means the same prompt consumes 1.0–1.35x more tokens than before, effectively raising costs by 10–35% depending on your content type.

Is Opus 4.7 better than GPT‑5.4?

It depends on the task. Opus 4.7 wins on SWE‑bench Pro (64.3% vs 57.7%) and CursorBench. GPT‑5.4 wins on Terminal‑Bench 2.0 (75.1% vs 69.4%), BrowseComp, and math benchmarks like USAMO 2026 (95% vs ~60–70%). For coding and vision tasks, Opus 4.7 is often stronger. For math, research, and web browsing tasks, GPT‑5.4 leads.

What is the MRCR regression everyone is talking about?

MRCR (Multi‑Round Context Retrieval) at 1M tokens dropped from 78.3% on Opus 4.6 to 32.2% on Opus 4.7 — a 46‑point regression. Anthropic says they're phasing out MRCR because it relies on "stacking distractors to trick the model." GraphWalks is now their preferred long‑context benchmark and improved on 4.7. For users doing heavy long‑context retrieval, 4.6 may still perform better.

What is the new xhigh effort level?

xhigh sits between "high" and "max" in Claude Code's effort setting. It's the new default for Opus 4.7 and produces stronger reasoning at the cost of more output tokens. You can set it with /effort xhigh, claude --effort xhigh, or the CLAUDE_CODE_EFFORT_LEVEL=xhigh environment variable. Benchmark data shows xhigh at 100K tokens outperforming Opus 4.6's max effort at 200K tokens.

Why are developers saying Opus 4.7 is "just nerfed 4.6"?

The community theory is that Anthropic intentionally degraded Opus 4.6 performance in the weeks before launch to make 4.7 appear more impressive. This is unconfirmed speculation, but it's the dominant narrative in the r/ClaudeAI community. What's measurable: the new tokenizer increases token consumption by up to 35%, the manual thinking control has been removed, and some users report real 4.6‑vs‑4.7 comparisons where 4.6 actually wins on specific tasks.

What is /ultrareview in Claude Code?

/ultrareview is a new Claude Code slash command launched with Opus 4.7. It starts a dedicated review session that reads through code changes and flags issues across architecture, security, performance, and maintainability. Pro and Max users get 3 free trials. It's more thorough than standard code review responses because it runs in a separate focused session.

Is Opus 4.7 available on GitHub Copilot?

Yes — Opus 4.7 is generally available on GitHub Copilot for Pro+, Business, and Enterprise plans. A promotional 7.5x premium request multiplier applies through April 30, 2026. Opus 4.7 replaces both Opus 4.5 and 4.6 in the Copilot model picker. Enterprise admins must enable the Opus 4.7 policy in Copilot settings before users can access it.

Why does Claude Opus 4.7 flag safe code as "malware"?

Multiple developers report Opus 4.7 in Claude Code refusing to edit benign code — React components, CSS files, marketing pages — classifying them as "malware." The issue traces to a system prompt instruction requiring Claude to analyze every file for malware and refuse modifications if any suspicion exists. This is breaking real workflows and is the most‑reported bug in the launch window. Expect Anthropic to tune this in the coming days.

Should I upgrade to Claude Opus 4.7 or stay on 4.6?

Upgrade if: you do heavy agentic coding, need better vision/OCR, work with screenshots or diagrams, or use Claude Code's new /ultrareview. Stay on 4.6 if: you do long‑context retrieval (>500K tokens), you're a cost‑sensitive Pro user who hits usage limits, or your workflows depend on the manual Extended Thinking toggle that was removed in 4.7. For enterprise use on Bedrock or GitHub Copilot, most teams benefit from upgrading but should audit token costs after the tokenizer change.

Filed under
Claude Opus 4.7
Anthropic
Claude Code
AI models
SWE-bench
GitHub Copilot
Amazon Bedrock
GPT-5.4
AI benchmarks
2026

Ready to optimize your website for AI?

Generate your llms.txt file for free in seconds.

Try the Generator