Anthropic released Claude Opus 4.8 on May 28, 2026 — 41 days after Opus 4.7 shipped. That cadence alone tells you something: this is not a generational leap, but it is not a patch either. It is a focused set of improvements to the parts of the model that matter most in agentic and long-horizon coding work, bundled with two new platform features that change what is possible in Claude Code.
If you are building with the Claude API, running agents in Claude Code, or just using Opus as your daily driver, here is what is worth understanding — benchmark numbers included, caveats noted.
The Benchmark Numbers That Matter
Anthropic’s headline coding metric is SWE-bench Pro, which tests real-world GitHub issue resolution on a harder, less saturated version of the original SWE-bench. Opus 4.8 scores 69.2%, up from 64.3% for Opus 4.7. That 4.9-point gain is the biggest single-cycle jump the Opus line has had on this benchmark, and it now sits 10.6 points ahead of GPT-5.5 on the same test.
On SWE-bench Verified — the original, community-verified variant — Opus 4.8 scores 88.6%, up from 87.6%. The ceiling on this benchmark is getting crowded, so the marginal improvement here is less informative than SWE-bench Pro where there is more room to differentiate.
The more striking number is USAMO 2026, a competition mathematics benchmark. Opus 4.8 scores 96.7% — up from 69.3% for Opus 4.7. That is a 27-point jump in a single release cycle. For developers, competition math scores are a proxy for structured reasoning quality: the kind of thinking that shows up in algorithm design, constraint satisfaction, and complex debugging. A model that can solve olympiad problems more reliably tends to write cleaner logic under ambiguous requirements.
Multidisciplinary reasoning with tools moved from 54.7% to 57.9%. Terminal-Bench 2.1 — which tests autonomous terminal task completion — came in at 74.6%, behind GPT-5.5’s 78.2% on the same harness. Opus 4.8 is not the strongest model on every single benchmark, but across the coding and reasoning suite it is at or near the top.
The SWE-bench Pro jump from 64.3% to 69.2% is the most meaningful coding benchmark improvement Opus has delivered in a single cycle. For agentic coding, that delta is the one to track.
Dynamic Workflows: The Feature That Changes Scale
The biggest structural addition to Claude Code in this release is dynamic workflows, currently in research preview (requires Claude Code v2.1.154 or later). It fundamentally changes how Claude handles tasks that are too large or too branching to fit inside a single sequential session.
Here is how it works. When you describe a large task to Claude Code, it writes a JavaScript orchestration script for the job rather than executing it step by step in your active session. A runtime engine then executes that script in the background, spinning up parallel subagents — up to 16 running concurrently, and up to 1,000 total per run. The subagents work independently, Claude verifies the combined output, and the final result surfaces in your session.
Critically, the plan lives in the script’s variables, not in your context window. Only the final answer returns to your session. This is the detail that makes large-scale work practical: a sprawling refactor across dozens of files, a wide test matrix, or an exploration of several competing architectural paths can all run without your context budget collapsing under the coordination overhead.
Dynamic workflows are available on Max, Team, and Enterprise plans and are on by default for Max and Team users. If you are on the Claude API, you can grant an orchestrator permission to launch multi-agent workflows mid-conversation using the new system-entry-in-messages feature described below.
A task that would have required you to manually coordinate multiple Claude Code sessions can now be described once and executed in parallel. The orchestration is Claude’s problem, not yours.
The Effort Parameter: Controlling How Hard the Model Thinks
Opus 4.8 ships with a formal effort parameter on the API. It defaults to high and controls how many tokens the model spends across a response, including tool calls. The available levels are high, xhigh, and max.
For most coding tasks, high is the right default. For long-horizon agentic runs — sessions that stretch past 30 minutes with token budgets in the millions — Anthropic recommends xhigh. At this level, the model reasons more deeply and more frequently, trading tokens for coherence across a very long task. max is the ceiling, reserved for the most demanding analytical work.
The practical implication is that effort levels give you a cost dial that did not exist before. A high-effort pass over a complex codebase costs more than a standard pass. But the alternative — running standard effort and getting a result that requires a second pass — often costs more in total. The new levels let you match the model’s reasoning budget to the actual complexity of the task, rather than accepting one-size-fits-all behaviour.
Dynamic workflows in Claude Code pair xhigh effort with standing permission to launch multi-agent workflows. Anthropic calls this combination “Ultracode” in some of their tooling documentation, but it is not a separate product — it is just those two settings together.
Fast Mode: 2.5x Speed at 3x Lower Cost Than Opus 4.7
Fast mode is available for Opus 4.8 as a research preview on the Claude API. Set speed: "fast" to get up to 2.5x higher output tokens per second from the same model.
The pricing is $10 per million input tokens and $50 per million output tokens. That sounds like a premium until you compare it to what fast mode cost on Opus 4.7: $30 input and $150 output per million tokens. Fast mode on Opus 4.8 is three times cheaper than it was on the previous model, which makes it viable for production use cases where latency matters and you were previously priced out.
Standard pricing remains unchanged at $5 per million input tokens and $25 per million output tokens. Fast mode pricing stacks with prompt caching multipliers, so if your prompts qualify for caching, you get the speed boost on discounted input reads.
API Changes Worth Knowing About
Two API-level changes shipped with Opus 4.8 that are easy to overlook but have real impact on agentic loop design.
The first is mid-conversation system entries. The Messages API now accepts role: "system" entries inside the messages array, not just at the top. This means you can append updated instructions partway through a long-running conversation without restating the full system prompt. The earlier turns — and their prompt cache entries — remain intact. For agentic loops that run for many turns, this eliminates the choice between maintaining cache hits and updating the model’s standing instructions.
The second is a lower minimum cacheable prompt length. Opus 4.8 drops the threshold to 1,024 tokens, down from the higher limit on Opus 4.7. Prompts that were previously too short to hit the cache can now create cache entries with no code changes on your end. Combined with the 90% discount on cache reads, this is a meaningful cost reduction for short-system-prompt agentic loops that were previously below the cache threshold.
If your agent loop uses a short system prompt and many tool-call turns, the lower caching threshold alone may reduce your per-run cost without any changes to your code.
Code Quality and Honesty: The Developer-Facing Alignment Work
Anthropic has been unusually specific about one improvement in Opus 4.8: the model is approximately four times less likely than Opus 4.7 to let a code bug pass without flagging it. It is also more honest about what it does not know and more willing to proactively surface issues with inputs and outputs rather than completing a task silently on shaky foundations.
To be precise about the source here: the four-times figure comes from Anthropic’s internal evaluations, not from an independently reproduced study. Take it as directionally useful, not as a hard measurement. The qualitative signal from early developer testing is consistent with the claim — multiple practitioners have noted that Opus 4.8 proactively flags assumptions and edge cases more often than its predecessor — but the magnitude is Anthropic’s own number.
What Anthropic can also claim: Opus 4.8 fixed the two most common developer complaints about Opus 4.7 — comment verbosity and inconsistent tool calling. Devin’s CEO noted publicly that Opus 4.8 specifically addresses both issues. If you evaluated Opus 4.7 and found either problem disqualifying, this is worth a second look.
Context Window, Max Output, and GitHub Copilot
Opus 4.8 keeps the 1M token context window that Opus 4.7 introduced — with the same caveat that Microsoft Foundry caps it at 200k. Max output is 128k tokens synchronously, with up to 300k tokens available via the Message Batches API using the output-300k-2026-03-24 beta header.
Adaptive thinking is available on Opus 4.8; extended thinking is not. Knowledge cutoff is January 2026.
For teams using GitHub Copilot, Opus 4.8 is selectable on day one for Copilot Pro+, Business, and Enterprise users. It launched with a 15x premium request multiplier prior to Copilot’s usage-based billing transition on June 1, 2026.
Should You Migrate from Opus 4.7?
If you are using Opus 4.7 today and your work involves agentic coding, long-horizon tasks, or tool-intensive loops, yes — the combination of better SWE-bench scores, fixed tool-calling behaviour, the lower caching threshold, and mid-conversation system entries makes the migration straightforward and worth doing. Pricing is identical, so there is no cost argument against it.
If you hit Opus 4.7’s comment verbosity or tool-calling issues and wrote it off, the developer feedback on 4.8 suggests those problems are materially addressed. The 41-day release cadence was Anthropic shipping the model Opus 4.7 should have been.
If you are on Opus 4.6 or older, the migration case is stronger still. Opus 4.6 will remain available but Anthropic’s documentation now points migration guidance at Opus 4.8 directly.
The model ID is claude-opus-4-8. For most production agentic workloads, switching from Opus 4.7 is a one-line change with no downside.
What This Release Signals
The 41-day release cadence is the story underneath the features. Anthropic is shipping refinements on a pace that would have been considered aggressive even a year ago. Opus 4.8 is not a restructuring of what the model can do — it is a tightening of what the model does reliably, with targeted platform additions that address the specific friction points developers hit in long-running agent work.
Dynamic workflows in Claude Code are the most consequential new primitive here. They shift the model’s practical scope from “what can one agent do sequentially” to “what can a coordinated fleet do in parallel.” That boundary is moving, and it is moving faster than most developers have calibrated for.
The right posture is not to wait for a generational release. The improvements in 4.8 are incremental but they compound — better reasoning, cheaper fast inference, fixed reliability issues, and new orchestration primitives. The developers who get ahead of those primitives now are the ones who will have working patterns before they become table stakes.