Grok Comes to Amazon Bedrock: What It Means for Enterprise AI
If your AI roadmap runs through AWS, you’ve got a new option to weigh. xAI’s Grok 4.3 went generally available on Amazon Bedrock on June 15, 2026, making xAI the third major independent AI lab on the platform alongside Anthropic and OpenAI (AWS announcement, June 15, 2026). The headline price — $1.25 per million input tokens and $2.50 per million output tokens — is the cheapest a US-lab frontier reasoning model goes on Bedrock right now (AWS Bedrock pricing page).
That’s the elevator pitch. The full picture has more nuance than the launch tweet. I’ve spent the last few days digging through the docs, the benchmarks, and the pricing tables, and I want to walk you through what actually shipped, what it costs in real workloads, and where the gotchas live. If you’re an enterprise AI buyer on AWS, this is the breakdown I wish I’d had on Monday morning.
What Just Landed on June 15, 2026
Grok 4.3 is now live on Bedrock in three US regions — Oregon (us-west-2), N. Virginia (us-east-1), and Ohio (us-east-2) — as In-Region inference only. No Geo Cross-Region, no Global Cross-Region, no multi-region failover for the launch (AWS Grok 4.3 model card).
The model itself wasn’t new on launch day. xAI started beta testing Grok 4.3 on April 17, 2026, and flipped it to the API default on April 30 (VentureBeat, May 1, 2026). Bedrock is the distribution layer, not a new model release. What you’re getting:
- Model ID:
xai.grok-4.3 - Context window: 1 million tokens
- Max output: 30,000 tokens per request
- Input modalities: Text and image; output is text only
- Reasoning: Always-on by default, configurable via
reasoning.effort(none, low, medium, high) - Service tiers: Standard, Priority, Flex — Reserved is not supported
A quick note on that “always-on” reasoning. Unlike Claude or Nova, where you toggle thinking on or off, Grok 4.3 thinks before every response. You can suppress reasoning tokens from the output by setting effort: none, but the internal process still runs. For high-volume pipelines that don’t need deep reasoning, this matters — you’re paying for thinking you may not want.
How Much Does Grok 4.3 Cost on Bedrock?
On-demand pricing is $1.25 per million input tokens and $2.50 per million output tokens, with cached input at $0.20 per million. Confirmed on the AWS Bedrock pricing page (us-west-2 region) and matching the direct xAI API rate.
Here’s how that stacks up against the other Bedrock frontier options an enterprise team is likely to weigh. Pricing is per million tokens, on-demand, US regions:
| Model | Provider | Input | Output | Context | Endpoint |
|---|---|---|---|---|---|
| Grok 4.3 | xAI | $1.25 | $2.50 | 1M | Mantle |
| Amazon Nova Pro | Amazon | $0.80 | $3.20 | 300K | Runtime |
| DeepSeek V3.2 | DeepSeek | $0.62 | $1.85 | 128K | Runtime |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K | Runtime |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | 200K | Runtime |
| GPT-5.4 (Bedrock) | OpenAI | $2.75 | $16.50 | 128K | Runtime |
(Sources: AWS Bedrock pricing, truefoundry Bedrock pricing analysis, June 12, 2026)
“Grok 4.3 is as smart as Sonnet 4.6 and 5x cheaper and faster.” — Bindu Reddy, CEO of Abacus AI, on X, May 1, 2026.
Three things stand out from that table. First, Grok 4.3 is the only frontier-class reasoning model under $1.50 input on Bedrock. Second, it’s the only one with a million-token context window. Third — and this is the part the marketing glosses over — it’s also the only one running on a non-standard endpoint.
The Three Gotchas Nobody’s Talking About
If you only read the AWS announcement, you’ll assume Grok 4.3 drops into your existing Bedrock code with a model ID change. It does not. Here are the three traps that will bite teams who don’t read the model card carefully.
1. The Mantle Endpoint Breaks Your Standard SDK Code
Grok 4.3 doesn’t run on the bedrock-runtime endpoint that Claude, Titan, and most other Bedrock models use. It runs on Mantle, a new distributed inference engine inside Bedrock that’s OpenAI-compatible (AWS Mantle docs).
What that means in practice:
- The endpoint URL is
https://bedrock-mantle.{region}.api.aws/openai/v1, not the standard Bedrock Runtime URL - Converse API and InvokeModel are not supported
- You need to use an OpenAI-compatible client (the
openaiPython SDK works) and point it at Mantle - Default parameters differ from OpenAI:
temperaturedefaults to 0.7,top_pto 0.95,max_completion_tokensto 131,072
If your platform team has standardized on the Bedrock Converse API, adding Grok 4.3 is a separate integration project, not a config flip. Budget the engineering work before you promise it in a roadmap.
2. The 200K Context Cliff Doubles Your Bill
The 1 million-token context window is real, but pricing doubles for any request over 200,000 total tokens. That’s a context-pricing tier that kicks in silently if you’re not watching.
For long-document workloads — contract review, case-law corpora, full financial filings, multi-file RAG pipelines — the effective cost can land well above the headline rate. A pipeline that routinely assembles 600K-token prompts is not running at $1.25/M input. It’s running at roughly double. Model your costs at the higher tier before you commit a budget.
3. The Vendor Is Mid-Restructuring
xAI merged into SpaceX in February 2026, with plans to dissolve xAI as a separate entity and fold Grok and X into a SpaceX AI division. Nine of xAI’s eleven original co-founders have departed (The Register, May 29, 2026).
The compliance checklist passes — xAI maintains SOC 2 Type II, HIPAA eligibility, and GDPR (VentureBeat, May 1, 2026). But compliance describes the model and the platform, not the organization behind it. A regulated buyer betting on stable API behavior and a multi-year roadmap deserves more diligence than the certifications alone suggest.
How Grok 4.3 Actually Performs
On the Artificial Analysis Intelligence Index, Grok 4.3 scores 53 at high reasoning effort and around 38 at low effort (Artificial Analysis, April 30, 2026). At high effort, that’s a clear jump over the prior Grok 4.20 and places it in competitive territory with Claude Sonnet 4.6. At low effort, it’s middling.
The agentic story is the genuinely strong one:
- GDPval-AA agentic ELO: 1,500 — a 321-point jump over Grok 4.20’s 1,179
- #1 on Vals AI CaseLaw v2 (79.3% accuracy) — legal reasoning
- #1 on Vals AI CorpFin — corporate finance
- Tau2-Bench Telecom: 98% — customer-support tool calling, up 8 points from prior gen
The honest caveat: Grok 4.3 also lost about 8 points on the non-hallucination metric versus Grok 4.20, even as it gained roughly 8 points on factual accuracy (AA-Omniscience). More right answers, but also more confident wrong ones. For finance, healthcare, and legal workflows where a fabricated citation is a liability, that regression is the number to weigh against the marketing language.
On coding, the news is mixed. Independent reviewers at Andon Labs reported Grok 4.3 as a “big regression” on Vending-Bench 2, describing the model as having “narcolepsy problems” in long agentic simulations. It scores only 11% on ProofBench (math). So: strong on agentic tool use in narrow domains, weaker on general coding and complex math.
What This Means for Your AWS Stack
If you’re already running workloads on Bedrock, here’s the practical read.
1. For high-volume agentic and tool-calling pipelines, Grok 4.3 is the price-performance pick on the platform right now. At roughly a third of Claude Sonnet’s input cost and a sixth of its output cost, with competitive agentic scores, it’s a credible default for customer support automation, contract review, and financial document Q&A. Account for the 44% extra output tokens that always-on reasoning emits versus prior generations, and check your real spend before budgeting the headline rate.
2. For long-document RAG with prompts routinely above 200K tokens, model the doubled context tier first. If your pipeline pushes past the cliff on most calls, the headline advantage may disappear against a 200K-window model at higher per-token rates.
3. For Bedrock-standardized teams on the Converse API, scope the Mantle integration as its own workstream. It’s an OpenAI-compatible client path, not a model swap. The migration is tractable but real.
4. For regulated verticals (finance, healthcare, legal), the certifications are in place, but the non-hallucination regression and vendor-in-flux risk warrant extra diligence. Pilot before you commit, and keep a frontier accuracy model in the loop for high-liability outputs.
“The energy drink of frontier models: it’ll keep you up, but you won’t enjoy the experience and you’ll regret it in the morning.” — Corey Quinn, The Register, May 29, 2026.
That’s harsh but worth weighing against the price.
How AWS Sees This (Beyond the Headline)
There’s a structural read here that the launch coverage mostly skips. AWS now hosts all three independent frontier labs on Bedrock: Anthropic (with a $100 billion-plus compute commitment and up to 5 gigawatts of Trainium capacity per the Anthropic-Amazon announcement, April 20, 2026), OpenAI (with an expanded $38 billion agreement plus another $100 billion in commitments), and now xAI.
Bedrock now serves “more than 100,000 organizations worldwide” per AWS’s own marketing page. The pattern across these launches — model shows up on Bedrock shortly after a major compute deal — suggests the catalog is partly a sales funnel for Trainium silicon, not just an enterprise model marketplace.
What that means for you: the model lineup on Bedrock is going to keep expanding. Mantle is positioned to be the on-ramp for OpenAI-compatible third-party models (it launched OpenAI-compatible support in early 2026 per the AWS Mantle announcement, March 2026). If your team standardizes on Mantle for new integrations now, you’ll be ready for whatever ships next.
Should You Actually Switch?
The honest answer: it depends on the workload.
Here’s a quick decision framework:
- Cost-sensitive agentic workloads at scale — Strong candidate. Cheapest US-lab reasoning model on Bedrock, with the strongest agentic benchmark jumps of any 2026 release I’ve tracked.
- Long-document RAG above 200K tokens — Run the cliff math first. If your prompts routinely exceed 200K, the effective cost may erase the headline advantage.
- Regulated finance/healthcare/legal — Pilot with guardrails. Certifications pass, but the hallucination regression and vendor restructuring are real risk factors.
- Bedrock-standardized teams on Converse/InvokeModel — Scope the Mantle integration work before promising it. It’s a separate code path, not a model ID swap.
For most teams, the right sequence is the same: shortlist Grok 4.3 on price, prototype against Mantle on your own prompts, measure real spend with the 200K cliff factored in, and run your highest-liability prompts through an accuracy check before trusting it in production.
The Bottom Line
Grok on Amazon Bedrock is a real event, not just a catalog line. xAI is the third independent frontier lab on the platform, and at $1.25 / $2.50 per million tokens it’s the cheapest US-lab frontier reasoning model there. For high-volume agentic and tool-use workloads, that price-performance profile is genuinely compelling — the agentic benchmark jump over the prior generation is the strongest part of the story.
The fine print is where teams will win or lose money. The 1M window doubles in price above 200K tokens. The model lives on the Mantle endpoint, so it’s a separate integration for anyone standardized on Bedrock’s own SDK. And the accuracy story is mixed: more correct answers, but a measured regression on hallucination that matters most in exactly the regulated verticals AWS is targeting.
Frontier capability is no longer the scarce thing. Distribution, price discipline, and organizational stability are. Grok 4.3 brings the first two convincingly. The third — with a vendor mid-restructuring and a founder exodus — is the open question. Don’t decide off a headline. Run your own eval on the prompts you care about, with the cliff, the endpoint, and the accuracy trade-off all priced in.