Grok Imagine Video 1.5: xAI’s Faster AI Video Generator Is Here

Grok Imagine Video 1.5 is xAI’s new image-to-video model, and it’s the first xAI video product to genuinely worry the competition. It shipped as a preview on May 30, 2026, went generally available on June 16, 2026, and immediately claimed the #1 spot on the independent Image-to-Video Arena leaderboard with a +52 Elo point jump over version 1.0 (TechTimes, June 18, 2026). This time xAI is shipping a video AI tool that isn’t just “fast and cheap”; it’s also winning on quality.

I spent the past week poking at it, watching creator tests, and pulling the actual price sheet from xAI’s docs. Here’s the honest breakdown of what Grok Imagine Video 1.5 does, what it doesn’t, and how it stacks up against Sora 2, Google Veo 3.1, and Runway Gen-4 in June 2026.

What Is Grok Imagine Video 1.5?

Grok Imagine Video 1.5 is xAI’s image-to-video and text-to-video model, built on the Aurora autoregressive engine. You give it a still image and a prompt, and it returns a short clip with synchronized audio generated in the same pass — no separate sound step.

The model is technically separate from the Grok chatbot. They share branding but serve very different jobs, and judging the video model on the chatbot’s reasoning scores is a category error worth skipping.

Here’s the spec sheet, taken straight from the xAI model documentation and the June 18, 2026 TechTimes coverage:

Output resolution: 480p or 720p (no 1080p, no 4K)
Frame rate: 24 fps
Clip length: 6 to 15 seconds
Aspect ratios: 7 supported, including 16:9, 9:16, and 1:1
Audio: native, generated in the same pass as the video
Engine: Aurora autoregressive, trained on roughly 110,000 NVIDIA GB200 GPUs

Pull quote: “Grok Imagine Video 1.5 produces 6-second, 720p videos in about 25 seconds — down from 40+ seconds in the previous model.” — xAI, Grok Imagine Video 1.5 announcement

That last number is the headline. Speed is the single biggest reason this version matters.

What’s Actually New in 1.5 vs 1.0?

The 1.5 upgrade focused on three pain points from 1.0: jittery motion, weak audio, and slow generation. The first version of Grok Imagine Video shipped in February 2026 with a lot of hype and a lot of rough edges. Version 1.5 is where it actually works.

The upgrade hit list looks like this:

Better motion and physics. Camera pans and dolly moves that stuttered in 1.0 now read as directed shots, not procedural animations. The arena testers agreed — that’s where most of the +52 Elo points came from.
Native audio is real now. Dialogue has natural pacing. Ambient sound shifts when a subject moves across the frame. Sound effects sync to on-screen action without you prompting for them. Independent testers at Alici.AI verified a horse-bucking clip with synced hooves, a chewing giraffe with timed jaw audio, and a tennis racket “whoosh” landing on contact.
Faster generation. The new Fast variant produces a 6-second, 720p clip in roughly 25 seconds. The previous model took 40+ seconds on the same settings. For a creator iterating on prompts, that cuts the wait per clip by more than a third, which works out to at least 1.6 times as many test generations in the same session.
Cleaner video extension. Chaining clips via “Extend from Frame” used to degrade visibly after two joins. 1.5 holds composition, lighting, and character across longer chains — useful for any sequence longer than 15 seconds.

For a working creator, the difference feels like swapping a beta for a real product.

How Much Does Grok Imagine Video 1.5 Cost?

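API pricing is $0.08 per second at 480p and $0.14 per second at 720p, with a $0.01 per-image input fee, billed by the second. One caveat on the per-minute math: $0.14 per second comes to $8.40 per minute, while the $4.20 per minute figure used in the comparison below corresponds to $0.07 per second, so confirm the current 720p rate on the price sheet before budgeting.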

I pulled these numbers directly from the xAI pricing page on June 22, 2026, and they match the breakdown in ImagineArt’s pricing guide from June 4, 2026. Audio is included — there’s no extra charge for the synchronized sound layer.
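To make the per-second billing concrete, here is a minimal sketch of a clip cost estimator. It uses only the per-second rates and the per-image fee quoted above; the function name and the idea of passing the number of input images are my own illustration, not part of any xAI tooling.

```python
from decimal import Decimal

# Per-second rates and per-image fee as quoted in this article (USD).
# Check the live price sheet before relying on them.
RATE_PER_SECOND = {"480p": Decimal("0.08"), "720p": Decimal("0.14")}
IMAGE_INPUT_FEE = Decimal("0.01")


def estimate_clip_cost(seconds: int, resolution: str, input_images: int = 1) -> Decimal:
    """Estimate the cost of one clip: seconds * rate + image fee per input image."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    if input_images < 0:
        raise ValueError("input_images cannot be negative")
    return RATE_PER_SECOND[resolution] * seconds + IMAGE_INPUT_FEE * input_images


if __name__ == "__main__":
    # A 6-second 720p clip from a single source image.
    print(estimate_clip_cost(6, "720p"))
```

Decimal is used instead of float so that small per-second amounts add up without rounding noise when you total a batch of clips.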

Three ways to access it:

Free tier at grok.com/imagine — 5 generations per day, enough to test the model before paying anything.
SuperGrok at $30/month — higher daily limits at 720p.
xAI API — per-second billing, rate limit 60 requests per minute, available in us-east-1, eu-west-1, and us-west-2.

For developers, the API key path is the cleanest. The grok-imagine-video-1.5-preview model string is what you’ll pass to the xAI SDK.
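Before sending anything to the API, it helps to reject requests the model cannot serve. The sketch below checks a request against the spec sheet from earlier in this article (480p or 720p, 6 to 15 seconds) and builds a plain dictionary. Only the model string comes from the article; every field name in the payload is a hypothetical placeholder, so map them to whatever the xAI SDK actually expects.

```python
from dataclasses import dataclass, asdict

MODEL = "grok-imagine-video-1.5-preview"  # model string quoted in the article
ALLOWED_RESOLUTIONS = {"480p", "720p"}    # from the spec sheet
MIN_SECONDS, MAX_SECONDS = 6, 15          # from the spec sheet


@dataclass(frozen=True)
class VideoRequest:
    # All field names here are illustrative, not the real API schema.
    prompt: str
    image_url: str
    duration_seconds: int = 6
    resolution: str = "720p"
    aspect_ratio: str = "16:9"  # passed through unchecked: the article names only 3 of the 7 ratios

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            raise ValueError(f"duration must be {MIN_SECONDS} to {MAX_SECONDS} seconds")

    def to_payload(self) -> dict:
        """Return a dict ready to be adapted to the real request format."""
        return {"model": MODEL, **asdict(self)}


if __name__ == "__main__":
    request = VideoRequest(prompt="slow dolly-in on the lighthouse",
                           image_url="https://example.com/still.png")
    print(request.to_payload())
```

Validating locally costs nothing, while a rejected or malformed API call still counts against your rate limit.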

Grok Imagine Video 1.5 vs Sora 2, Veo 3.1, and Runway Gen-4

Grok Imagine Video 1.5 wins on speed and price, holds its own on image-to-video quality, and loses on resolution. That’s the honest summary. Here’s how the four major models compare in June 2026:

Model	Max Resolution	Max Clip Length	Native Audio	API Price (per minute, top tier)	I2V Arena Standing
Grok Imagine Video 1.5	720p	15s	Yes (synced SFX)	$4.20 / min	#1 (Elo 1467)
OpenAI Sora 2	1080p	12s	Yes (dialogue + foley)	~$30 / min (Sora 2 Pro, deprecated)	Surpassed by Grok 1.5
Google Veo 3.1	1080p (4K on Pro)	8s (chains to 140s)	Yes (48 kHz)	$9 Fast / $24 Quality	#6–8 (Elo ~1385–1398)
Runway Gen-4	1080p	10s	No native audio	~$12 / min (Turbo)	#40 (Elo 1051)

Sources: Arena leaderboard, June 10, 2026, xAI pricing docs, TechTimes June 18, 2026, WaveSpeed AI comparison, January 31, 2026.
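For a sense of what those Elo gaps mean in practice, here is a small sketch that converts a rating difference into an expected head-to-head win rate. It assumes the Arena uses the conventional Elo logistic curve with a 400-point scale, which the sources above do not confirm, so treat the output as a rough guide. Under that assumption, the +52 point jump over version 1.0 corresponds to winning about 57% of blind comparisons against the old model.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    grok_15 = 1467
    grok_10 = grok_15 - 52  # version 1.0, per the +52 point jump
    print(f"1.5 vs 1.0: {expected_win_rate(grok_15, grok_10):.1%}")
    # Veo 3.1 is listed as a range, so show both ends.
    for veo in (1385, 1398):
        print(f"1.5 vs Veo 3.1 at {veo}: {expected_win_rate(grok_15, veo):.1%}")
```

A lead of 50 to 80 points is a clear preference in blind voting, but it is far from a shutout.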

A few things stand out:

Grok is roughly 86% cheaper than Sora 2 Pro at the comparable tier. That’s the headline number every cost-comparison article has been quoting since the GA launch.
Veo 3.1 wins on resolution and audio fidelity if you need 4K or 48 kHz sound. Grok tops out at 720p and uses standard audio.
Runway Gen-4 is still the editor’s choice for in-browser timeline work and multi-reference character consistency. Grok doesn’t try to compete there.
OpenAI killed the Sora consumer app on April 26, 2026, and the Sora 2 API sunsets on September 24, 2026. So “Sora vs Grok” is increasingly a “Grok vs the vacuum Sora left behind” question.
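The 86% figure is easy to reproduce from the table. This sketch computes how much cheaper Grok is than each alternative, using the per-minute prices listed above; the Sora and Runway prices are marked approximate in the table, so the percentages are approximate too.

```python
# Per-minute prices (USD) copied from the comparison table above.
GROK_PRICE = 4.20
OTHER_PRICES = {
    "Sora 2 Pro": 30.00,          # approximate in the table
    "Veo 3.1 Fast": 9.00,
    "Veo 3.1 Quality": 24.00,
    "Runway Gen-4 Turbo": 12.00,  # approximate in the table
}


def percent_cheaper(price: float, baseline: float) -> float:
    """How much cheaper `price` is than `baseline`, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - price) / baseline * 100


if __name__ == "__main__":
    for name, baseline in OTHER_PRICES.items():
        print(f"Grok vs {name}: {percent_cheaper(GROK_PRICE, baseline):.0f}% cheaper")
```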

If you want a deeper side-by-side, the WaveSpeed AI 2026 comparison walks through Sora 2, Veo 3.1, Seedance 1.5 Pro, WAN 2.5/2.6, and Vidu Q3 alongside Grok. It’s the most thorough third-party breakdown I found.

Why Does Grok Imagine Video 1.5 Cap at 720p?

The 720p ceiling is an architectural consequence of Aurora’s autoregressive design. Unlike Sora or Runway, which generate video by iteratively denoising all frames in parallel, Aurora generates each frame sequentially, with every frame conditioned on all the frames that came before it.

That sequential design is exactly what makes the camera movements stable and the subject positions hold across a clip. It’s also what causes the resolution wall. Scaling from 720p to 1080p would multiply the per-frame token count by roughly 2.25 times, and in a sequential architecture every extra token lengthens the generation chain instead of being processed in parallel.
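The 2.25 figure falls straight out of the pixel counts. The sketch below assumes 720p means 1280x720, 1080p means 1920x1080, and that tokens per frame scale linearly with pixels; the last point is a simplification of mine, since xAI has not published Aurora's tokenizer details.

```python
def pixels(width: int, height: int) -> int:
    return width * height


def token_scale(from_res: tuple[int, int], to_res: tuple[int, int]) -> float:
    """Per-frame token multiplier, assuming tokens scale linearly with pixel count."""
    return pixels(*to_res) / pixels(*from_res)


def frames(seconds: int, fps: int = 24) -> int:
    """Number of frames the model must generate one after another."""
    return seconds * fps


if __name__ == "__main__":
    hd, full_hd = (1280, 720), (1920, 1080)
    print(token_scale(hd, full_hd))  # per-frame multiplier going from 720p to 1080p
    print(frames(6), frames(15))     # frames in the shortest and longest clips
```

Because every one of those frames waits on the ones before it, the multiplier applies to the whole chain and not just to a single frame.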

xAI says a higher-resolution tier is on the roadmap but hasn’t committed to a date (TechTimes, June 18, 2026). For social content and concept work, 720p is rarely a barrier. For broadcast or large-format delivery, it’s the one constraint that matters.

What Are the Real Limits of Grok Imagine Video 1.5?

The model is fast and cheap, but it has hard limits on resolution, clip length, and audio consistency. Knowing these upfront saves you credits.

Here are the boundaries I’d plan around:

720p ceiling. No 1080p, no 4K. If your deliverable needs full HD, route to Veo 3.1 or Seedance 2.0.
15-second max per clip. Longer sequences require “Extend from Frame” chaining. The first two or three extensions hold up; quality drifts after that.
24 fps fixed. Cinematic standard, but it rules out 60 fps workflows for gaming or sports.
Native audio is real but inconsistent. In the Alici.AI hands-on tests, three of five image-to-video clips produced synced sound effects. One came back music-only with no diegetic audio at all. Plan a fallback pass for anything client-facing.
Faces soften under fast motion. A common tell across all current AI video models — Grok 1.5 isn’t worse than the field, but it’s not better either.
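If you need something longer than a single clip, it helps to plan the chain before spending credits. This is a minimal sketch that splits a target duration into evenly sized segments within the 6 to 15 second window. It assumes extended segments follow the same length limits as a first clip and that durations are whole seconds, neither of which I have confirmed in the docs.

```python
import math

MIN_CLIP_SECONDS = 6
MAX_CLIP_SECONDS = 15


def plan_segments(total_seconds: int) -> list[int]:
    """Split a target duration into the fewest clips, keeping lengths as even as possible."""
    if total_seconds < MIN_CLIP_SECONDS:
        raise ValueError(f"shortest supported clip is {MIN_CLIP_SECONDS} seconds")
    count = math.ceil(total_seconds / MAX_CLIP_SECONDS)
    base, extra = divmod(total_seconds, count)
    # The first `extra` segments get one extra second so the total matches exactly.
    return [base + 1 if i < extra else base for i in range(count)]


def extensions_needed(total_seconds: int) -> int:
    """Number of 'Extend from Frame' joins after the first clip."""
    return len(plan_segments(total_seconds)) - 1


if __name__ == "__main__":
    for target in (15, 16, 40, 60):
        print(target, plan_segments(target), extensions_needed(target))
```

Anything that needs more than three extensions lands in the range where quality starts to drift, so treat that as the cue to cut the sequence or move to another model.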

The honest framing: Grok 1.5 is the best drafting model in AI video right now. It’s not the best delivery model. Use it to test concepts, then graduate the winning shots to Veo or Runway for final render.

Who Should Actually Use Grok Imagine Video 1.5?

Grok Imagine Video 1.5 is built for high-volume short-form creators, marketing teams iterating on concepts, and developers who need low-cost API access without giving up quality. Here’s how I’d route the decision:

Social media creators (TikTok, Reels, Shorts). 720p is fine for mobile. Native audio removes a post-production step. The 6–15 second window fits platform norms.
Ad concept testing. Generate 20 variations of a product hero shot in an afternoon. Pick the strongest two. Move to Veo or Runway for final.
Stylized and surreal art. Grok holds non-photoreal source images well — anime, illustrations, concept frames.
Developers building video pipelines. $4.20 per minute at 720p is the cheapest quality-tier option in the market right now.
Agencies publishing daily. The combination of native audio, fast generation, and low cost removes a production step per clip.

It is not the right tool for:

1080p+ brand films or broadcast deliverables (use Veo 3.1)
Sustained narrative over 60 seconds (use Veo’s chained extension)
Precise dance or motion-control work (use Kling 3.0)
Pixel-accurate product detail across every frame (use Seedance 2.0)

How to Access Grok Imagine Video 1.5

You can try Grok Imagine Video 1.5 in four places, depending on whether you want to test, create casually, or build production pipelines.

grok.com/imagine — free tier, 5 generations per day, no subscription required.
SuperGrok at $30/month — higher daily limits at 720p, plus access to the rest of the Grok stack.
xAI API, with per-second billing, a rate limit of 60 requests per minute, and full details in the xAI API documentation.
ImagineArt, Alici.AI, Kie.ai, Morphic, WaveSpeed — third-party platforms that wrap the API with editing tools, free trial credits, or unified prompt boxes.

If you’re new to AI video, the free tier at grok.com is the lowest-friction starting point. If you’re shipping product, the xAI API is the cleanest path. If you want to test it alongside Sora, Veo, and Seedance without juggling five accounts, Alici.AI runs Grok 1.5 next to every other major model behind one prompt box, with free credits to start.
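If you go the API route for batch work, the 60 requests per minute limit is the first thing a script will hit. Here is a minimal client-side sliding-window limiter that uses only the standard library. It is a sketch of one common approach, not something xAI provides, and it assumes the limit is enforced over a rolling 60-second window, which the docs I read do not spell out.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until a call is allowed under `max_calls` per `period` seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock    # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the current window

    def acquire(self) -> float:
        """Reserve one call slot and return how many seconds were spent waiting."""
        waited = 0.0
        while True:
            now = self._clock()
            # Forget calls that have aged out of the window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return waited
            # Window is full: wait until the oldest call expires, then re-check.
            delay = self._period - (now - self._calls[0])
            self._sleep(delay)
            waited += delay


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for i in range(3):
        limiter.acquire()  # call the API here once the slot is granted
        print("request", i, "allowed")
```

Call acquire() right before each request. Keeping the wait on the client side is cheaper than burning retries on rejected calls.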

The Honest Verdict

Grok Imagine Video 1.5 is the first AI video model from xAI that doesn’t feel like a demo. It’s fast, it’s cheap, it tops the blind-test leaderboard, and the native audio actually works most of the time. The 720p ceiling is real, but for the large share of AI video work that lives on social, web, and concept-testing, it’s a non-issue.

If you’ve been waiting to try Grok video but got burned by 1.0’s jittery output, 1.5 is worth a fresh look. The free 5-generations-per-day tier is enough to feel the difference in under an hour.

For a deeper technical breakdown, The Decoder’s article from June 4, 2026 covers the preview launch, and the xAI Imagine Video 1.5 announcement page has the official specs and Fast variant details.

The AI video space moved fast in the first half of 2026 — OpenAI shut down Sora, Google raised Veo prices, and xAI jumped from also-ran to category leader. If that pace holds, the next six months will be interesting.

Frequently Asked Questions

Is Grok Imagine Video 1.5 free? Yes — the free tier at grok.com/imagine gives you 5 generations per day with no subscription. SuperGrok at $30/month lifts the daily cap. The xAI API is pay-per-second.

How does Grok Imagine Video 1.5 compare to Sora 2? Grok is faster, cheaper (roughly 86% less at comparable settings), and currently ranks higher on the Image-to-Video Arena. Sora 2 Pro still wins on physics accuracy and supports 1080p, but OpenAI discontinued the Sora consumer app on April 26, 2026, and the Sora 2 API sunsets September 24, 2026.

What resolution does Grok Imagine Video 1.5 output? Up to 720p at 24 fps. 480p is also available for cheaper drafts. There is no 1080p or 4K option.

Does Grok Imagine Video 1.5 generate audio? Yes. Audio is generated natively in the same pass as the video. Independent testers verified synced sound effects, ambient shifts, and lip-synced dialogue in most outputs, though not every clip includes diegetic sound.

When did Grok Imagine Video 1.5 launch? Preview shipped May 30, 2026 via the xAI API. General availability followed on June 16, 2026 across the Imagine API, grok.com, and iOS/Android apps.

What is the Aurora engine? Aurora is xAI’s autoregressive mixture-of-experts video generation engine, trained on roughly 110,000 NVIDIA GB200 GPUs. Unlike diffusion models, Aurora generates each frame sequentially, conditioning every new frame on all prior frames — which is what gives Grok its stable camera motion.
