Cohere North Mini Code: New Open-Source AI Coding Model for Developers

Cohere just dropped North Mini Code, its first open-weights coding model — and the first one explicitly aimed at developers instead of enterprise IT buyers. It’s a 30B-parameter Mixture-of-Experts model with only 3B parameters active per token, released under Apache 2.0 on June 9, 2026, and built from the ground up for agentic software engineering.

If you’ve been waiting for an open coding model that you can actually self-host on a single H100 without selling a kidney, this is the one. Here’s everything I found while digging through the model card, the Cohere blog, Artificial Analysis, and three different newsrooms.

What is Cohere North Mini Code?

North Mini Code is Cohere’s first agentic coding model and its first open-weights developer model, released on June 9, 2026. It’s a sparse Mixture-of-Experts (MoE) transformer with 30 billion total parameters and 3 billion active per token — small enough to run locally, big enough to score competitively on real coding benchmarks. (cohere.com, Hugging Face model card)

I think of MoE models like a switchboard: 128 expert sub-networks sit inside the model, and for every token the router picks 8 of them to actually run. The “30B” headline number is the full phone book; the “3B active” is who’s actually answering the call. That ratio matters because it determines VRAM and inference cost — and Cohere designed this one to run on a single Nvidia H100 at FP8 or FP4 (Cohere docs).

A few things make this release unusual:

It’s the first model in Cohere’s “North” family aimed at developers. Previous Command models targeted enterprise search, RAG, and agents. North Mini Code is the debut of Cohere’s developer stack.
It’s the first Cohere open-source model shipped under Apache 2.0 from day one. Command A+ only moved to Apache in May 2026 (cohere.com).
It’s been post-trained with tool use and reinforcement learning with verifiable rewards (RLVR) on real coding tasks — not just a base model retrofitted with a chat template (Hugging Face).

Why did Cohere release this now?

Cohere released North Mini Code to extend its “sovereign AI” pitch — long sold to banks and governments — to individual developers who want the same control over their coding tools. That’s the framing Cohere co-founder Nick Frosst gave The New Stack on June 15, 2026 (thenewstack.io).

“We’re now hearing similar concerns from developers. They’re starting to think of model access as infrastructure, and infrastructure should be something you own and control. That is an extension of sovereignty.” — Nick Frosst, Cohere co-founder

The timing isn’t random. AI agent platform Lindy publicly moved 100% of its inference traffic from Anthropic to DeepSeek in early June 2026, citing cost — inference bills had grown larger than payroll (thenewstack.io). Cohere is betting a chunk of the developer world is about to make the same kind of switch, and wants to be the open-weights option that isn’t Chinese-built.

It also lands less than a month after Command A+ (May 20, 2026) and roughly two weeks before Anthropic’s Claude Fable 5 hit #1 on the Artificial Analysis Intelligence Index on June 10, 2026 (artificialanalysis.ai). The small-models-vs-frontier-models split is now the defining pattern of 2026 — and Cohere is planting its flag firmly on the small side.

North Mini Code benchmarks: how good is it, really?

North Mini Code scores 33.4 on the Artificial Analysis Coding Index and 27.6 on the broader Artificial Analysis Intelligence Index, beating comparably sized open models like Qwen3, Gemma 4, and Devstral Small 2. Those numbers come from third-party evaluations published the same day as the launch (artificialanalysis.ai).

Here’s the comparison I kept coming back to. Cohere, Artificial Analysis, and Ollama all publish slightly different framings — I’ve reconciled them below:

Benchmark comparison: North Mini Code vs other open coding models

Model	Total Params	Active Params	AA Coding Index	SWE-Bench Verified	SWE-Bench Pro	Terminal-Bench v2
North Mini Code (Cohere)	30B	3B	33.4	67.6	40.2	36.0
Qwen3.6 35B A3B (Alibaba)	35B	3B	35.2	Higher (per Cohere chart)	—	—
Devstral Small 2 (Mistral)	24B	—	Lower	—	—	—
Gemma 4 26B (Google)	26B	4B	Lower	—	—	—
gpt-oss-20B (high)	20B	—	—	24.5 (AA Intelligence, not Coding)	—	—
Mistral Small 4	119B	6.5B	27.8 (AA Intelligence)	—	—	—

Sources: Artificial Analysis, June 9, 2026; Cohere blog, June 9, 2026; Hugging Face eval results.

Pull quote: “North Mini Code achieves a 33.4 on the Artificial Analysis Coding Index — a competitive position among similarly sized models” — Cohere blog, June 9, 2026.

Two caveats worth knowing. First, on non-coding agentic tasks North Mini Code is weaker: it scores 14% on GDPval-AA and 37% on τ²-Bench Telecom (artificialanalysis.ai). It’s a coding specialist, not a general agent. Second, Cohere’s own chart shows Qwen 3.6 ahead on SWE-Bench Verified and LiveCodeBench v6 — so the “competitive” framing is honest, not boastful.

The terminal-task story is more one-sided. Cohere’s internal tests show North Mini Code hitting up to 2.8x higher output throughput than Devstral Small 2 under identical hardware, plus a 30% advantage in inter-token latency (cohere.com). On the Cohere API it runs at roughly 199 output tokens per second (artificialanalysis.ai).

How is North Mini Code licensed?

North Mini Code is released under the Apache 2.0 license — the most permissive standard open-source license — with the addition of Cohere’s Acceptable Use Policy. That gives you commercial use, redistribution, modification, and private deployment rights out of the box (Ollama library; Hugging Face).

The full weights are published in three formats:

BF16 — the full 60GB reference model.
FP8 — quantized to roughly half the size for H100 GPUs.
W4A16 — 4-bit weights, 16-bit activations for tighter hardware.

You can grab them directly from CohereLabs on Hugging Face. As of mid-June 2026 the model had over 21,000 downloads in its first weeks and more than 470 likes on Hugging Face, putting it among the faster-rising open coding releases of the quarter (huggingface.co).

How do you actually deploy it?

You can run North Mini Code locally through Ollama, vLLM, SGLang, or Docker Model Runner — or hit it through the Cohere API, Cohere’s Model Vault, or OpenRouter. Cohere specifically trained it for compatibility with OpenCode, the open-source coding agent (cohere.com).

Here are the deployment paths I’d actually try, in order of how fast you can get going:

Quick local test (5 minutes) — install Ollama and run ollama run north-mini-code-1.0. The q4_K_M quant is 19GB (ollama.com).
Full-speed local server (15 minutes) — vllm serve CohereLabs/North-Mini-Code-1.0 -tp 2 --max-model-len 320000 --tool-call-parser cohere_command4 --reasoning-parser cohere_command4 --enable-auto-tool-choice. You’ll need vLLM main and cohere_melody>=0.9.0 (huggingface.co).
Plug into OpenCode as your coding agent — point OpenCode at a local vLLM endpoint with the config in the model card and you have a Claude-Code-style setup running on your laptop.
Zero-ops hosted — use the Cohere API, Cohere Model Vault (fully managed inference), or OpenRouter, which already lists it with a free tier as of June 17, 2026.

Minimum hardware, per Cohere’s docs: 1× Nvidia H100 at FP8, or 1× H100 at FP4 (docs.cohere.com). At 19GB the q4 quant even runs on a beefy Mac Studio with the MLX build, which is wild for a 30B-class MoE.

One pricing note if you’d rather skip self-hosting: as of June 17, 2026, North Mini Code shows up on OpenRouter with a free tier under the slug cohere/north-mini-code:free, and Cohere’s API pricing page lists it alongside the rest of the Command family (openrouter.ai). For managed, single-tenant deployment, Cohere’s Model Vault is the official option — it’s the same fully-managed inference environment Cohere uses for its enterprise customers, just pointed at the open model.

What’s actually inside the model?

North Mini Code is a decoder-only sparse MoE transformer with 128 experts (8 active per token), SwiGLU activations, and a 3:1 mix of sliding-window attention with RoPE and global attention with no positional embeddings. It was post-trained with two stages of supervised fine-tuning followed by RLVR (huggingface.co).

A few architectural details that matter in practice:

Context length: 256K tokens total, with 64K max output. That’s long enough to ingest a mid-sized codebase and have room for agentic reasoning.
Native tool use and interleaved thinking. The model was trained against multiple harnesses — SWE-Agent, mini-SWE-Agent, OpenCode, and Terminus 2 — so it generalizes rather than overfitting to one scaffold (ollama.com).
JSON-schema tool descriptions work best. If you’re wiring it into a custom agent, describe your tools as JSON schemas rather than natural-language prompts.
Pass reasoning forward. Like most modern agentic models, performance drops if you strip the thinking tokens between turns. Keep them in the chat history.

Who is this actually for?

North Mini Code is built for developers and enterprises who want a coding agent they can self-host, audit, fine-tune, and run inside their own infrastructure — without paying frontier-API prices or sending code to a third-party cloud. That’s the explicit framing from Cohere, Frosst, and AI Business’ coverage (cohere.com; thenewstack.io; aibusiness.com).

Three concrete use cases I think make sense:

Regulated industries that already use Cohere for retrieval or chat can now keep the coding layer on-prem too, with the same deployment story.
Indie devs and hobbyists who want a Claude-Code-quality experience without a $200/month Cursor subscription — assuming you have the hardware.
Companies nervous about vendor lock-in. The AI Business piece quotes Futurum analyst Bradley Shimmin pointing out that recent events — like governments forcing model providers to disable models — show how fragile frontier-only stacks can be (aibusiness.com).

The honest counter-case: if you need long-horizon, multi-day reasoning across huge codebases, a frontier model like Anthropic’s Claude Fable 5 (which hit #1 on the AA Intelligence Index on June 10, 2026) is still going to outperform it. North Mini Code isn’t trying to win that race — it’s trying to win the “good enough, fast, mine to control” race.

If you’re comparing it to direct open-source competitors like Mistral’s Devstral 2 (released December 2025) or JetBrains’ Mellum2, the story is similar: those are also Apache 2.0 and also small, but Cohere is the only one publishing the RLVR post-training details and the only one with a 256K context window in this size class (thenewstack.io). The “sovereign” branding is more than marketing — it’s a concrete deployment story for anyone whose security team has opinions about where their code goes.

What’s next for Cohere’s open-source roadmap?

Cohere says North Mini Code is the first of a new generation of models, with more open-weights releases planned and community feedback directly shaping the roadmap. This isn’t a one-off — it’s a deliberate pivot toward a sovereign, open-source developer ecosystem (cohere.com).

Cohere’s open-source track record in 2026 backs that up:

January 2026: Cohere North platform goes GA.
February 2026: Cohere Tiny Aya released, multilingual, open-weights.
May 20, 2026: Command A+ ships under Apache 2.0 — 218B MoE, 25B active.
June 9, 2026: North Mini Code launches.

If the pattern holds, expect a larger North coding model later this year, probably targeting the Claude Opus / GPT-5 tier for coding specifically — but open-weights, and probably Apache 2.0.

My honest take

I’ve been tracking Cohere since the Command days, and North Mini Code is the first release that actually feels aimed at me — a developer who wants a real coding agent I can run on my own metal.

The numbers are honest. It’s not state-of-the-art on every benchmark, but it’s clearly competitive in its size class and it does the one thing the open-source coding world has been missing: a permissive license, real RLVR post-training for tool use, and a path that doesn’t require eight H100s.

If you’re already running Llama 3.3 70B or Qwen3 locally for coding, try swapping in North Mini Code via Ollama. The terminal-task throughput numbers alone — 2.8x faster than Devstral Small 2 — are worth 20 minutes of your evening. Grab it from Hugging Face, ping Cohere on X or r/LocalLLaMA with what you build, and watch this space.

Cohere North Mini Code: New Open-Source AI Coding Model for Developers

Key Takeaways

Summarize with AI

Cohere North Mini Code: New Open-Source AI Coding Model for Developers

What is Cohere North Mini Code?

Why did Cohere release this now?

North Mini Code benchmarks: how good is it, really?

Benchmark comparison: North Mini Code vs other open coding models

How is North Mini Code licensed?

How do you actually deploy it?

What’s actually inside the model?

Who is this actually for?

What’s next for Cohere’s open-source roadmap?

My honest take

Get our weekly AI digest

AIUnpacker Editorial Team

More in AI Models & Releases

GLM-5.2 Released: New Long-Context AI Model for Agents and Coding

Kimi K2.7 Code Released: Is This the Best Open AI Coding Model?

Google DiffusionGemma: 4x Faster AI Text Generation Explained

Claude Fable 5 and Mythos 5 Released: Anthropic's Biggest AI Update Yet