Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks

By Naveed Ahmad | 23/04/2026 | 7 min read


Alibaba's Qwen Team has launched Qwen3.6-27B, the first dense open-weight model in the Qwen3.6 family, and arguably the most capable 27-billion-parameter model available today for coding agents. It brings substantial improvements in agentic coding, a novel Thinking Preservation mechanism, and a hybrid architecture that blends Gated DeltaNet linear attention with conventional self-attention, all under an Apache 2.0 license.

The release comes weeks after Qwen3.6-35B-A3B, a sparse Mixture-of-Experts (MoE) model with only 3B active parameters, which itself followed the broader Qwen3.5 series. Qwen3.6-27B is the family's second model and the first fully dense variant, and on several key benchmarks it actually outperforms both Qwen3.6-35B-A3B and the much larger Qwen3.5-397B-A17B MoE model. The Qwen team describes the release as prioritizing "stability and real-world utility," shaped by direct community feedback rather than benchmark optimization.

The Qwen team has released two weight variants on the Hugging Face Hub: Qwen/Qwen3.6-27B in BF16 and Qwen/Qwen3.6-27B-FP8, a quantized version using fine-grained FP8 quantization with a block size of 128, whose performance metrics are nearly identical to the original model. Both variants are compatible with SGLang (>=0.5.10), vLLM (>=0.19.0), KTransformers, and Hugging Face Transformers.
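Fine-grained FP8 quantization with a block size of 128 means each run of 128 weights gets its own scale factor, rather than one scale per tensor. The sketch below illustrates the idea with NumPy; the per-block absmax scaling follows the stated block size, but the rounding step is only a stand-in for a real FP8 E4M3 cast, and none of this is the Qwen team's actual quantization code.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_blockwise(w: np.ndarray, block: int = 128):
    """Per-block absmax scaling, as in fine-grained FP8 with block size 128.
    The FP8 cast itself is approximated by rounding to a coarse grid."""
    blocks = w.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales = np.where(scales == 0.0, 1.0, scales)  # avoid divide-by-zero blocks
    q = np.round(blocks / scales * 8.0) / 8.0      # stand-in for limited FP8 mantissa
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return (q * scales).reshape(shape)
```

Smaller blocks track local weight statistics more closely, which is why fine-grained schemes lose so little accuracy relative to per-tensor scaling.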

https://qwen.ai/blog?id=qwen3.6-27b

What's New: Two Key Features

Agentic coding is the first major upgrade. The model has been specifically optimized to handle frontend workflows and repository-level reasoning: tasks that require understanding a large codebase, navigating file structures, modifying across multiple files, and producing consistent, runnable output. On QwenWebBench, an internal bilingual (EN/CN) front-end code generation benchmark spanning seven categories (Web Design, Web Apps, Games, SVG, Data Visualization, Animation, and 3D), Qwen3.6-27B scores 1487, a significant leap from 1068 for Qwen3.5-27B and 1397 for Qwen3.6-35B-A3B. On NL2Repo, which tests repository-level code generation, the model scores 36.2 versus 27.3 for Qwen3.5-27B. On SWE-bench Verified, the community standard for autonomous software engineering agents, it reaches 77.2, up from 75.0, and competitive with Claude 4.5 Opus's 80.9.

Thinking Preservation is the second, and arguably more architecturally interesting, addition. By default, most LLMs retain only the chain-of-thought (CoT) reasoning generated for the current user message; reasoning from earlier turns is discarded. Qwen3.6 introduces a new option, enabled via "chat_template_kwargs": {"preserve_thinking": True} in the API, to retain and leverage thinking traces from historical messages across the entire conversation. For iterative agent workflows, this is practically significant: the model carries forward earlier reasoning context rather than re-deriving it each turn. This can reduce overall token consumption by minimizing redundant reasoning and also improve KV cache utilization.
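The flag above travels in the request body of an OpenAI-compatible chat completion. A minimal sketch of such a payload follows; the "chat_template_kwargs" / "preserve_thinking" field comes from the announcement, while the surrounding message content and the choice of serving stack (e.g. vLLM or SGLang) are illustrative assumptions.

```python
import json

# Sketch of an OpenAI-compatible chat request enabling Thinking Preservation.
payload = {
    "model": "Qwen/Qwen3.6-27B",
    "messages": [
        {"role": "user", "content": "Step 1: locate the failing test."},
        {"role": "assistant", "content": "The failure is in the I/O test module."},
        {"role": "user", "content": "Step 2: propose a fix."},
    ],
    # Keep thinking traces from earlier turns instead of discarding them.
    "chat_template_kwargs": {"preserve_thinking": True},
}
body = json.dumps(payload)
```

In a multi-turn agent loop, the same flag is sent on every request so the chat template re-inserts the preserved traces each turn.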

Under the Hood: A Hybrid Architecture

Qwen3.6-27B is a causal language model with a vision encoder. It is natively multimodal, supporting text, image, and video inputs, and is trained through both pre-training and post-training stages.

The model has 27B parameters distributed across 64 layers, with a hidden dimension of 5120 and a token embedding space of 248,320 (padded). The hidden layout follows a distinctive repeating pattern: 16 blocks, each structured as 3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN). This means three out of every four sublayers use Gated DeltaNet, a form of linear attention, with only every fourth sublayer using standard Gated Attention.
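The repeating pattern can be expanded into a flat per-sublayer schedule to check the arithmetic: 16 blocks of four sublayers each gives the 64 layers, 48 of them Gated DeltaNet and 16 Gated Attention. A small sketch (the string labels are this sketch's own naming, not identifiers from the released code):

```python
def hybrid_schedule(num_blocks: int = 16) -> list[str]:
    """Expand 16 blocks of 3 x (Gated DeltaNet -> FFN) -> 1 x (Gated
    Attention -> FFN) into a flat list of attention-sublayer types."""
    layers: list[str] = []
    for _ in range(num_blocks):
        layers += ["gated_deltanet"] * 3 + ["gated_attention"]
    return layers

schedule = hybrid_schedule()
```

Every fourth sublayer (indices 3, 7, 11, ...) is the full-attention one, which is where the growing KV cache lives.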

What is Gated DeltaNet? Conventional self-attention computes relationships between every token pair, which scales quadratically (O(n²)) with sequence length, making long contexts expensive. Linear attention mechanisms like DeltaNet approximate this with linear complexity (O(n)), making them significantly faster and more memory-efficient. Gated DeltaNet adds a gating mechanism on top, essentially learning when to update or retain information, similar in spirit to LSTM gating but applied to the attention computation. In Qwen3.6-27B, Gated DeltaNet sublayers use 48 linear attention heads for values (V) and 16 for queries and keys (QK), with a head dimension of 128.
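The O(n) claim comes from rewriting attention as a recurrence over a fixed-size state. The sketch below shows the plain, unnormalized linear-attention form of that recurrence; Gated DeltaNet layers add a learned gate and a delta-rule state update on top, which are omitted here.

```python
import numpy as np

def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Unnormalized causal linear attention as an O(n) recurrence: carry a
    running (d_k x d_v) state instead of the full n x n attention matrix."""
    state = np.zeros((k.shape[-1], v.shape[-1]))
    out = []
    for qt, kt, vt in zip(q, k, v):
        state = state + np.outer(kt, vt)  # accumulate key-value associations
        out.append(qt @ state)            # read out with the current query
    return np.stack(out)
```

Because the state has a fixed size regardless of sequence length, these sublayers need no growing KV cache, which is exactly what makes them cheap at long contexts.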

The Gated Attention sublayers use 24 attention heads for queries (Q) and only 4 for keys and values (KV), a configuration that significantly reduces KV cache memory at inference time. These layers have a head dimension of 256 and use Rotary Position Embedding (RoPE) with a rotation dimension of 64. The FFN intermediate dimension is 17,408.
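A back-of-envelope calculation shows why the 24:4 query-to-KV ratio matters. Assuming BF16 storage (2 bytes per element) and that only the 16 gated-attention sublayers keep a growing KV cache (the Gated DeltaNet sublayers carry a fixed-size recurrent state instead), the per-token cost works out as follows; the 24-KV-head comparison line is a hypothetical full multi-head configuration, not a shipped variant.

```python
# KV cache bytes per token for the gated-attention sublayers.
kv_heads, head_dim, attn_layers, bytes_per_elem = 4, 256, 16, 2

# Factor of 2 for storing both K and V.
kv_bytes_per_token = attn_layers * 2 * kv_heads * head_dim * bytes_per_elem

# Hypothetical comparison: same layers with one KV head per query head (24).
mha_bytes_per_token = attn_layers * 2 * 24 * head_dim * bytes_per_elem
```

Under these assumptions the cache costs 64 KiB per token, six times less than the full multi-head alternative, which is what makes 262K-token contexts tractable.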

The model also uses Multi-Token Prediction (MTP), trained over multiple steps. At inference time, this enables speculative decoding, where the model generates several candidate tokens at once and verifies them in parallel, improving throughput without compromising quality.
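The verification half of speculative decoding can be sketched simply. Here the MTP head plays the role of the draft; the function below shows greedy acceptance only (accept the longest prefix that matches the target model's own greedy predictions), whereas production systems typically also use probabilistic acceptance for sampled decoding. Token ids are plain integers in this sketch.

```python
def accept_draft(draft_tokens: list[int], target_predictions: list[int]) -> list[int]:
    """Greedy speculative-decoding verification: accept the longest prefix of
    the draft that agrees with the target model's predictions; decoding then
    resumes from the first mismatch."""
    accepted: list[int] = []
    for drafted, predicted in zip(draft_tokens, target_predictions):
        if drafted != predicted:
            break
        accepted.append(drafted)
    return accepted
```

Because all draft positions are scored by the target model in one parallel forward pass, each accepted token costs far less than a full sequential decoding step.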

    Context Window: 262K Native, 1M with YaRN

Natively, Qwen3.6-27B supports a context length of 262,144 tokens, enough to hold a large codebase or a book-length document. For tasks exceeding this, the model supports YaRN (Yet another RoPE extensioN) scaling, extensible up to 1,010,000 tokens. The Qwen team advises keeping the context at no less than 128K tokens to preserve the model's thinking capabilities.
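YaRN is typically enabled through a rope_scaling stanza in the model's config.json. The fragment below follows the pattern used by earlier Qwen model cards; the field names and especially the factor of 4.0 are assumptions to be checked against the Qwen3.6-27B model card before use.

```python
# Hypothetical config.json fragment enabling YaRN long-context scaling.
native_ctx = 262_144
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,  # assumed; scales the native window by this multiple
    "original_max_position_embeddings": native_ctx,
}
extended_ctx = int(rope_scaling["factor"] * native_ctx)
```

Note that serving frameworks usually apply the scaling statically, so it is common practice to enable YaRN only when the workload actually needs the longer window.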

Benchmark Performance

On agentic coding benchmarks, the gains over Qwen3.5-27B are substantial. SWE-bench Pro scores 53.5 versus 51.2 for Qwen3.5-27B and 50.9 for the much larger Qwen3.5-397B-A17B, meaning the 27B dense model exceeds a 397B MoE on this task. SWE-bench Multilingual scores 71.3 versus 69.3 for Qwen3.5-27B. Terminal-Bench 2.0, evaluated under a 3-hour timeout with 32 CPUs and 48 GB RAM, reaches 59.3, matching Claude 4.5 Opus exactly and outperforming Qwen3.6-35B-A3B (51.5). SkillsBench Avg5 shows the most striking gain: 48.2 versus 27.2 for Qwen3.5-27B, a 77% relative improvement, also well above Qwen3.6-35B-A3B's 28.7.

    On reasoning benchmarks, GPQA Diamond reaches 87.8 (up from 85.5), AIME26 hits 94.1 (up from 92.6), and LiveCodeBench v6 scores 83.9 (up from 80.7).

Vision-language benchmarks show consistent parity or improvement over Qwen3.5-27B. VideoMME (with subtitles) reaches 87.7, AndroidWorld (a visual agent benchmark) scores 70.3, and VlmsAreBlind, which probes for common visual-understanding failure modes, scores 97.0.


    Key Takeaways

    • Qwen3.6-27B is Alibaba's first dense open-weight model in the Qwen3.6 family, built to prioritize real-world coding utility over benchmark performance, and licensed under Apache 2.0.
    • The model introduces Thinking Preservation, a new feature that retains reasoning traces across conversation history, reducing redundant token generation and improving KV cache efficiency in multi-turn agent workflows.
    • Agentic coding performance is the key strength: Qwen3.6-27B scores 77.2 on SWE-bench Verified, 59.3 on Terminal-Bench 2.0 (matching Claude 4.5 Opus), and 1487 on QwenWebBench, outperforming both its predecessor Qwen3.5-27B and the larger Qwen3.5-397B-A17B MoE model on several tasks.
    • The architecture uses a hybrid Gated DeltaNet + Gated Attention layout across 64 layers: three out of every four sublayers use efficient linear attention (Gated DeltaNet), with Multi-Token Prediction (MTP) enabling speculative decoding at serving time.
    • Two weight variants are available on the Hugging Face Hub, Qwen3.6-27B (BF16) and Qwen3.6-27B-FP8 (fine-grained FP8 with block size 128), both supporting SGLang, vLLM, KTransformers, and Hugging Face Transformers, with a native 262,144-token context window extensible to 1,010,000 tokens via YaRN.

Check out the technical details and the model weights: Qwen/Qwen3.6-27B and Qwen/Qwen3.6-27B-FP8.


    Naveed Ahmad

    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy. Read his latest articles.
