    Moonshot AI Releases Kimi K2.6 with Long-Horizon Coding, Agent Swarm Scaling to 300 Sub-Agents and 4,000 Coordinated Steps

    By Naveed Ahmad | 21/04/2026 | 9 Mins Read


    Moonshot AI, the Chinese AI lab behind the Kimi assistant, today open-sourced Kimi K2.6, a native multimodal agentic model that pushes the boundaries of what an AI system can do when left to run autonomously on hard software engineering problems. The release targets practical deployment scenarios: long-running coding agents, front-end generation from natural language, massively parallel agent swarms coordinating hundreds of specialized sub-agents concurrently, and a new open ecosystem where humans and agents from any machine collaborate on the same task. The model is available now on Kimi.com, the Kimi App, the API, and Kimi Code CLI. Weights are published on Hugging Face under a Modified MIT License.

    What Kind of Model Is This, Technically?

    Kimi K2.6 is a Mixture-of-Experts (MoE) model, an architecture that has become increasingly dominant at frontier scale. Instead of activating all of a model's parameters for every token it processes, a MoE model routes each token to a small subset of specialized 'experts.' This lets you build a very large model while keeping inference compute tractable.
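    To make the routing idea concrete, here is a minimal NumPy sketch of top-k expert selection. This is illustrative only, not Moonshot's implementation; the hidden dimension, gate weights, and function names are toy placeholders.

```python
import numpy as np

def moe_route(token_hidden, gate_weights, k=8):
    """Route one token to its top-k experts via a learned gate.

    token_hidden:  (d_model,) hidden state for a single token
    gate_weights:  (n_experts, d_model) router projection matrix
    Returns the chosen expert indices and their normalized mixing weights.
    """
    logits = gate_weights @ token_hidden            # (n_experts,) router scores
    top_k = np.argsort(logits)[-k:]                 # k highest-scoring experts
    scores = np.exp(logits[top_k] - logits[top_k].max())
    weights = scores / scores.sum()                 # softmax over selected experts
    return top_k, weights

# Toy example mirroring K2.6's routing shape: top 8 of 384 experts.
rng = np.random.default_rng(0)
experts, weights = moe_route(rng.standard_normal(64), rng.standard_normal((384, 64)))
print(len(experts))  # 8
```

    Only the selected experts' feed-forward weights run for each token, which is how a very large model keeps per-token compute small.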

    Kimi K2.6 has 1 trillion total parameters, but only 32 billion are activated per token. It has 384 experts in total, with 8 selected per token, plus 1 shared expert that is always active. The model has 61 layers (including one dense layer), uses an attention hidden dimension of 7,168, a MoE hidden dimension of 2,048 per expert, and 64 attention heads.
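    A quick back-of-envelope check on those figures (plain arithmetic on the published numbers, not official accounting):

```python
total_params = 1_000_000_000_000   # 1T total parameters
active_params = 32_000_000_000     # 32B activated per token
routed_k, shared, n_experts = 8, 1, 384

# Fraction of the network that actually runs for each token.
print(f"active fraction: {active_params / total_params:.1%}")
# Experts firing per token: 8 routed + 1 always-on shared expert.
print(f"experts per token: {routed_k + shared} of {n_experts + shared}")
```

    Roughly 3% of the network runs per token, which is why a 1T-parameter model can serve at something like the cost of a ~32B dense model.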

    Beyond text, K2.6 is a native multimodal model, meaning vision is baked in architecturally, not bolted on. It uses a MoonViT vision encoder with 400M parameters and supports image and video input natively. Other architectural details: it uses Multi-head Latent Attention (MLA) as its attention mechanism, SwiGLU as the activation function, a vocabulary size of 160K tokens, and a context length of 256K tokens.

    For deployment, K2.6 is recommended to run on vLLM, SGLang, or KTransformers. It shares the same architecture as Kimi K2.5, so existing deployment configurations can be reused directly. The required transformers version is >=4.57.1, <5.0.0.
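    A minimal sketch of what a vLLM deployment might look like. The Hugging Face model ID and GPU count here are assumptions, not confirmed values from the release; only the transformers version pin comes from the release notes.

```shell
# Pin transformers to the range the release requires.
pip install vllm "transformers>=4.57.1,<5.0.0"

# Serve the open weights behind an OpenAI-compatible endpoint.
# The model ID and --tensor-parallel-size are placeholders; adjust them
# to your hardware and the actual repository name on Hugging Face.
vllm serve moonshotai/Kimi-K2.6 \
    --tensor-parallel-size 8 \
    --trust-remote-code
```

    Since K2.6 shares K2.5's architecture, an existing K2.5 serving config should carry over with only the model path changed.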

    The Long-Horizon Coding Headline Numbers

    The metric that will likely get the most attention from dev teams is SWE-Bench Pro, a benchmark testing whether a model can solve real-world GitHub issues in professional software repositories.

    Kimi K2.6 scores 58.6 on SWE-Bench Pro, compared to 57.7 for GPT-5.4 (xhigh), 53.4 for Claude Opus 4.6 (max effort), 54.2 for Gemini 3.1 Pro (thinking high), and 50.7 for Kimi K2.5. On SWE-Bench Verified it scores 80.2, sitting within a tight band of top-tier models.

    On Terminal-Bench 2.0 using the Terminus-2 agent framework, K2.6 achieves 66.7, compared to 65.4 for both GPT-5.4 and Claude Opus 4.6, and 68.5 for Gemini 3.1 Pro. On LiveCodeBench (v6), it scores 89.6 vs. Claude Opus 4.6's 88.8.

    Perhaps the most striking number for agentic workloads is Humanity's Last Exam (HLE-Full) with tools: K2.6 scores 54.0, leading every model in the comparison, including GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4). HLE is widely considered one of the hardest knowledge benchmarks, and the with-tools variant specifically tests how well a model can leverage external resources autonomously. Internally, Moonshot evaluates long-horizon coding gains using its Kimi Code Bench, an internal benchmark covering diverse, challenging end-to-end tasks across languages and domains, where K2.6 shows significant improvements over K2.5.

    https://www.kimi.com/blog/kimi-k2-6

    What 13 Hours of Autonomous Coding Actually Looks Like

    Two engineering case studies in the release document what 'long-horizon coding' means in practice.

    In the first, Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac, then implemented and optimized model inference in Zig, a highly niche programming language, demonstrating exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, K2.6 improved throughput from roughly 15 to roughly 193 tokens/sec, ultimately achieving speeds roughly 20% faster than LM Studio.

    In the second, Kimi K2.6 autonomously overhauled exchange-core, an 8-year-old open-source financial matching engine. Over a 13-hour execution, the model iterated through 12 optimization strategies, initiating over 1,000 tool calls to precisely modify more than 4,000 lines of code. Acting as an expert systems architect, K2.6 analyzed CPU and allocation flame graphs to pinpoint hidden bottlenecks and reconfigured the core thread topology from 4ME+2RE to 2ME+1RE, extracting a 185% median throughput jump (from 0.43 to 1.24 MT/s) and a 133% performance throughput gain (from 1.23 to 2.86 MT/s).

    Agent Swarms: Scaling Horizontally, Not Just Vertically

    One of K2.6's most architecturally interesting capabilities is its Agent Swarm, an approach that parallelizes complex tasks across many specialized sub-agents rather than relying on a single, deeper reasoning chain.

    The architecture scales horizontally to 300 sub-agents executing across 4,000 coordinated steps concurrently, a substantial expansion from K2.5's 100 sub-agents and 1,500 steps. The swarm dynamically decomposes tasks into heterogeneous subtasks, combining broad web search with deep research, large-scale document analysis with long-form writing, and multi-format content generation in parallel, then delivers consolidated outputs including documents, websites, slides, and spreadsheets within a single autonomous run. The swarm also introduces a concrete Skills capability: it can convert any high-quality PDF, spreadsheet, slide, or Word document into a reusable Skill. K2.6 captures and maintains the document's structural and stylistic DNA, allowing it to reproduce the same quality and format in future tasks. Think of it as teaching the swarm by example rather than by prompt.

    Concrete demonstrations include: a 100-sub-agent run that matched a single uploaded CV against 100 relevant roles in California and delivered 100 fully customized resumes; another that identified 30 retail stores in Los Angeles without websites from Google Maps and generated landing pages for each; and one that turned an astrophysics paper into a reusable academic skill and then produced a 40-page, 7,000-word research paper alongside a structured dataset with 20,000+ entries and 14 astronomy-grade charts.
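    Mechanically, the fan-out/consolidate pattern behind runs like the 100-resume demo can be caricatured in a few lines. This is a hypothetical sketch, not Moonshot's implementation; `run_sub_agent` stands in for a real model-backed sub-agent.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sub_agent(subtask: str) -> str:
    # Placeholder for a real sub-agent (search, analysis, generation, ...).
    return f"done: {subtask}"

def swarm(task: str, n_agents: int) -> list[str]:
    # Decompose the task into one subtask per sub-agent...
    subtasks = [f"{task} / candidate {i}" for i in range(n_agents)]
    # ...fan out in parallel, then consolidate results in order.
    with ThreadPoolExecutor(max_workers=32) as pool:
        return list(pool.map(run_sub_agent, subtasks))

results = swarm("tailor CV for role", n_agents=100)
print(len(results))  # 100
```

    The real Agent Swarm layers heterogeneous subtask types, 4,000-step coordination, and consolidated multi-format deliverables on top of this basic shape.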

    On the BrowseComp benchmark in Agent Swarm mode, K2.6 scores 86.3 compared to 78.4 for Kimi K2.5. On DeepSearchQA (F1 score), K2.6 scores 92.5 against 78.6 for GPT-5.4.

    Bring Your Own Agents: Claw Teams

    Beyond Moonshot's own swarm infrastructure, K2.6 introduces Claw Teams as a research preview, a new feature that opens the agent swarm architecture to an external, heterogeneous ecosystem.

    The key design principle: multiple agents and humans operate as genuine collaborators in a shared operational space. Users can onboard agents from any machine, running any model, each carrying their own specialized toolkits, skills, and persistent memory contexts, whether deployed on local laptops, mobile devices, or cloud instances. At the center of this swarm, K2.6 serves as an adaptive coordinator: it dynamically matches tasks to agents based on their specific skill profiles and available tools, detects when an agent encounters failure or stalls, automatically reassigns the task or regenerates subtasks, and manages the full lifecycle of deliverables from initiation through validation to completion.

    Moonshot has been using Claw Teams internally to run its own content production and launch campaigns, with specialized agents including Demo Makers, Benchmark Makers, Social Media Agents, and Video Makers working in parallel, with K2.6 coordinating the process. For devs interested in multi-agent orchestration architectures, this is worth looking into: it represents a shift from 'AI does tasks for you' to 'AI coordinates a team of heterogeneous agents, some of which you built, on your behalf.'
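    The coordinator role described above, matching tasks to skill profiles and reassigning on failure, can be sketched as a simple dispatch loop. The agent names, skill registry, and function signatures here are invented for illustration and are not Moonshot's API.

```python
# Invented skill registry, loosely echoing the roles named above.
AGENTS = {
    "demo-maker": {"demo", "video"},
    "benchmark-maker": {"benchmark", "coding"},
    "social-media-agent": {"writing", "posting"},
}

def pick_agent(required_skill):
    """Match a task to the first agent advertising the required skill."""
    for name, skills in AGENTS.items():
        if required_skill in skills:
            return name
    return None

def dispatch(task, required_skill, attempt, max_retries=3):
    """Assign a task, detect failure, and retry before escalating."""
    agent = pick_agent(required_skill)
    if agent is None:
        raise LookupError(f"no agent has skill {required_skill!r}")
    for _ in range(max_retries):          # detect failure/stall and retry
        if attempt(agent, task):
            return f"{agent} completed {task!r}"
    return f"{task!r} escalated after {max_retries} failed attempts"

print(dispatch("run SWE-Bench Pro", "benchmark", lambda agent, task: True))
```

    A production coordinator would additionally regenerate subtasks and track deliverables through validation, but the match/monitor/reassign loop is the core of the pattern.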

    Proactive Agents: 5 Days of Autonomous Operation

    K2.6 demonstrates strong performance in persistent, proactive agents such as OpenClaw and Hermes, which operate across multiple applications with continuous, 24/7 execution. These workflows require AI to proactively manage schedules, execute code, and orchestrate cross-platform operations without human oversight.

    Moonshot's own RL infrastructure team used a K2.6-backed agent that operated autonomously for 5 days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution.

    Performance in this regime is measured by an internal Claw Bench, an evaluation suite spanning five domains: Coding Tasks, IM Ecosystem Integration, Information Research & Analysis, Scheduled Task Management, and Memory Utilization. Across all five, K2.6 significantly outperforms K2.5 in task completion rates and tool invocation accuracy, particularly in workflows requiring sustained autonomous operation without human oversight.

    Two Operational Modes: Thinking and Instant

    For devs integrating via the API, K2.6 exposes two inference modes that matter for latency/quality tradeoffs:

    Thinking mode activates full chain-of-thought reasoning: the model reasons through a problem before producing a final answer. This is recommended for complex coding and agentic tasks, with a recommended temperature of 1.0. There is also a preserve thinking mode, which retains full reasoning content across multi-turn interactions and improves performance in coding agent scenarios. It is disabled by default, but worth enabling when building agents that need to maintain coherent reasoning state across many steps.

    Instant mode disables extended reasoning for lower-latency responses. To use Instant mode via the official API, pass {'thinking': {'type': 'disabled'}} in extra_body. For vLLM or SGLang deployments, pass {'chat_template_kwargs': {'thinking': False}} instead, with a recommended temperature of 0.6 and top-p of 0.95.
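    In practice the switch is just a request parameter. The snippet below builds the two payload variants described above; the model identifier is an assumption, and in real use these dicts would be unpacked into an OpenAI-compatible chat-completions call.

```python
# Official API: disable extended reasoning via extra_body.
instant_api_kwargs = {
    "model": "kimi-k2.6",    # assumed identifier; check the API docs
    "temperature": 0.6,      # recommended for Instant mode
    "top_p": 0.95,
    "extra_body": {"thinking": {"type": "disabled"}},
}

# Self-hosted vLLM / SGLang: the same switch goes through
# chat_template_kwargs (typically also via extra_body when calling
# the server through an OpenAI-compatible client).
instant_vllm_kwargs = {
    "temperature": 0.6,
    "top_p": 0.95,
    "extra_body": {"chat_template_kwargs": {"thinking": False}},
}

print(instant_api_kwargs["extra_body"]["thinking"]["type"])  # disabled
```

    For Thinking mode, simply omit the extra_body override and raise the temperature to the recommended 1.0.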

    Key Takeaways

    • Kimi K2.6 is a 1-trillion-parameter, native multimodal MoE model with only 32B parameters activated per token, released fully open-source under a Modified MIT License.
    • K2.6 leads all frontier models on HLE-Full with tools (54.0), outperforming GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4) on one of AI's hardest agentic benchmarks.
    • In real-world tests, K2.6 autonomously overhauled an 8-year-old financial matching engine over 13 hours, delivering a 185% median throughput jump and a 133% performance throughput gain.
    • The Agent Swarm architecture scales to 300 sub-agents executing 4,000 coordinated steps concurrently, and can convert any PDF, spreadsheet, or slide into a reusable Skill that preserves structural and stylistic DNA.
    • Claw Teams, released as a research preview, lets humans and agents from any machine running any model collaborate in a shared swarm, with K2.6 serving as an adaptive coordinator that dynamically assigns tasks, detects failures, and manages full delivery lifecycles.

    Check out the Model Weights, API Access, and Technical details.





    Source link

    Naveed Ahmad

    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy. Read his latest articles.
