MiniMax has officially open-sourced MiniMax M2.7, making the model weights publicly available on Hugging Face. First announced on March 18, 2026, MiniMax M2.7 is MiniMax's most capable open-source model to date, and its first model to actively participate in its own development cycle, a significant shift in how large language models are built and iterated.
What Is MiniMax M2.7?
MiniMax M2.7 is part of MiniMax's M2 series of Mixture-of-Experts (MoE) models. MoE is an architectural design in which only a subset of the total parameters is 'activated' during any inference pass, which makes the model significantly faster and cheaper to serve than a dense model of comparable output quality.
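The sparse-activation idea behind MoE can be sketched in a few lines. The toy layer below routes each token to only its top-k experts, so only k of the expert matrices are ever multiplied per token; the shapes, function name, and routing scheme are illustrative assumptions, not MiniMax's actual implementation.

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """Toy Mixture-of-Experts layer: route a token to its top-k experts.

    x: (d,) token embedding; expert_weights: (n_experts, d, d) per-expert
    matrices; router_weights: (n_experts, d) router projection.
    """
    logits = router_weights @ x                    # router score per expert
    top = np.argsort(logits)[-top_k:]              # indices of the top-k experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only top_k of the n_experts matrices are multiplied: sparse activation.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
y = moe_forward(rng.normal(size=d),
                rng.normal(size=(n_experts, d, d)),
                rng.normal(size=(n_experts, d)),
                top_k=2)
print(y.shape)  # only 2 of the 16 experts contributed to this output
```

The serving win is that compute per token scales with `top_k`, not `n_experts`, while total parameter count (and hence capacity) scales with `n_experts`.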
MiniMax M2.7 is built around three core capability areas: expert software engineering, expert office work, and what MiniMax calls Agent Teams, its native multi-agent collaboration. MiniMax M2.7 can build complex agent harnesses and complete highly elaborate productivity tasks, leveraging capabilities such as Agent Teams, complex Skills, and dynamic tool search.
SOTA Benchmark Performance: SWE-Pro and Terminal Bench 2
On SWE-Pro, which covers multiple programming languages, MiniMax M2.7 achieved a 56.22% accuracy rate, matching GPT-5.3-Codex. SWE-Pro tasks span log analysis, bug troubleshooting, code security review, and machine learning workflow debugging, much closer to the messy reality of production systems than standard algorithmic coding tests.
On Terminal Bench 2 (57.0%) and NL2Repo (39.8%), both of which demand a high degree of system-level comprehension, MiniMax M2.7 performs solidly. The model excels not only at code generation but can also deeply understand the operational logic and collaborative dynamics of software systems.
On the repo-level code generation benchmark VIBE-Pro, MiniMax M2.7 scored 55.6%, nearly on par with Opus 4.6, meaning that requirements involving Web, Android, iOS, or simulation tasks can be handed directly to MiniMax M2.7 to complete. It also demonstrates a strong advantage on benchmarks closer to real-world engineering scenarios: SWE Multilingual (76.5) and Multi SWE Bench (52.7).
Production Debugging: Under Three Minutes
When faced with production alerts, MiniMax M2.7 can correlate monitoring metrics with deployment timelines to perform causal reasoning, run statistical analysis on trace samples and propose precise hypotheses, proactively connect to databases to verify root causes, pinpoint missing index migration files in the code repository, and use non-blocking index creation to stop the bleeding before submitting a merge request. The MiniMax team reports that on several occasions this reduced recovery time for live production incidents to under three minutes. From observability analysis and database expertise to SRE-level decision-making, this positions MiniMax M2.7 as something beyond a code-generation model.
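The first step in that chain, correlating alert onset with recent deployments, is a simple temporal join. Here is a minimal sketch under assumed inputs; the function name, time window, and sample services are all illustrative, not part of MiniMax's tooling.

```python
from datetime import datetime, timedelta

def correlate_alerts_with_deploys(alerts, deploys, window_minutes=30):
    """Flag deployments that landed shortly before each alert fired.

    alerts and deploys are lists of (timestamp, label) tuples; any deploy
    within `window_minutes` before an alert is a candidate root cause.
    """
    window = timedelta(minutes=window_minutes)
    suspects = []
    for alert_time, alert_name in alerts:
        for deploy_time, service in deploys:
            if timedelta(0) <= alert_time - deploy_time <= window:
                suspects.append((alert_name, service, alert_time - deploy_time))
    # Rank by recency: the deploy closest to the alert is the prime suspect.
    return sorted(suspects, key=lambda s: s[2])

alerts = [(datetime(2026, 3, 18, 10, 15), "p99_latency_spike")]
deploys = [
    (datetime(2026, 3, 18, 10, 5), "orders-service"),
    (datetime(2026, 3, 18, 9, 0), "auth-service"),
]
print(correlate_alerts_with_deploys(alerts, deploys))
```

An agent's advantage over this static heuristic is the follow-through: verifying the suspect against database state and the migration history rather than stopping at the correlation.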
The Self-Evolution Architecture
To test the boundaries of autonomous improvement, MiniMax M2.7 was tasked with optimizing a model's programming performance on an internal scaffold. It ran entirely autonomously, executing an iterative loop of 'analyze failure trajectories → plan changes → modify scaffold code → run evaluations → compare results → decide to keep or revert changes' for over 100 rounds. During this process, MiniMax M2.7 discovered effective optimizations on its own: systematically searching for the optimal combination of sampling parameters such as temperature, frequency penalty, and presence penalty; designing more specific workflow guidelines (such as automatically searching for the same bug pattern in other files after a fix); and adding loop detection to the scaffold's agent loop. This achieved a 30% performance improvement on internal evaluation sets.
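The keep-or-revert loop described above can be sketched as a simple hill climber. In this sketch, `evaluate` and `propose_change` are deterministic stand-ins for the real steps (running the eval suite; the model planning and applying a scaffold edit), so everything here is an illustrative assumption:

```python
import copy
import random

def evaluate(scaffold):
    """Stand-in for running the eval suite; peaks at temperature 0.7."""
    return -(scaffold["temperature"] - 0.7) ** 2

def propose_change(scaffold):
    """Stand-in for the model planning and applying a scaffold edit."""
    candidate = copy.deepcopy(scaffold)
    candidate["temperature"] = round(random.uniform(0.0, 1.0), 2)
    return candidate

def self_improve(scaffold, rounds=100):
    best_score = evaluate(scaffold)
    for _ in range(rounds):
        candidate = propose_change(scaffold)          # plan + modify scaffold code
        score = evaluate(candidate)                   # run evaluations
        if score > best_score:                        # compare results
            scaffold, best_score = candidate, score   # keep the change
        # otherwise revert: the previous scaffold stays untouched
    return scaffold, best_score

random.seed(0)
final, best = self_improve({"temperature": 0.2}, rounds=100)
```

The revert branch is what makes 100+ unattended rounds safe: a regression never survives past the round that introduced it.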
Within MiniMax's own reinforcement learning team workflows, M2.7 now handles 30%–50% of the workflow end-to-end, with human researchers stepping in only for critical decisions and discussions.
MLE Bench Lite: Testing Autonomous ML Experimentation
The MiniMax team also tested MiniMax M2.7 on MLE Bench Lite, OpenAI's open-sourced suite of 22 machine learning competitions runnable on a single A30 GPU, covering almost all stages of the ML workflow.
For this evaluation, the MiniMax team designed a simple three-component harness: short-term memory, self-feedback, and self-optimization. After each iteration round, the agent generates a short-term memory markdown file, performs self-criticism on the current results, and provides optimization directions for the next round. Three trials were run, each with a 24-hour window for iterative evolution.
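The three components compose into one loop per round: run, criticize, persist. A minimal sketch follows, with `run_experiment` and `critique` as deterministic placeholders for the real training run and the model's self-criticism; the file name and scoring are assumptions for illustration.

```python
from pathlib import Path

def run_experiment(directions):
    """Stand-in for one round of training/submission; returns a metric."""
    return 0.5 + 0.01 * len(directions)  # illustrative only

def critique(score, history):
    """Stand-in for the agent's self-criticism step."""
    trend = "improving" if not history or score > history[-1] else "stalled"
    return f"Score {score:.3f} ({trend}); try a stronger feature pipeline."

def harness_loop(workdir, rounds=3):
    memory_file = Path(workdir) / "short_term_memory.md"
    history, directions = [], []
    for _ in range(rounds):
        score = run_experiment(directions)       # run this round
        feedback = critique(score, history)      # self-feedback
        directions.append(feedback)              # self-optimization input for next round
        history.append(score)
        # Persist short-term memory between rounds as a markdown log.
        memory_file.write_text(
            "# Short-term memory\n"
            + "\n".join(f"- Round {j}: {d}" for j, d in enumerate(directions))
        )
    return history
```

Writing memory to disk rather than keeping it in context is what lets a 24-hour run survive context-window limits and restarts.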
The best run achieved 9 gold medals, 5 silver medals, and 1 bronze medal. The average medal rate across the three runs was 66.6%, a result second only to Opus-4.6 (75.7%) and GPT-5.4 (71.2%), and tied with Gemini-3.1 (66.6%).
Professional Office Work and Finance
Beyond software engineering, MiniMax M2.7 targets professional office tasks. In the GDPval-AA evaluation, which measures domain expertise and task-delivery capability across 45 models, MiniMax M2.7 achieved an ELO score of 1495, the highest among open-source models, second only to Opus 4.6, Sonnet 4.6, and GPT-5.4, and surpassing GPT-5.3.
On Toolathon, MiniMax M2.7 achieved an accuracy of 46.3%, reaching the global top tier. In MM Claw testing, an evaluation MiniMax built from real-world usage patterns on the OpenClaw personal agent platform, MiniMax M2.7 maintained a 97% skill compliance rate across 40 complex skills (each exceeding 2,000 tokens) and achieved an overall accuracy of 62.7%, approaching Sonnet 4.6.
In finance, MiniMax M2.7 can autonomously read a company's annual reports and earnings call transcripts, cross-reference multiple research reports, independently design assumptions and build a revenue forecast model, and produce PPT and Word research reports from templates: understanding, making judgments, and producing output like a junior analyst.
Key Takeaways
- MiniMax M2.7 is now officially open source, with weights available on Hugging Face, making a frontier-grade agentic model freely available for developers to deploy and build on.
- MiniMax M2.7 achieves SOTA performance on real-world software engineering benchmarks, scoring 56.22% on SWE-Pro (matching GPT-5.3-Codex) and 57.0% on Terminal Bench 2, tests that measure production-level reasoning, not just code generation.
- MiniMax M2.7 is the first model to actively participate in its own development, running over 100 autonomous rounds of scaffold optimization and achieving a 30% performance improvement, an early, concrete example of AI-assisted AI development in practice.
- The model is built for real agentic deployments, maintaining 97% skill adherence across 40 complex skills (each exceeding 2,000 tokens), supporting native Agent Teams with stable role boundaries, and handling 30–50% of MiniMax's internal RL team workflows autonomously.
- MiniMax M2.7 is the highest-ranked open-source model on GDPval-AA, with an ELO score of 1495 across 45 models, demonstrating strong professional-work capabilities spanning office document editing, financial analysis, and multi-round high-fidelity task delivery.
Check out the Technical details and Model Weights.
