    Cerebras Releases MiniMax-M2-REAP-162B-A10B: A Memory-Efficient Version of MiniMax-M2 for Long-Context Coding Agents

    By Naveed Ahmad | 16/11/2025 | 6 Mins Read


    Cerebras has released MiniMax-M2-REAP-162B-A10B, a compressed Sparse Mixture-of-Experts (SMoE) causal language model derived from MiniMax-M2 using the new Router-weighted Expert Activation Pruning (REAP) method. The model retains the behavior of the original 230B-total, 10B-active MiniMax-M2 while pruning experts and reducing memory for deployment-focused workloads such as coding agents and tool calling.

    Architecture and core specifications

    MiniMax-M2-REAP-162B-A10B has these key properties:

    • Base model: MiniMax-M2
    • Compression method: REAP, Router-weighted Expert Activation Pruning
    • Total parameters: 162B
    • Active parameters per token: 10B
    • Layers: 62 transformer blocks
    • Attention heads per layer: 48
    • Experts: 180 experts, obtained by pruning a 256-expert configuration
    • Activated experts per token: 8
    • Context length: 196,608 tokens
    • License: modified MIT, derived from MiniMaxAI MiniMax-M2

    The SMoE design means that the model stores 162B parameters, but each token only routes through a small set of experts, so the effective compute cost per token is similar to a 10B dense model. MiniMax-M2 itself is positioned as an MoE model built for coding and agentic workflows, with 230B total parameters and 10B active, which this checkpoint inherits.
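    To make the sparse routing concrete, here is a minimal, self-contained sketch of top-k expert routing in an SMoE block. The module layout, hidden sizes and gating details are illustrative assumptions, not the MiniMax-M2 implementation; only the 180-expert, top-8 routing pattern is taken from the specifications above.

    # Toy sketch of sparse top-k expert routing (illustrative assumptions, not MiniMax-M2 code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToySparseMoE(nn.Module):
        def __init__(self, d_model=64, d_ff=128, n_experts=180, top_k=8):
            super().__init__()
            self.router = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            ])
            self.top_k = top_k

        def forward(self, x):                                  # x: (tokens, d_model)
            gates = F.softmax(self.router(x), dim=-1)          # (tokens, n_experts)
            weights, idx = gates.topk(self.top_k, dim=-1)      # each token keeps only its top 8 experts
            out = torch.zeros_like(x)
            for t in range(x.size(0)):
                for w, e in zip(weights[t], idx[t].tolist()):
                    out[t] += w * self.experts[e](x[t])        # only 8 of 180 experts run per token
            return out

    tokens = torch.randn(4, 64)
    print(ToySparseMoE()(tokens).shape)                        # torch.Size([4, 64])

    The memory footprint is set by all 180 experts, but the per-token compute is set by the 8 that actually run, which is the gap the REAP compression targets.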

    How REAP compresses MiniMax-M2

    MiniMax-M2-REAP-162B-A10B is created by applying REAP uniformly across all MoE blocks of MiniMax-M2, at a 30% expert pruning rate.

    The REAP method defines a saliency score for each expert that combines:

    • Router gate values: how often and how strongly the router selects that expert
    • Expert activation norms: the magnitude of the expert's output when it is active

    Experts that contribute minimally to the layer output under this combined criterion are removed. The remaining experts keep their original weights, and the router retains separate gates for each of them. This is one-shot compression; there is no additional fine-tuning after pruning in the method's definition.
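    The criterion can be summarized in a short sketch: score every expert by combining its observed router gate strength with the norm of its output on calibration data, then drop the lowest-scoring fraction in one shot. The averaging and function names below are illustrative assumptions, not the paper's reference implementation.

    # Illustrative one-shot expert pruning in the spirit of REAP; the exact saliency
    # formula and calibration setup here are assumptions, not the official implementation.
    import numpy as np

    def expert_saliency(gate_weights, output_norms):
        """gate_weights: (tokens, n_experts) router gates observed on calibration data.
        output_norms:  (tokens, n_experts) norm of each expert's output when it is active."""
        return (gate_weights * output_norms).mean(axis=0)      # average contribution per expert

    def prune_experts(saliency, prune_ratio=0.30):
        n_drop = int(len(saliency) * prune_ratio)
        survivors = np.sort(np.argsort(saliency)[n_drop:])     # keep the most salient experts
        return survivors                                        # their weights and gates stay untouched

    rng = np.random.default_rng(0)
    gates = rng.random((1000, 256))                             # a 256-expert layer, as in MiniMax-M2
    norms = rng.random((1000, 256))
    kept = prune_experts(expert_saliency(gates, norms))
    print(kept.size)                                            # 180 experts remain at a 30% pruning rate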

    A core theoretical result in the REAP research paper is that expert merging with summed gates causes functional subspace collapse. When experts are merged, the router loses its independent, input-dependent control over those experts, so a single merged expert must approximate an input-dependent mixture that was originally expressed through several experts. The research team proves that, whenever the router policy depends on the input and the experts are not identical, this introduces irreducible error. In contrast, pruning removes some experts but preserves independent control of the survivors, so the error scales with the gate weight of the removed experts.
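    A schematic way to see the distinction (a sketch of the argument, not the paper's exact theorem statement): merging experts i and j under a summed gate asks a single function E_m to reproduce an input-dependent mixture,

    \[ g_i(x)\,E_i(x) + g_j(x)\,E_j(x) \;\approx\; \bigl(g_i(x) + g_j(x)\bigr)\,E_m(x), \]

    which cannot hold for every input x unless the experts are identical or the gate ratio g_i(x)/g_j(x) is constant. Pruning expert j instead leaves E_i under independent router control, and the residual error is the dropped term g_j(x)\,E_j(x), which is why it scales with the pruned expert's gate weight.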

    Across a set of SMoE models in the 20B to 1T parameter range, REAP consistently outperforms expert merging and other pruning criteria on generative benchmarks such as code generation, mathematical reasoning and tool calling, especially at 50% compression.

    Accuracy under 30% expert pruning

    The evaluation compares three checkpoints on standard coding, reasoning and agentic benchmarks:

    • MiniMax-M2 (230B, base model)
    • MiniMax-M2-REAP-172B-A10B, 25% pruning
    • MiniMax-M2-REAP-162B-A10B, 30% pruning
    https://huggingface.co/cerebras/MiniMax-M2-REAP-162B-A10B

    On coding benchmarks such as HumanEval, HumanEval Plus, MBPP and MBPP Plus, the 162B REAP model stays very close to the base model. HumanEval sits in the 90% range and MBPP stays in the 80% range, with the 172B and 162B models essentially tracking the original MiniMax-M2 within a few points.

    On reasoning benchmarks such as AIME 25 and MATH 500, there are small shifts between the three models, but there is no collapse at 30% pruning and the 162B checkpoint remains competitive with the base model.

    On tool calling and agentic evaluation, represented by τ²-bench in a telecom setting, the 162B REAP model again matches the base model within small variance. The model card explicitly states that this checkpoint retains almost identical performance while being about 30% lighter in parameter count.

    These results line up with the broader REAP study, which reports near-lossless compression for code generation and tool calling on several large SMoE architectures when pruning experts using the REAP criterion.

    Deployment, memory usage and observed throughput

    Cerebras provides a direct vLLM serve example and positions MiniMax-M2-REAP-162B-A10B as a drop-in model for the existing MiniMax-M2 integration.

    vllm serve cerebras/MiniMax-M2-REAP-162B-A10B \
        --tensor-parallel-size 8 \
        --tool-call-parser minimax_m2 \
        --reasoning-parser minimax_m2_append_think \
        --trust-remote-code \
        --enable_expert_parallel \
        --enable-auto-tool-choice

    If the run hits memory limits, the card recommends lowering --max-num-seqs, for example to 64, to keep batch size in check on a given GPU.
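    For instance, the same serve command with the reduced batch cap appended (64 is the value suggested on the card; tune it to your hardware):

    vllm serve cerebras/MiniMax-M2-REAP-162B-A10B \
        --tensor-parallel-size 8 \
        --tool-call-parser minimax_m2 \
        --reasoning-parser minimax_m2_append_think \
        --trust-remote-code \
        --enable_expert_parallel \
        --enable-auto-tool-choice \
        --max-num-seqs 64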

    Key Takeaways

    1. SMoE architecture with efficient compute: MiniMax-M2-REAP-162B-A10B is a Sparse Mixture-of-Experts model with 162B total parameters and 10B active parameters per token, so the compute cost per token is close to a 10B dense model while retaining frontier-scale capacity.
    2. REAP expert pruning retains the behavior of MiniMax-M2: the model is produced by applying REAP, Router-weighted Expert Activation Pruning, to MiniMax-M2 at roughly 30% expert pruning, removing experts based on router gate values and expert activation norms while leaving surviving experts and the router structure intact.
    3. Near-lossless accuracy at 30% compression: on coding benchmarks such as HumanEval and MBPP, and on reasoning benchmarks such as AIME 25 and MATH 500, the 162B REAP variant tracks the 230B MiniMax-M2 and a 172B REAP variant within a few points, showing near-lossless compression for code, reasoning and tool use.
    4. Pruning outperforms expert merging for generative SMoE: the REAP study shows that pruning experts with a saliency criterion avoids the functional subspace collapse seen with expert merging in generative tasks, and performs better across large SMoE models in the 22B to roughly 1T parameter range.

    Comparison Table

    Image source: Marktechpost.com

    Cerebras' release of MiniMax-M2-REAP-162B-A10B is a strong signal that Router-weighted Expert Activation Pruning is ready for real workloads, not just a research curiosity. The checkpoint shows that a 30% expert pruning schedule can keep MiniMax-M2 230B-A10B behavior almost intact while cutting memory and preserving long-context coding, reasoning and tool calling performance, which is exactly what SMoE researchers need for practical deployment. Overall, Cerebras is quietly turning expert pruning into production infrastructure for frontier-class SMoE models.

