Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

By Naveed Ahmad · 13/05/2026 (Updated: 13/05/2026)


A team of researchers from China has released AntAngelMed, a large open-source medical language model that the team describes as the largest and most capable of its kind currently available.

    What Is AntAngelMed?

AntAngelMed is a medical-domain language model with 103 billion total parameters, but it does not activate all of those parameters during inference. Instead, it uses a Mixture-of-Experts (MoE) architecture with a 1/32 activation ratio, meaning only 6.1 billion parameters are active at any given time when processing a query.

It helps to understand how MoE architectures work. In a standard dense model, every parameter participates in processing every token. In an MoE model, the network is divided into many "expert" sub-networks, and a routing mechanism selects only a small subset of them to handle each input. This lets a model have a very large total parameter count, which generally correlates with strong knowledge capacity, while keeping the actual compute cost of inference proportional to the smaller active parameter count.
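To make the routing idea concrete, here is a minimal, illustrative top-k MoE layer in PyTorch. The dimensions, expert count, and top-k value are arbitrary assumptions for the sketch and are not taken from AntAngelMed's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative top-k MoE layer: only k of num_experts experts run per token."""
    def __init__(self, dim=64, num_experts=32, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x):                              # x: [tokens, dim]
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # mix only the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():   # run each chosen expert on its tokens
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)                   # torch.Size([4, 64])

Compute per token scales with k, while total capacity scales with num_experts, which is exactly the trade-off a 1/32 activation ratio exploits.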

AntAngelMed inherits this design from Ling-flash-2.0, a base model developed by inclusionAI and guided by what the team calls Ling Scaling Laws. The specific optimizations layered on top include refined expert granularity, a tuned shared-expert ratio, attention balance mechanisms, sigmoid routing without an auxiliary loss, an MTP (Multi-Token Prediction) layer, QK-Norm, and Partial-RoPE (Rotary Position Embedding applied to a subset of attention heads rather than all of them). According to the research team, these design choices collectively allow small-activation MoE models to deliver up to 7× the efficiency of similarly sized dense architectures, which means that with only 6.1B activated parameters, AntAngelMed can match the performance of a roughly 40B dense model. Separately, as output length grows during inference, the relative speed advantage can also reach 7× or more over dense models of comparable size.
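A rough back-of-the-envelope check makes those numbers plausible. Under the common heuristic that a decoder-only transformer spends roughly 2 FLOPs per parameter per generated token, the per-token compute of 6.1B active parameters versus a 40B dense model works out as follows (a simplification that ignores attention cost and routing overhead):

# Rough per-token compute comparison under the ~2 * params FLOPs heuristic.
# This is illustrative arithmetic, not a measurement from the AntAngelMed report.
active_params = 6.1e9      # parameters active per token in AntAngelMed
dense_params  = 40e9       # dense model it is said to roughly match

moe_flops   = 2 * active_params
dense_flops = 2 * dense_params
print(f"MoE per-token compute:   {moe_flops:.2e} FLOPs")
print(f"Dense per-token compute: {dense_flops:.2e} FLOPs")
print(f"Dense / MoE ratio:       {dense_flops / moe_flops:.1f}x")   # ~6.6x, close to the stated 7x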

https://modelscope.cn/models/MedAIBase/AntAngelMed

Training Pipeline

AntAngelMed uses a three-stage training process designed to layer deep medical domain adaptation on top of general language understanding.

The first stage is continual pre-training on large-scale medical corpora, including encyclopedias, web text, and academic publications. This phase builds on the Ling-flash-2.0 checkpoint, giving the model a strong general reasoning foundation before medical specialization begins.

The second stage is Supervised Fine-Tuning (SFT), where the model is trained on a multi-source instruction dataset. This dataset mixes general reasoning tasks (math, programming, logic) to preserve chain-of-thought capabilities with medical scenarios such as doctor–patient Q&A, diagnostic reasoning, and safety and ethics cases.

The third stage is Reinforcement Learning using the GRPO (Group Relative Policy Optimization) algorithm combined with task-specific reward models. GRPO, originally introduced in the DeepSeekMath paper, is a variant of PPO that estimates baselines from group scores rather than from a separate critic model, making it computationally lighter. Here, reward signals are designed to shape model behavior toward empathy, structured clinical responses, safety boundaries, and evidence-based reasoning, all with the goal of reducing hallucinations on medical questions.
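The group-relative baseline is the distinctive part of GRPO, and it is easy to sketch. The snippet below shows how advantages can be derived from the reward scores of several answers sampled for the same prompt; the reward values are made up, and the clipped policy-gradient objective and KL penalty of the full algorithm are omitted.

import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO-style advantages: the baseline is the mean reward of the group of
    completions sampled for one prompt, so no separate critic model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Hypothetical reward-model scores for four sampled answers to one medical question.
rewards = torch.tensor([0.9, 0.2, 0.7, 0.4])
print(group_relative_advantages(rewards))   # above-average answers get positive advantages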

Inference Performance

On H20 hardware, AntAngelMed exceeds 200 tokens per second, which the research team reports is roughly 3× faster than a 36-billion-parameter dense model. With YaRN (Yet another RoPE extensioN) extrapolation, it supports a 128K context length, long enough to handle full clinical documents, extended patient histories, or multi-turn medical dialogues.
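Long-context support of this kind is usually exposed through the model's RoPE-scaling configuration. The snippet below shows the general pattern for requesting YaRN scaling through Hugging Face Transformers; the config keys, scaling factor, and original context length shown here are assumptions for illustration, since the article does not spell out AntAngelMed's settings and the released checkpoint may already ship with YaRN enabled.

from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical example of enabling YaRN RoPE scaling; the exact keys and values
# depend on the model's own config schema and may differ for AntAngelMed.
config = AutoConfig.from_pretrained("MedAIBase/AntAngelMed", trust_remote_code=True)
config.rope_scaling = {
    "rope_type": "yarn",                        # YaRN extrapolation
    "factor": 4.0,                              # assumed scaling factor
    "original_max_position_embeddings": 32768,  # assumed pre-extension context length
}
model = AutoModelForCausalLM.from_pretrained(
    "MedAIBase/AntAngelMed",
    config=config,
    device_map="auto",
    trust_remote_code=True,
)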

The research team has also released an FP8 quantized version of the model. When this quantization is combined with EAGLE3 speculative decoding, inference throughput at a concurrency of 32 improves significantly over FP8 alone: 71% on HumanEval, 45% on GSM8K, and 94% on Math-500. These benchmarks measure coding and math reasoning tasks, not medical tasks directly, but they serve as proxies for the model's general throughput stability across output types.
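The article does not include a launch recipe for the FP8 plus EAGLE3 setup, but in vLLM the general shape looks roughly like the sketch below. Both repository names are placeholders rather than confirmed model IDs, and the speculative-decoding options vary between vLLM versions.

from vllm import LLM, SamplingParams

# Hypothetical sketch: serving an FP8 checkpoint with EAGLE3 speculative decoding.
# The two model paths are placeholders, not confirmed repository names.
llm = LLM(
    model="MedAIBase/AntAngelMed-FP8",            # placeholder FP8 variant
    tensor_parallel_size=4,
    trust_remote_code=True,
    speculative_config={
        "method": "eagle3",                       # EAGLE3 draft-based speculation
        "model": "MedAIBase/AntAngelMed-EAGLE3",  # placeholder draft model
        "num_speculative_tokens": 4,
    },
)
result = llm.generate(["What should I do if I have a headache?"],
                      SamplingParams(max_tokens=256))
print(result[0].outputs[0].text)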

Benchmark Results

On HealthBench, the open-source medical evaluation benchmark from OpenAI that uses simulated multi-turn medical dialogues to measure real-world clinical performance, AntAngelMed ranks first among all open-source models and surpasses a number of top proprietary models as well, with a particularly significant advantage on the HealthBench-Hard subset.

On MedAIBench, an evaluation system maintained by China's National Artificial Intelligence Medical Industry Pilot Facility, AntAngelMed ranks at the top level, with notably strong scores in the medical knowledge Q&A and medical ethics and safety categories.

On MedBench, a benchmark for Chinese healthcare LLMs covering 36 independently curated datasets and roughly 700,000 samples across five dimensions (medical knowledge question answering, medical language understanding, medical language generation, complex medical reasoning, and safety and ethics), AntAngelMed ranks first overall.

Marktechpost’s Visual Explainer

    Technical Information
    AntAngelMed


    01 — Overview
    What Is AntAngelMed?
Jointly developed by the Health Information Center of Zhejiang Province, Ant Healthcare, and Zhejiang Anzhen’er Medical AI Technology Co., Ltd.

103B Total Params

6.1B Active at Inference

128K Context Length

AntAngelMed is a medical-domain LLM built on a 1/32 activation-ratio MoE architecture. With 103B total parameters and only 6.1B active at inference time, it matches the performance of roughly 40B dense models at a fraction of the compute cost.

Model weights are released under Apache 2.0. The code repository is licensed under MIT.

02 — Architecture
MoE Architecture & Base Model
Built on Ling-flash-2.0 by inclusionAI, guided by Ling Scaling Laws.

AntAngelMed uses a 1/32 activation-ratio MoE with optimizations across all core components. These choices enable small-activation MoE models to deliver up to 7× efficiency over similarly sized dense architectures, and as output length grows, relative speedups can reach 7× or more.

Key architectural components (a minimal Partial-RoPE sketch follows this list):

Expert Granularity
Shared Expert Ratio
Sigmoid Routing
No Auxiliary Loss
MTP Layer
QK-Norm
Partial-RoPE
YaRN Extrapolation
Attention Balance
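Of these components, Partial-RoPE is the easiest to picture in code: rotary position embeddings are applied to only part of the query/key representation while the rest is left position-free. The article describes the split as being over attention heads; the sketch below instead rotates a fraction of the head dimension purely as an illustration of the idea, with the split fraction chosen arbitrarily.

import torch

def partial_rope(q, cos, sin, rot_frac=0.5):
    """Illustrative Partial-RoPE: rotate only the first rot_frac of the head dim.
    q: [tokens, head_dim]; cos/sin: [tokens, rot_dim // 2]; rot_frac is an assumption."""
    rot_dim = int(q.shape[-1] * rot_frac)
    q_rot, q_pass = q[..., :rot_dim], q[..., rot_dim:]
    x1, x2 = q_rot[..., 0::2], q_rot[..., 1::2]            # interleaved pairs
    rotated = torch.stack((x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos), dim=-1).flatten(-2)
    return torch.cat([rotated, q_pass], dim=-1)            # unrotated part carries no position info

q = torch.randn(10, 64)                                    # 10 tokens, head_dim = 64
pos = torch.arange(10, dtype=torch.float32).unsqueeze(-1)
freqs = pos / (10000 ** (torch.arange(0, 16) / 16))        # 16 = rot_dim // 2
print(partial_rope(q, freqs.cos(), freqs.sin()).shape)     # torch.Size([10, 64])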

03 — Training
Three-Stage Training Pipeline
Designed to layer deep medical domain adaptation on top of general language understanding.

    Stage 01
Continual Pre-Training
Built on Ling-flash-2.0 and trained on large-scale medical corpora (encyclopedias, web text, and academic publications) to inject deep domain and world knowledge.

    Stage 02
Supervised Fine-Tuning (SFT)
Multi-source instruction data mixing general tasks (math, programming, logic) for chain-of-thought, plus medical scenarios (doctor–patient Q&A, diagnostic reasoning, safety/ethics) for clinical adaptation.

    Stage 03
Reinforcement Learning via GRPO
Group Relative Policy Optimization with task-specific reward models. Shapes model behavior toward empathy, structural clarity, safety boundaries, and evidence-based reasoning to reduce hallucinations.

    04 — Inference
Inference Performance
Hardware benchmarks on H20 and throughput improvements from FP8 + EAGLE3 optimization.

    >200 tok/s
On H20 hardware. Roughly 3× faster than a comparable 36B dense model.

7× efficiency
MoE vs. dense at equal size. The speedup increases further as output length grows.

    +71% / +45% / +94%
FP8 + EAGLE3 throughput gains over FP8 alone on HumanEval / GSM8K / Math-500 at concurrency 32.

    128K context
Supported via YaRN extrapolation. Handles full clinical documents and extended multi-turn dialogues.

    05 — Benchmarks
Benchmark Results
Evaluated across three authoritative medical LLM benchmarks.

Benchmark | Scope | Result
HealthBench (OpenAI) | Simulated multi-turn medical dialogues for real-world clinical performance | #1 open-source; surpasses several proprietary models; largest lead on HealthBench-Hard
MedAIBench (Nat’l AI Medical Pilot Facility) | Chinese authority benchmark covering knowledge Q&A and medical ethics/safety | Top level; strongest in knowledge Q&A and medical ethics/safety
MedBench (Chinese Healthcare Domain) | 36 datasets, ~700K samples across five clinical dimensions | #1 overall across all five dimensions

    06 — Quickstart
    Run with Hugging Face Transformers
    Requires trust_remote_code=True for the MoE routing code.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; trust_remote_code=True is required for the custom MoE routing code.
model = AutoModelForCausalLM.from_pretrained(
    "MedAIBase/AntAngelMed",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("MedAIBase/AntAngelMed")

messages = [
    {"role": "system", "content": "You are AntAngelMed, a helpful medical assistant."},
    {"role": "user",   "content": "What should I do if I have a headache?"},
]
# Build the chat-formatted prompt and tokenize it.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt",
    return_token_type_ids=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=16384)
# Strip the prompt tokens so only the generated answer is decoded.
out = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])

Also supports: vLLM v0.11.0 (4-GPU tensor parallel), SGLang with FlashAttention-3, and vLLM-Ascend for Huawei Ascend 910B NPUs.
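For reference, serving without the FP8/EAGLE3 extras sketched earlier is also straightforward. The snippet below uses vLLM's offline chat interface with 4-way tensor parallelism; the sampling values are arbitrary examples, not recommended settings.

from vllm import LLM, SamplingParams

# Minimal vLLM sketch: 4-GPU tensor parallelism, chat template applied automatically.
llm = LLM(model="MedAIBase/AntAngelMed",
          tensor_parallel_size=4,
          trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are AntAngelMed, a helpful medical assistant."},
    {"role": "user",   "content": "What should I do if I have a headache?"},
]
outputs = llm.chat(messages, SamplingParams(temperature=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)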

07 — Access
Resources & Links
Model weights Apache 2.0; code repository MIT; FP8 quantized variant available separately.

Developed by the Health Information Center of Zhejiang Province, Ant Healthcare, and Zhejiang Anzhen’er Medical AI Technology Co., Ltd.
Coverage by Marktechpost (marktechpost.com)

    Key Takeaways

• AntAngelMed is a 103B-parameter open-source medical LLM that activates only 6.1B parameters at inference time, using a 1/32 activation-ratio MoE architecture inherited from Ling-flash-2.0.
• It uses a three-stage training pipeline: continual pre-training on medical corpora, SFT with mixed general and clinical instruction data, and GRPO-based reinforcement learning for safety and diagnostic reasoning.
• On H20 hardware, the model exceeds 200 tokens/s and supports a 128K context length via YaRN extrapolation, roughly 3× faster than a comparable 36B dense model.
• AntAngelMed ranks first among open-source models on OpenAI’s HealthBench, surpasses a number of proprietary models, and tops both the MedAIBench and MedBench leaderboards.
• The model is available on Hugging Face, ModelScope, and GitHub; model weights are Apache 2.0, the code is MIT, and an FP8 quantized version has also been released.

Check out the Model Weights on HF, the GitHub Repo, and the Technical details.


