Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

By Naveed Ahmad · 13/05/2026 (Updated: 13/05/2026)


A team of researchers from China has released AntAngelMed, a large open-source medical language model that the team describes as the largest and most capable of its kind currently available.

    What Is AntAngelMed?

AntAngelMed is a medical-domain language model with 103 billion total parameters, but it does not activate all of those parameters during inference. Instead, it uses a Mixture-of-Experts (MoE) architecture with a 1/32 activation ratio, meaning only 6.1 billion parameters are active at any given time when processing a query.

It helps to understand how MoE architectures work. In a standard dense model, every parameter participates in processing every token. In an MoE model, the network is divided into many "expert" sub-networks, and a routing mechanism selects only a small subset of them to handle each input. This lets a model have a very large total parameter count, which generally correlates with strong knowledge capacity, while keeping the actual compute cost of inference proportional to the smaller active parameter count.
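To make the routing idea concrete, here is a minimal, illustrative top-k MoE layer in PyTorch. The dimensions, expert count, and top-k value are arbitrary assumptions for the sketch and are not taken from AntAngelMed's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Illustrative top-k MoE layer: only k of num_experts experts run per token."""
    def __init__(self, dim=64, num_experts=32, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.k = k

    def forward(self, x):                              # x: [tokens, dim]
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # mix only the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():   # run each chosen expert on its tokens
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)                   # torch.Size([4, 64])

Compute per token scales with k, while total capacity scales with num_experts, which is exactly the trade-off a 1/32 activation ratio exploits.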

AntAngelMed inherits this design from Ling-flash-2.0, a base model developed by inclusionAI and guided by what the team calls Ling Scaling Laws. The specific optimizations layered on top include refined expert granularity, a tuned shared-expert ratio, attention balance mechanisms, sigmoid routing without an auxiliary loss, an MTP (Multi-Token Prediction) layer, QK-Norm, and Partial-RoPE (Rotary Position Embedding applied to a subset of attention heads rather than all of them). According to the research team, these design choices collectively allow small-activation MoE models to deliver up to 7× the efficiency of similarly sized dense architectures, which means that with only 6.1B activated parameters, AntAngelMed can match the performance of a roughly 40B dense model. Separately, as output length grows during inference, the relative speed advantage can also reach 7× or more over dense models of comparable size.
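A rough back-of-the-envelope check makes those numbers plausible. Under the common heuristic that a decoder-only transformer spends roughly 2 FLOPs per parameter per generated token, the per-token compute of 6.1B active parameters versus a 40B dense model works out as follows (a simplification that ignores attention cost and routing overhead):

# Rough per-token compute comparison under the ~2 * params FLOPs heuristic.
# This is illustrative arithmetic, not a measurement from the AntAngelMed report.
active_params = 6.1e9      # parameters active per token in AntAngelMed
dense_params  = 40e9       # dense model it is said to roughly match

moe_flops   = 2 * active_params
dense_flops = 2 * dense_params
print(f"MoE per-token compute:   {moe_flops:.2e} FLOPs")
print(f"Dense per-token compute: {dense_flops:.2e} FLOPs")
print(f"Dense / MoE ratio:       {dense_flops / moe_flops:.1f}x")   # ~6.6x, close to the stated 7x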

https://modelscope.cn/models/MedAIBase/AntAngelMed

Training Pipeline

AntAngelMed uses a three-stage training process designed to layer deep medical domain adaptation on top of general language understanding.

The first stage is continual pre-training on large-scale medical corpora, including encyclopedias, web text, and academic publications. This phase builds on the Ling-flash-2.0 checkpoint, giving the model a strong general reasoning foundation before medical specialization begins.

The second stage is Supervised Fine-Tuning (SFT), where the model is trained on a multi-source instruction dataset. This dataset mixes general reasoning tasks (math, programming, logic) to preserve chain-of-thought capabilities with medical scenarios such as doctor–patient Q&A, diagnostic reasoning, and safety and ethics cases.

The third stage is Reinforcement Learning using the GRPO (Group Relative Policy Optimization) algorithm combined with task-specific reward models. GRPO, originally introduced in the DeepSeekMath paper, is a variant of PPO that estimates baselines from group scores rather than from a separate critic model, making it computationally lighter. Here, reward signals are designed to shape model behavior toward empathy, structured clinical responses, safety boundaries, and evidence-based reasoning, all with the goal of reducing hallucinations on medical questions.
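The group-relative baseline is the distinctive part of GRPO, and it is easy to sketch. The snippet below shows how advantages can be derived from the reward scores of several answers sampled for the same prompt; the reward values are made up, and the clipped policy-gradient objective and KL penalty of the full algorithm are omitted.

import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """GRPO-style advantages: the baseline is the mean reward of the group of
    completions sampled for one prompt, so no separate critic model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Hypothetical reward-model scores for four sampled answers to one medical question.
rewards = torch.tensor([0.9, 0.2, 0.7, 0.4])
print(group_relative_advantages(rewards))   # above-average answers get positive advantages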

Inference Performance

On H20 hardware, AntAngelMed exceeds 200 tokens per second, which the research team reports is roughly 3× faster than a 36-billion-parameter dense model. With YaRN (Yet another RoPE extensioN) extrapolation, it supports a 128K context length, long enough to handle full clinical documents, extended patient histories, or multi-turn medical dialogues.
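Long-context support of this kind is usually exposed through the model's RoPE-scaling configuration. The snippet below shows the general pattern for requesting YaRN scaling through Hugging Face Transformers; the config keys, scaling factor, and original context length shown here are assumptions for illustration, since the article does not spell out AntAngelMed's settings and the released checkpoint may already ship with YaRN enabled.

from transformers import AutoConfig, AutoModelForCausalLM

# Hypothetical example of enabling YaRN RoPE scaling; the exact keys and values
# depend on the model's own config schema and may differ for AntAngelMed.
config = AutoConfig.from_pretrained("MedAIBase/AntAngelMed", trust_remote_code=True)
config.rope_scaling = {
    "rope_type": "yarn",                        # YaRN extrapolation
    "factor": 4.0,                              # assumed scaling factor
    "original_max_position_embeddings": 32768,  # assumed pre-extension context length
}
model = AutoModelForCausalLM.from_pretrained(
    "MedAIBase/AntAngelMed",
    config=config,
    device_map="auto",
    trust_remote_code=True,
)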

The research team has also released an FP8 quantized version of the model. When this quantization is combined with EAGLE3 speculative decoding, inference throughput at a concurrency of 32 improves significantly over FP8 alone: 71% on HumanEval, 45% on GSM8K, and 94% on Math-500. These benchmarks measure coding and math reasoning tasks, not medical tasks directly, but they serve as proxies for the model's general throughput stability across output types.
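The article does not include a launch recipe for the FP8 plus EAGLE3 setup, but in vLLM the general shape looks roughly like the sketch below. Both repository names are placeholders rather than confirmed model IDs, and the speculative-decoding options vary between vLLM versions.

from vllm import LLM, SamplingParams

# Hypothetical sketch: serving an FP8 checkpoint with EAGLE3 speculative decoding.
# The two model paths are placeholders, not confirmed repository names.
llm = LLM(
    model="MedAIBase/AntAngelMed-FP8",            # placeholder FP8 variant
    tensor_parallel_size=4,
    trust_remote_code=True,
    speculative_config={
        "method": "eagle3",                       # EAGLE3 draft-based speculation
        "model": "MedAIBase/AntAngelMed-EAGLE3",  # placeholder draft model
        "num_speculative_tokens": 4,
    },
)
result = llm.generate(["What should I do if I have a headache?"],
                      SamplingParams(max_tokens=256))
print(result[0].outputs[0].text)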

Benchmark Results

On HealthBench, the open-source medical evaluation benchmark from OpenAI that uses simulated multi-turn medical dialogues to measure real-world clinical performance, AntAngelMed ranks first among all open-source models and surpasses a number of top proprietary models as well, with a particularly significant advantage on the HealthBench-Hard subset.

On MedAIBench, an evaluation system maintained by China's National Artificial Intelligence Medical Industry Pilot Facility, AntAngelMed ranks at the top level, with notably strong scores in the medical knowledge Q&A and medical ethics and safety categories.

On MedBench, a benchmark for Chinese healthcare LLMs covering 36 independently curated datasets and roughly 700,000 samples across five dimensions (medical knowledge question answering, medical language understanding, medical language generation, complex medical reasoning, and safety and ethics), AntAngelMed ranks first overall.

Marktechpost’s Visual Explainer

    Technical Information
    AntAngelMed


    01 — Overview
    What Is AntAngelMed?
Jointly developed by the Health Information Center of Zhejiang Province, Ant Healthcare, and Zhejiang Anzhen’er Medical AI Technology Co., Ltd.

103B Total Params

6.1B Active at Inference

128K Context Length

AntAngelMed is a medical-domain LLM built on a 1/32 activation-ratio MoE architecture. With 103B total parameters and only 6.1B active at inference time, it matches the performance of roughly 40B dense models at a fraction of the compute cost.

Model weights are released under Apache 2.0. The code repository is licensed under MIT.

02 — Architecture
MoE Architecture & Base Model
Built on Ling-flash-2.0 by inclusionAI, guided by Ling Scaling Laws.

AntAngelMed uses a 1/32 activation-ratio MoE with optimizations across all core components. These choices enable small-activation MoE models to deliver up to 7× efficiency over similarly sized dense architectures, and as output length grows, relative speedups can reach 7× or more.

Key architectural components (a minimal Partial-RoPE sketch follows this list):

Expert Granularity
Shared Expert Ratio
Sigmoid Routing
No Auxiliary Loss
MTP Layer
QK-Norm
Partial-RoPE
YaRN Extrapolation
Attention Balance
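Of these components, Partial-RoPE is the easiest to picture in code: rotary position embeddings are applied to only part of the query/key representation while the rest is left position-free. The article describes the split as being over attention heads; the sketch below instead rotates a fraction of the head dimension purely as an illustration of the idea, with the split fraction chosen arbitrarily.

import torch

def partial_rope(q, cos, sin, rot_frac=0.5):
    """Illustrative Partial-RoPE: rotate only the first rot_frac of the head dim.
    q: [tokens, head_dim]; cos/sin: [tokens, rot_dim // 2]; rot_frac is an assumption."""
    rot_dim = int(q.shape[-1] * rot_frac)
    q_rot, q_pass = q[..., :rot_dim], q[..., rot_dim:]
    x1, x2 = q_rot[..., 0::2], q_rot[..., 1::2]            # interleaved pairs
    rotated = torch.stack((x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos), dim=-1).flatten(-2)
    return torch.cat([rotated, q_pass], dim=-1)            # unrotated part carries no position info

q = torch.randn(10, 64)                                    # 10 tokens, head_dim = 64
pos = torch.arange(10, dtype=torch.float32).unsqueeze(-1)
freqs = pos / (10000 ** (torch.arange(0, 16) / 16))        # 16 = rot_dim // 2
print(partial_rope(q, freqs.cos(), freqs.sin()).shape)     # torch.Size([10, 64])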

03 — Training
Three-Stage Training Pipeline
Designed to layer deep medical domain adaptation on top of general language understanding.

    Stage 01
Continual Pre-Training
Built on Ling-flash-2.0 and trained on large-scale medical corpora (encyclopedias, web text, and academic publications) to inject deep domain and world knowledge.

    Stage 02
Supervised Fine-Tuning (SFT)
Multi-source instruction data mixing general tasks (math, programming, logic) for chain-of-thought, plus medical scenarios (doctor–patient Q&A, diagnostic reasoning, safety/ethics) for clinical adaptation.

    Stage 03
Reinforcement Learning via GRPO
Group Relative Policy Optimization with task-specific reward models. Shapes model behavior toward empathy, structural clarity, safety boundaries, and evidence-based reasoning to reduce hallucinations.

    04 — Inference
Inference Performance
Hardware benchmarks on H20 and throughput improvements from FP8 + EAGLE3 optimization.

    >200 tok/s
On H20 hardware. Roughly 3× faster than a comparable 36B dense model.

7× efficiency
MoE vs. dense at equal size. The speedup increases further as output length grows.

    +71% / +45% / +94%
FP8 + EAGLE3 throughput gains over FP8 alone on HumanEval / GSM8K / Math-500 at concurrency 32.

    128K context
Supported via YaRN extrapolation. Handles full clinical documents and extended multi-turn dialogues.

    05 — Benchmarks
Benchmark Results
Evaluated across three authoritative medical LLM benchmarks.

Benchmark | Scope | Result
HealthBench (OpenAI) | Simulated multi-turn medical dialogues for real-world clinical performance | #1 open-source; surpasses several proprietary models; largest lead on HealthBench-Hard
MedAIBench (Nat’l AI Medical Pilot Facility) | Chinese authority benchmark covering knowledge Q&A and medical ethics/safety | Top level; strongest in knowledge Q&A and medical ethics/safety
MedBench (Chinese Healthcare Domain) | 36 datasets, ~700K samples across five clinical dimensions | #1 overall across all five dimensions

    06 — Quickstart
    Run with Hugging Face Transformers
    Requires trust_remote_code=True for the MoE routing code.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model; trust_remote_code=True is required for the custom MoE routing code.
model = AutoModelForCausalLM.from_pretrained(
    "MedAIBase/AntAngelMed",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("MedAIBase/AntAngelMed")

messages = [
    {"role": "system", "content": "You are AntAngelMed, a helpful medical assistant."},
    {"role": "user",   "content": "What should I do if I have a headache?"},
]
# Build the chat-formatted prompt and tokenize it.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt",
    return_token_type_ids=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=16384)
# Strip the prompt tokens so only the generated answer is decoded.
out = [o[len(i):] for i, o in zip(inputs.input_ids, out)]
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])

Also supports: vLLM v0.11.0 (4-GPU tensor parallel), SGLang with FlashAttention-3, and vLLM-Ascend for Huawei Ascend 910B NPUs.
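For reference, serving without the FP8/EAGLE3 extras sketched earlier is also straightforward. The snippet below uses vLLM's offline chat interface with 4-way tensor parallelism; the sampling values are arbitrary examples, not recommended settings.

from vllm import LLM, SamplingParams

# Minimal vLLM sketch: 4-GPU tensor parallelism, chat template applied automatically.
llm = LLM(model="MedAIBase/AntAngelMed",
          tensor_parallel_size=4,
          trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are AntAngelMed, a helpful medical assistant."},
    {"role": "user",   "content": "What should I do if I have a headache?"},
]
outputs = llm.chat(messages, SamplingParams(temperature=0.7, max_tokens=512))
print(outputs[0].outputs[0].text)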

07 — Access
Resources & Links
Model weights Apache 2.0; code repository MIT; FP8 quantized variant available separately.

Developed by the Health Information Center of Zhejiang Province, Ant Healthcare, and Zhejiang Anzhen’er Medical AI Technology Co., Ltd.
Coverage by Marktechpost (marktechpost.com)

    Key Takeaways

• AntAngelMed is a 103B-parameter open-source medical LLM that activates only 6.1B parameters at inference time, using a 1/32 activation-ratio MoE architecture inherited from Ling-flash-2.0.
• It uses a three-stage training pipeline: continual pre-training on medical corpora, SFT with mixed general and clinical instruction data, and GRPO-based reinforcement learning for safety and diagnostic reasoning.
• On H20 hardware, the model exceeds 200 tokens/s and supports a 128K context length via YaRN extrapolation, roughly 3× faster than a comparable 36B dense model.
• AntAngelMed ranks first among open-source models on OpenAI’s HealthBench, surpasses a number of proprietary models, and tops both the MedAIBench and MedBench leaderboards.
• The model is available on Hugging Face, ModelScope, and GitHub; model weights are Apache 2.0, the code is MIT, and an FP8 quantized version has also been released.

Check out the Model Weights on HF, the GitHub Repo, and the Technical details.


