Hugging Face Releases ml-intern: An Open-Source AI Agent that Automates the LLM Post-Training Workflow

By Naveed Ahmad · 22/04/2026 · 4 Mins Read


Hugging Face has launched ml-intern, an open-source AI agent designed to automate end-to-end post-training workflows for large language models (LLMs). Built on the company's smolagents framework, the tool can autonomously perform literature review, dataset discovery, training script execution, and iterative evaluation, tasks that typically require significant manual effort from ML researchers and engineers.

    What ml-intern Does

The agent operates as a continuous loop that mirrors the workflow of an ML researcher. It begins by searching arXiv and Hugging Face Papers, reading methodology sections and traversing citation graphs to identify relevant datasets and techniques. It then searches the Hugging Face Hub for referenced datasets, inspects their quality, and reformats them for training. When local compute is unavailable, the agent can launch jobs via Hugging Face Jobs. After each training run, it reads evaluation outputs, diagnoses failures (such as reward collapse in RLHF pipelines), and retrains until benchmark performance improves.
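The loop described above can be sketched roughly as follows. This is a minimal illustration of the train-evaluate-diagnose-retry pattern, not ml-intern's actual implementation; the `Run` type and the toy reward numbers are invented for the example.

```python
# Illustrative sketch of a post-training loop: train, evaluate, diagnose
# failures, retrain until a benchmark target is reached. Not real
# smolagents or Hugging Face APIs.
from dataclasses import dataclass, field

@dataclass
class Run:
    score: float
    diagnostics: list = field(default_factory=list)

def post_training_loop(train_and_eval, target_score, max_iterations=5):
    """Iterate training runs, feeding past diagnostics into each retry."""
    best, history = 0.0, []
    for _ in range(max_iterations):
        run = train_and_eval(history)   # launch a run informed by past failures
        history.append(run.diagnostics)
        best = max(best, run.score)
        if run.score >= target_score:   # benchmark target reached, stop
            break
    return best

# Toy stand-in for training: each retry after a diagnosed failure improves.
scores = iter([0.10, 0.21, 0.32])
result = post_training_loop(
    lambda history: Run(next(scores),
                        diagnostics=["reward collapse"] if not history else []),
    target_score=0.30,
)
print(result)  # 0.32
```

The key property mirrored here is that diagnostics from a failed run (e.g. reward collapse) are available to the next iteration, so each retry is informed rather than blind.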

The entire monitoring stack relies on Trackio, a Hub-native experiment tracker positioned as an open-source alternative to Weights & Biases.

Performance on PostTrainBench

ml-intern was evaluated against PostTrainBench, a benchmark released by researchers at the University of Tübingen and the Max Planck Institute. The benchmark tests an agent's ability to post-train a base model within a strict 10-hour window on a single H100 GPU.
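A strict wall-clock budget like PostTrainBench's means the agent has to stop training when time runs out, not when it feels done. A minimal sketch of such budget enforcement (the budget value and step function here are toy placeholders):

```python
# Minimal sketch of enforcing a wall-clock training budget, as in
# PostTrainBench's 10-hour limit. The budget and step are illustrative.
import time

def run_with_budget(train_step, budget_seconds):
    """Run training steps until the wall-clock budget is exhausted."""
    start, steps = time.monotonic(), 0
    while time.monotonic() - start < budget_seconds:
        train_step()
        steps += 1
    return steps

# Toy demo: a 0.1-second budget with a fast no-op step.
n = run_with_budget(lambda: time.sleep(0.01), budget_seconds=0.1)
print(n >= 1)  # True
```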

In the official launch demo, ml-intern took the Qwen3-1.7B base model, which scores a baseline of roughly 10% on GPQA, and pushed it to 32% in under 10 hours. The agent's progress was remarkably fast, crossing the 27.5% mark in just over 3 hours.

This result is particularly significant when compared with the current SOTA. Hugging Face's data shows the agent outperforming Claude Code, which currently sits at 22.99% on the same task. While the broader PostTrainBench paper recorded a high of 33% using the larger Gemma-3-4B, ml-intern's ability to extract 32% from the tiny 1.7B Qwen model demonstrates a level of data-efficiency that human researchers often struggle to replicate in such a short timeframe.

https://x.com/akseljoonas/status/2046543093856412100

Technical Approaches: Synthetic Data and GRPO

Two technical strategies that ml-intern demonstrated in published demos are worth highlighting for practitioners.

Synthetic data generation: In a healthcare-domain test, the agent assessed available medical datasets, determined their quality was insufficient for reliable fine-tuning, and wrote a script to generate synthetic training examples focused on edge cases, including medical hedging language and multilingual emergency-response scenarios. It then upsampled this data to strengthen the training distribution before evaluating on HealthBench.
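The pattern described here, generating targeted synthetic examples and upsampling them into the training mix, can be sketched as follows. The templates, mixing ratio, and function names are invented for illustration and are not the agent's actual script:

```python
# Illustrative sketch: generate edge-case synthetic examples and upsample
# them into a base training set at a target fraction. All templates and
# ratios are hypothetical.
import random

def make_synthetic_examples(templates, n):
    """Generate n synthetic examples by sampling from edge-case templates."""
    return [random.choice(templates) for _ in range(n)]

def upsample(base, synthetic, synthetic_fraction):
    """Mix synthetic examples into the base set at a target fraction."""
    k = int(len(base) * synthetic_fraction / (1 - synthetic_fraction))
    mix = base + [random.choice(synthetic) for _ in range(k)]
    random.shuffle(mix)
    return mix

random.seed(0)
templates = [
    "I can't provide a diagnosis, but these symptoms may warrant urgent care.",  # hedging
    "Llame a emergencias de inmediato.",  # multilingual emergency response
]
base = [f"example {i}" for i in range(80)]
train = upsample(base, make_synthetic_examples(templates, 40), synthetic_fraction=0.2)
print(len(train))  # 100
```

With `synthetic_fraction=0.2`, 20 synthetic examples are mixed into the 80 base examples, so a fifth of the final distribution covers the edge cases the original data lacked.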

Autonomous RLHF via GRPO: In a math-domain test, the agent implemented a Group Relative Policy Optimization (GRPO) training script, a technique that performs reinforcement learning from human feedback with lower memory overhead than standard PPO. The agent launched training on A100 GPUs, monitored reward curves, and ran ablations to isolate effective components before finalizing the checkpoint.
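GRPO saves memory relative to PPO by dropping the learned value network: instead of a critic's baseline, each completion's advantage is computed relative to the other completions sampled for the same prompt. The core computation can be sketched as (a simplified illustration, not the agent's script):

```python
# GRPO-style group-relative advantages: normalize each completion's reward
# against the mean and standard deviation of its sampling group, removing
# the need for a learned critic (the source of PPO's extra memory cost).
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Advantages for one group of completions sampled from one prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Rewards for four completions of one math prompt
# (e.g. 1.0 = correct final answer, 0.0 = incorrect).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 2) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Correct completions get positive advantages and incorrect ones negative, purely from the group statistics, which is why verifiable-reward domains like math are a natural fit for GRPO.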

    Key Takeaways

    • Autonomous Research Loop: The agent replicates the full machine learning workflow, from performing literature reviews on arXiv and traversing citation graphs to autonomously executing training runs and diagnosing failures.
    • Significant Reasoning Gains: In under 10 hours, the agent pushed a Qwen3-1.7B model's scientific reasoning score on the GPQA benchmark from 8.5% to 32%, outperforming Claude Code's GPQA result (22.99%).
    • Advanced Training Techniques: Beyond simple fine-tuning, ml-intern can generate high-quality synthetic data for edge cases and implement complex techniques like Group Relative Policy Optimization (GRPO) to optimize math performance.
    • Native Ecosystem Integration: Built on the smolagents framework, the tool natively integrates with Hugging Face Jobs for compute and uses Trackio for open-source experiment tracking.

Introducing ml-intern, the agent that just automated the post-training team @huggingface

It's an open-source implementation of the actual research loop that our ML researchers do every single day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU… pic.twitter.com/USLWv6lKz9

    — Aksel (@akseljoonas) April 21, 2026


Check out the App and the CLI.






    Naveed Ahmad

    Naveed Ahmad is a technology journalist and AI writer at ArticlesStock, covering artificial intelligence, machine learning, and emerging tech policy. Read his latest articles.
