AI2 Releases SERA, Soft Verified Coding Agents Built with Supervised Training Only for Practical Repository Level Automation Workflows

Hey there, fellow tech enthusiasts! I’m super excited to share the latest release from the Allen Institute for AI (AI2): SERA, a family of coding agents designed for repository-level automation. And the best part? It was built using supervised training only, on synthetic trajectories!

But before we dive into the nitty-gritty, let’s cover the basics. What is SERA, you ask? It’s the first release in AI2’s Open Coding Agents series, and the flagship model, SERA-32B, is built on the Qwen 3 32B architecture and trained as a repository-level coding agent.

So, how does it perform? SERA-32B has been evaluated on SWE-bench Verified, where it reaches a 49.5% resolve rate at 32K context and a 54.2% resolve rate at 64K context. These numbers put it in the same performance band as open-weight models like Devstral-Small-2 and GLM-4.5 Air, while SERA remains entirely open in code, data, and weights.

But what’s really cool is the technique behind SERA: Soft Verified Generation (SVG). SVG produces agent trajectories that resemble real developer workflows and uses patch agreement between two rollouts as a soft signal of correctness. Here’s how it works:

1. The system samples a function from a real repository. The teacher model, GLM-4.6 in the SERA-32B setup, receives a bug style or change description and operates with tools to view files, edit code, and run commands. It produces a trajectory T1 and a patch P1.
2. The system converts the trajectory into a pull request-like description. This text summarizes the intent and key edits in a format similar to real pull requests.
3. The teacher starts again from the original repository but now sees only the pull request description and the tools. It produces a new trajectory T2 and a patch P2 that tries to implement the described change.
4. The patches P1 and P2 are compared line by line. A recall score r is computed as the fraction of modified lines in P1 that also appear in P2. When r equals 1, the trajectory is hard verified. For intermediate values, the sample is soft verified.
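
To make the comparison in step 4 concrete, here is a minimal Python sketch of the recall score. It is an illustration under stated assumptions, not SERA's actual implementation: it assumes patches are unified diffs, treats a "modified line" as any added or removed diff line (sign included), and the function names and the "unverified" label for r = 0 are hypothetical.

```python
from __future__ import annotations


def modified_lines(patch: str) -> set[str]:
    """Collect added/removed lines from a unified diff (assumed patch format).

    File headers ('+++' / '---') are skipped. Simplification: a real content
    line that happens to start with '---' or '+++' would be skipped as well.
    """
    lines = set()
    for line in patch.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            lines.add(line)  # keep the sign so adds and deletes stay distinct
    return lines


def recall_score(p1: str, p2: str) -> float:
    """Fraction of modified lines in P1 that also appear in P2."""
    m1 = modified_lines(p1)
    if not m1:
        return 0.0  # assumption: an empty P1 counts as no agreement
    return len(m1 & modified_lines(p2)) / len(m1)


def verification_label(r: float) -> str:
    """r == 1 is hard verified; intermediate values are soft verified."""
    if r == 1.0:
        return "hard"
    if r > 0.0:
        return "soft"
    return "unverified"  # hypothetical label; the article does not name r == 0


P1 = """\
--- a/calc.py
+++ b/calc.py
@@ -1,2 +1,2 @@
 def add(a, b):
-    return a - b
+    return a + b
"""

P2 = """\
--- a/calc.py
+++ b/calc.py
@@ -1,2 +1,2 @@
 def add(a, b):
-    return a - b
+    return b + a
"""

r = recall_score(P1, P2)
print(r, verification_label(r))
```

In this toy pair, the second rollout removes the same line but writes the fix differently, so only one of the two modified lines in P1 is matched and the sample lands in the soft verified range.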

So, what are the key takeaways from SERA? Well, for starters, SERA turns coding agent training into a supervised learning problem, training on synthetic trajectories from GLM-4.6 with no reinforcement learning loop and no dependency on repository test suites. Additionally, Soft Verified Generation removes the need for tests, using two rollouts and the patch overlap between P1 and P2 to compute a soft verification score.

The team has also shared a large, realistic agent dataset built from real repositories, with over 200,000 trajectories, making it one of the largest open datasets for coding agents. Training is efficient, too, and comes with a specific cost and scaling analysis: SERA-32B trains on 25,000 T2 trajectories, and the scaling experiment shows that SVG is about 26 times cheaper than SkyRL-Agent and 57 times cheaper than SWE-smith at similar SWE-bench Verified performance.

What’s next? The team has shared the paper, repo, and model weights, so you can dive deeper into the technology.
