Nous Research Releases NousCoder-14B: A Competitive Olympiad Programming Model Post-Trained on Qwen3-14B via Reinforcement Learning

**Nous Research Unveils Intelligent Coding Model: NousCoder-14B**

Nous Research has unveiled NousCoder-14B, a competitive programming model aimed at olympiad-style problems. The model was post-trained on Qwen3-14B using reinforcement learning and reaches 67.87% Pass@1 accuracy on the LiveCodeBench v6 benchmark. But what exactly does this mean?

**Unraveling Pass@1 Accuracy**

Pass@1 is the percentage of problems for which the first generated program passes all hidden input-output checks while staying within the time and memory limits. In other words, it measures how often the model produces correct and efficient code on its first attempt. On the LiveCodeBench v6 benchmark, which consists of 454 problems, NousCoder-14B scores 7.08 percentage points higher than the Qwen3-14B baseline.
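The definition above can be illustrated with a small sketch. It assumes one pass/fail flag per problem for the first generated program; the function name `pass_at_1` and the toy data are hypothetical, not taken from the LiveCodeBench tooling. The last lines only work out the baseline score implied by the two figures quoted in this article.

```python
def pass_at_1(first_attempt_passed):
    """Fraction of problems whose first generated program passed every hidden test."""
    if not first_attempt_passed:
        raise ValueError("need at least one problem")
    return sum(first_attempt_passed) / len(first_attempt_passed)

# Toy example (made-up data): 3 of 4 problems solved on the first attempt.
toy = [True, False, True, True]
print(f"{pass_at_1(toy):.2%}")  # 75.00%

# Baseline implied by the article's figures: 67.87% minus a 7.08 point gain.
nouscoder_score = 67.87
gain_points = 7.08
baseline = round(nouscoder_score - gain_points, 2)
print(baseline)  # 60.79
```

Note that the gain is a difference in percentage points, not a relative 7.08% improvement over the baseline.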

**The Power of Reinforcement Learning**

Reinforcement learning (RL) lets the model learn from feedback on its own attempts and improve over time. In this case, the model is trained on a dataset of 24,000 verifiable coding problems drawn from several sources, including TACO Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench tasks released before July 31, 2024. The model is then evaluated on a separate test set of 454 problems released between August 1, 2024 and May 1, 2025.
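The date-based split keeps evaluation problems out of the training data. A minimal sketch of that idea follows, assuming each problem is a dict with a hypothetical `released` field; the record layout and the function name `split_by_date` are illustrative and not the team's actual pipeline.

```python
from datetime import date

# Dates taken from the article; everything else here is a hypothetical layout.
TRAIN_CUTOFF = date(2024, 7, 31)  # training uses problems released before this
TEST_START = date(2024, 8, 1)
TEST_END = date(2025, 5, 1)

def split_by_date(problems):
    """Route each problem to train or test by release date; drop the rest."""
    train, test = [], []
    for problem in problems:
        released = problem["released"]
        if released < TRAIN_CUTOFF:
            train.append(problem)
        elif TEST_START <= released <= TEST_END:
            test.append(problem)
    return train, test

problems = [
    {"id": "a", "released": date(2024, 3, 10)},
    {"id": "b", "released": date(2024, 9, 2)},
    {"id": "c", "released": date(2025, 6, 1)},  # outside both windows
]
train, test = split_by_date(problems)
print([p["id"] for p in train], [p["id"] for p in test])  # ['a'] ['b']
```

Because the two windows do not overlap, no problem can land in both sets.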

**Atropos and Modal: A Winning Combination**

The RL environment is built using the Atropos framework, in which the model generates Python code for each problem. Each rollout receives a scalar reward that depends on the test case outcomes. The research team uses Modal as an autoscaled sandbox to execute untrusted code safely and at scale. With this setup, every reward comes from actually running the generated program against the tests.
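To make the reward step concrete, here is a rough local sketch. It does not use the Atropos or Modal APIs; a plain child process stands in for the sandbox, which is not safe for untrusted code and does not enforce a memory limit. The binary +1/-1 scheme, the time limit value, and the names `run_case` and `rollout_reward` are assumptions for illustration, since the article only says the reward is a scalar that depends on test outcomes.

```python
import subprocess
import sys

def run_case(program, stdin_text, expected_stdout, time_limit_s=10.0):
    """Run one test case in a child process; True only if the output matches in time."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=time_limit_s,
        )
    except subprocess.TimeoutExpired:
        return False  # too slow counts as a failure
    return proc.returncode == 0 and proc.stdout.strip() == expected_stdout.strip()

def rollout_reward(program, cases, time_limit_s=10.0):
    """Assumed binary scheme: +1.0 if every case passes, otherwise -1.0."""
    passed = all(run_case(program, inp, out, time_limit_s) for inp, out in cases)
    return 1.0 if passed else -1.0

# Toy problem (made up): read two integers and print their sum.
cases = [("2 3\n", "5"), ("10 -4\n", "6")]
good = "a, b = map(int, input().split())\nprint(a + b)"
bad = "a, b = map(int, input().split())\nprint(a - b)"
print(rollout_reward(good, cases))  # 1.0
print(rollout_reward(bad, cases))   # -1.0
```

In a real setup the `run_case` step is where a remote sandbox would replace the local process, so that thousands of untrusted programs can run in isolation and in parallel.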

**Group Relative Policy Optimization: The Key to Success**

The model is trained with Group Relative Policy Optimization (GRPO), an approach that doesn’t require a separate value model. GRPO is paired with three objectives: Dynamic Sampling Policy Optimization (DAPO), Group Sequence Policy Optimization (GSPO), and GSPO+. These objectives are designed to improve the model’s ability to generate high-quality code.
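The reason GRPO needs no value model is that it scores each rollout against the other rollouts sampled for the same problem. The sketch below shows that normalization in plain Python, assuming population standard deviation and a small `eps`; implementations differ on both choices. The helper `has_learning_signal` illustrates the dynamic sampling idea of skipping groups whose rewards are all identical. Both function names are hypothetical, and this is not the team's training code.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def has_learning_signal(rewards):
    """A group where every rollout got the same reward yields all-zero advantages."""
    return len(set(rewards)) > 1

# One problem, four rollouts: a single success among three failures.
mixed = [1.0, -1.0, -1.0, -1.0]
advantages = group_relative_advantages(mixed)
print(advantages[0] > 0, all(a < 0 for a in advantages[1:]))  # True True

# All rollouts failed: nothing distinguishes them, so the group can be skipped.
all_failed = [-1.0, -1.0, -1.0, -1.0]
print(has_learning_signal(all_failed))  # False
```

The successful rollout gets a positive advantage and the failures get negative ones, so the group itself acts as the baseline that a value model would otherwise supply.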

**Iterative Context Extension and Overlong Filtering**

The training process involves iterative context extension: the model is first trained with a 32k-token context window and then continues training at the maximum Qwen3-14B context window of 40k tokens. Overlong filtering is also used, so that generations that run past the context window are excluded from the training signal rather than penalized.
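One common way to implement overlong filtering is to zero the advantage of any rollout that was cut off at the context limit, so it contributes nothing to the gradient. The sketch below assumes that mechanism; the article does not spell out the exact rule. The `truncated` flags, the function name `filter_overlong`, and the stage labels are hypothetical.

```python
# Stage labels as given in the article; exact token counts are not specified there.
CONTEXT_STAGES = ["32k", "40k"]

def filter_overlong(advantages, truncated):
    """Zero the advantage of rollouts that hit the context limit before finishing."""
    if len(advantages) != len(truncated):
        raise ValueError("advantages and truncated flags must align")
    return [0.0 if cut else adv for adv, cut in zip(advantages, truncated)]

# Three rollouts for one problem; the second ran out of context mid-solution.
advantages = [1.2, -0.8, -0.4]
truncated = [False, True, False]
print(filter_overlong(advantages, truncated))  # [1.2, 0.0, -0.4]
```

Under this assumption a truncated rollout is neither rewarded nor punished, which avoids teaching the model that long reasoning is bad simply because an earlier stage used a shorter window.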

**Key Takeaways**

NousCoder-14B posts strong results on the LiveCodeBench v6 benchmark, reaching 67.87% Pass@1 through RL post-training built on Atropos and Modal. Whether you’re a researcher or a developer, NousCoder-14B is worth exploring.

