[NeurIPS 2025 Spotlight] LLM post-training suite — featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.
Updated Sep 27, 2025 · Python
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
A comprehensive collection of process reward models.
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
Fixing GRPO training collapse in long-horizon multi-tool agents. A lightweight PRM-Lite + LATA joint approach achieves +37% over vanilla GRPO on τ-bench airline (50-task, multi-turn).
[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
This repository includes code and materials for the paper "Efficient PRM Training Data Synthesis via Formal Verification" (ACL 2026 Findings).
Can We Predict Before Executing Machine Learning Agents?
[KDD 2025] "Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners".
This repository contains the official implementation of "Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision".
Multi-Agent System Process Reward Model (MASPRM): a lightweight process reward model guiding multi-agent systems at search time.
Process-supervised RL for a multi-step reasoning agent — DAPO + a learned Process Reward Model (PRM) training a Qwen3-8B Planner. A modern, from-scratch rebuild of the AgentFlow paper (ICLR 2026).
Step-level preference data construction toolkit for code-agent process reward models
Live proof of arXiv:2603.17815 — O(N) confirmed R²=0.952, 1,984 API calls
Interface for DeepSeek-R1 chain-of-thought model — exposes the internal reasoning trace, supports API and local Ollama backends, with streaming output and temperature control.
Streaming, bounded-memory Process Reward Models over SSM / linear-attention backbones (codename: lattica).