process-reward-model

Here are 17 public repositories matching this topic:

Gen-Verse / ReasonFlux

[NeurIPS 2025 Spotlight] LLM post-training suite — featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.

reinforcement-learning code-generation post-training chain-of-thought llm-rlhf gemini-pro sft-data process-reward-model deepseek-r1 o3-mini clawdbot-skill

Updated Sep 27, 2025
Python

RyanLiu112 / compute-optimal-tts

Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".

r1 o1 large-language-model process-reward-model test-time-scaling

Updated Feb 19, 2025
Python

RyanLiu112 / Awesome-Process-Reward-Models

A comprehensive collection of process reward models.

r1 o1 large-language-model process-reward-model

Updated Jun 6, 2026

RyanLiu112 / GenPRM

[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".

r1 o1 large-language-model process-reward-model test-time-scaling

Updated Nov 8, 2025
Python

qiqihezh / agentic-grpo-longhorizon

Fixing GRPO training collapse in long-horizon multi-tool agents. A lightweight PRM-Lite + LATA joint approach achieves +37% over vanilla GRPO on τ-bench airline (50-task, multi-turn).

reinforcement-learning long-horizon qwen agentic-ai tool-calling process-reward-model grpo tau-bench multi-turn-agents

Updated May 11, 2026
Python

BaohaoLiao / RSD

[ICML 2025] Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

efficiency reasoning decoding-algorithm large-language-models speculative-decoding process-reward-model

Updated May 2, 2025
Python

sdiehl / prm

Library for training process reward models

prm process-reward-model

Updated Jun 3, 2025
Python

psunlpgroup / FoVer

This repository includes code and materials for the paper "Efficient PRM Training Data Synthesis via Formal Verification" (ACL 2026 Findings).

large-language-models process-reward-model

Updated Apr 7, 2026
Python

zjunlp / predict-before-execute

Can We Predict Before Executing Machine Learning Agents?

agent machine-learning natural-language-processing rollout mlagent large-language-models process-reward-model

Updated Jun 1, 2026
Python

Graph-Reasoner / GraphPRM

[KDD 2025] Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners

llm-reasoning process-reward-model graph-reasoning

Updated May 30, 2025
Python

declare-lab / PathFinder-PRM

This repository contains the official implementation of "Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision".

reinforcement-learning reasoning process-reward-model

Updated May 27, 2025
Jupyter Notebook

milad1378yz / MASPRM

Multi-Agent System Process Reward Model (MASPRM): a lightweight process reward model guiding multi-agent systems at search time.

mcts tree-search multi-agent-system llm process-reward-model inference-time-reasoning

Updated May 6, 2026
Python

awesome-pro / agentflow-pro

Process-supervised RL for a multi-step reasoning agent — DAPO + a learned Process Reward Model (PRM) training a Qwen3-8B Planner. A modern, from-scratch rebuild of the AgentFlow paper (ICLR 2026).

machine-learning reinforcement-learning ai-agents trl llm rlhf reward-model qlora ollama qwen llm-fine-tuning unsloth agentic-ai process-reward-model grpo dapo

Updated Jun 2, 2026
Python

Drnaive / CodePRM-DataKit

Step-level preference data construction toolkit for code-agent process reward models

llm rlhf reward-modeling process-reward-model code-agent swe-bench

Updated May 18, 2026
Python

originaonxi / prm-replication

Live proof of arXiv:2603.17815 — O(N) confirmed R²=0.952, 1,984 API calls

replication monte-carlo reasoning llm process-reward-model

Updated Mar 19, 2026
Python

Devanik21 / DeepSeek-R1

Interface for DeepSeek-R1 chain-of-thought model — exposes the internal reasoning trace, supports API and local Ollama backends, with streaming output and temperature control.

deep-learning neural-networks reasoning large-language-models llm rlhf reinforcement-learning-from-human-feedback chain-of-thought-reasoning deepseek process-reward-model

Updated Mar 15, 2026

hinanohart / prmstream

Streaming, bounded-memory Process Reward Models over SSM / linear-attention backbones (codename: lattica).

verification mamba reasoning state-space-model chain-of-thought process-reward-model

Updated Jun 4, 2026
Python
