R1-Searcher

R1-Searcher incentivizes search capability in LLMs using reinforcement learning for enhanced reasoning performance.

Introduction

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

R1-Searcher is a project aimed at enhancing the reasoning capabilities of large language models (LLMs) through a two-stage, outcome-supervised reinforcement learning approach. The method teaches models to invoke web search and use external search engines effectively during reasoning, addressing the limitations LLMs face on knowledge-intensive problems.

Key Features:
  • Two-Stage Learning: The model first learns to invoke search, then learns to solve questions using the search results (see the rollout sketch after this list).
  • No Instruction Fine-Tuning Required: Compatible with existing Base LLMs or Chat LLMs without the need for complex fine-tuning.
  • Outcome-Supervised Reinforcement Learning: Focuses on the design of rewards and the reinforcement learning algorithm to enhance performance.
  • Diverse Training Data: Utilizes datasets like HotpotQA and 2WikiMultiHopQA for robust training and evaluation.
  • Integration of Online Search: Incorporates online search capabilities to improve results, especially for recent knowledge.
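
The core mechanic behind these features is interleaving generation with retrieval: the model emits an explicit search query mid-generation, an external search engine answers it, and decoding resumes over the returned documents. The sketch below illustrates that rollout loop; the tag strings and the `model.generate`/`retriever.search` interfaces are assumptions made for illustration, not the project's actual API.

```python
# Illustrative tag names; the project defines its own special tokens.
QUERY_START, QUERY_END = "<begin_of_query>", "<end_of_query>"
DOCS_START, DOCS_END = "<begin_of_documents>", "<end_of_documents>"

def rollout(model, retriever, prompt, max_searches=4):
    """Generate a reasoning trace, pausing whenever the model emits a
    search query so retrieved documents can be spliced back in."""
    trace = prompt
    for _ in range(max_searches):
        # Stop decoding when the model closes a query (or hits EOS).
        chunk = model.generate(trace, stop=[QUERY_END])
        trace += chunk
        if QUERY_START not in chunk:
            return trace  # final answer produced without another search
        query = chunk.rsplit(QUERY_START, 1)[-1].strip()
        docs = retriever.search(query)  # external search engine call
        # Splice the documents into the context so reasoning continues
        # conditioned on the retrieved evidence.
        trace += f"{QUERY_END}\n{DOCS_START}\n{docs}\n{DOCS_END}\n"
    return trace
```

Setups like this typically mask the retrieved tokens out of the RL loss so the model is neither rewarded nor penalized for text it did not generate; that detail is omitted from the sketch.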
Benefits:
  • Improved Reasoning Performance: Achieves significant improvements over existing methods, even surpassing some closed-source models.
  • Generalization Capabilities: Demonstrates exceptional performance across in-domain and out-of-domain datasets.
  • Open Source: Provides access to training code, inference code, model checkpoints, and a detailed technical report for community use and further research.
Highlights:
  • Achieves strong results using only Qwen-2.5-7B-Base and LLaMA-3.1-8B-Instruct as backbone models.
  • Utilizes a structured reward system to guide the learning process effectively (see the reward sketch after this list).
  • Open-source resources available for researchers and developers to build upon this work.
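
The structured reward combines a format check on the model's output with an outcome-based answer score. Below is a minimal sketch of what such a reward could look like, assuming the final answer is wrapped in `<answer>...</answer>` tags and scored with token-level F1; the tag name, penalty value, and weighting are illustrative assumptions rather than the project's published values.

```python
import re
from collections import Counter

def answer_f1(pred: str, gold: str) -> float:
    """Token-level F1 between the predicted and gold answers."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def outcome_reward(trace: str, gold_answer: str) -> float:
    """Structured outcome reward: a format term that checks the answer
    is wrapped in the expected tags, plus an F1-based answer term."""
    m = re.search(r"<answer>(.*?)</answer>", trace, re.DOTALL)
    if m is None:
        return -2.0  # format penalty: no well-formed answer found
    return answer_f1(m.group(1).strip(), gold_answer)
```

Because the supervision signal is computed only from the final outcome, no per-step process rewards or intermediate annotations are needed during training.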

Information

  • Publisher
    AISecKit
  • Website: github.com
  • Published date: 2025/04/28
