R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
R1-Searcher is a project aimed at enhancing the search capabilities of large language models (LLMs) through a two-stage, outcome-supervised reinforcement learning approach. This method teaches models to invoke web search and use the retrieved results effectively during reasoning, addressing the limitations of LLMs on knowledge-intensive questions.
Key Features:
- Two-Stage Learning: The model first learns how to invoke external search, and then how to solve questions using the retrieved results (a rollout sketch follows this list).
- No Instruction Fine-Tuning Required: Works directly with existing Base or Instruct LLMs, without a supervised fine-tuning stage.
- Outcome-Supervised Reinforcement Learning: Performance gains come from the reward design and the reinforcement learning algorithm, relying on outcome-based rewards rather than process-level supervision.
- Diverse Training Data: Utilizes datasets like HotpotQA and 2WikiMultiHopQA for robust training and evaluation.
- Integration of Online Search: Incorporates online search capabilities to improve results, especially for recent knowledge.
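The search invocation loop can be pictured as follows. This is a minimal, illustrative sketch rather than the project's actual implementation: the tag names, the `generate` and `retrieve` helpers, and the stopping criterion are assumptions made for the example.

```python
# Illustrative sketch of a search-augmented rollout loop (not the project's
# actual code). Tag names and the generate()/retrieve() helpers are assumed.

QUERY_START, QUERY_END = "<begin_of_query>", "<end_of_query>"
DOCS_START, DOCS_END = "<begin_of_documents>", "<end_of_documents>"


def rollout(question: str, generate, retrieve, max_calls: int = 4) -> str:
    """Let the model reason, pausing whenever it emits a search query."""
    context = question
    for _ in range(max_calls):
        # Generate until the model either finishes or closes a query tag.
        completion = generate(context, stop=[QUERY_END])
        context += completion

        if QUERY_START not in completion:
            break  # no further search requested; the answer is complete

        # Extract the query the model asked for and run the external search.
        query = completion.split(QUERY_START)[-1].strip()
        docs = retrieve(query)

        # Feed the retrieved documents back so reasoning can continue.
        context += f"{QUERY_END}\n{DOCS_START}\n{docs}\n{DOCS_END}\n"
    return context
```

Pausing generation at the query tag lets an external search engine run between generation steps, which is what allows the same loop to work with either a local corpus or live online search results.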
Benefits:
- Improved Reasoning Performance: Achieves significant improvements over existing methods, even surpassing some closed-source models.
- Generalization Capabilities: Demonstrates exceptional performance across in-domain and out-of-domain datasets.
- Open Source: Provides access to training code, inference code, model checkpoints, and a detailed technical report for community use and further research.
Highlights:
- Strong results reported with both Qwen-2.5-7B-Base and LLaMA-3.1-8B-Instruct as backbone models.
- Utilizes a structured reward design to guide the learning process effectively (an illustrative sketch follows this list).
- Open-source resources available for researchers and developers to build upon this work.
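As an illustration of what an outcome-style structured reward can look like, here is a minimal sketch combining a format check with token-level answer F1. The exact reward terms, weights, and answer tags used by R1-Searcher may differ; treat this purely as an example.

```python
# Minimal sketch of an outcome-style reward: a format term (did the model
# produce a well-formed final answer?) plus an answer term (token-level F1
# against the gold answer). Tag names and weights are assumptions.
import re
from collections import Counter


def answer_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and the gold answer."""
    pred_tokens, gold_tokens = prediction.lower().split(), gold.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def outcome_reward(rollout_text: str, gold: str) -> float:
    # Format term: reward a well-formed final answer span (assumed tags).
    match = re.search(r"<answer>(.*?)</answer>", rollout_text, re.DOTALL)
    format_reward = 0.5 if match else -0.5
    # Answer term: F1 between the extracted answer and the gold answer.
    answer_reward = answer_f1(match.group(1), gold) if match else 0.0
    return format_reward + answer_reward
```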