AutoDidact

Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification.

Introduction

Key Features:
  • Self-Bootstrapping with Llama-8B: Generates meaningful question-answer pairs from the custom data and trains itself to search that data effectively.
  • Autonomous Self-Verification: The Llama-8B model checks its own answers, closing a self-improving loop.
  • GRPO Reinforcement Learning: Uses Group Relative Policy Optimization to improve research and reasoning capabilities.
  • Fully Autonomous Pipeline: Every step, including question generation and reinforcement learning, runs locally with open-source models.
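To illustrate how self-verification and GRPO can fit together, here is a minimal sketch, not AutoDidact's actual code. It assumes several answers are sampled for each generated question, each answer gets a reward of 1.0 if the self-verification step judges it correct and 0.0 otherwise, and each reward is then normalized against its own group (the group-relative advantage used by GRPO). The function name `group_relative_advantages` and the `eps` parameter are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own group.

    Answers that beat the group average get a positive advantage and are
    reinforced; answers below the average get a negative one. eps avoids
    division by zero when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled answers to one generated question,
# with binary rewards taken from the model's own correctness verdicts.
verdicts = [True, False, True, False]
rewards = [1.0 if v else 0.0 for v in verdicts]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because advantages are computed relative to the group, no separate value model is required. A group in which every answer receives the same reward produces all-zero advantages and therefore contributes no learning signal.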
Benefits:
  • Demonstrated gains in answer accuracy, e.g., from 23% to 59% on a validation set.
  • Learns through training to issue well-formed queries and to refine its searches effectively.
Highlights:
  • Built on Unsloth's efficient GRPO code, extended to support function calling and agentic loops.
  • Well suited to deploying models in research settings, especially with historical data or custom datasets.
