llm_dataset_generation

A tool using large language models to generate and review training data for logistics-related queries.

Introduction

This GitHub repository provides tools that use large language models (LLMs) to generate training data for logistics applications. The goal is to help a logistics company build an interactive Q&A bot that can classify user queries as logistics-related or not.

Key Features
  • Data Generation: Uses DeepSeek-V3 to generate diverse datasets covering both logistics and non-logistics queries.
  • Data Review: Uses DeepSeek-R1 to check the generated datasets for quality and classification accuracy.
  • Python-Based: The tools are implemented in Python, making them accessible and modifiable for developers.
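
The generate-then-review flow described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the names `Example`, `build_generation_prompt`, `parse_generated`, `review`, and `run_pipeline` are hypothetical, the label strings and the JSON-array output format are assumptions, and the two stub functions stand in for real calls to DeepSeek-V3 (generation) and DeepSeek-R1 (review).

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Assumed label set; the repository may use different class names.
LABELS = ("logistics", "non_logistics")


@dataclass
class Example:
    query: str
    label: str


def build_generation_prompt(label: str, n: int) -> str:
    """Build a prompt asking the generator model for n queries of one class."""
    return f"Write {n} short user queries of class '{label}' as a JSON array of strings."


def parse_generated(raw: str, label: str) -> List[Example]:
    """Parse the generator's reply; return no examples if it is not a JSON list."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [Example(q.strip(), label) for q in items if isinstance(q, str) and q.strip()]


def review(examples: List[Example], reviewer: Callable[[str], str]) -> List[Example]:
    """Keep only examples whose label matches the reviewer's own classification."""
    return [ex for ex in examples if reviewer(ex.query).strip().lower() == ex.label]


def run_pipeline(generator: Callable[[str], str],
                 reviewer: Callable[[str], str],
                 n_per_label: int = 2) -> List[Example]:
    """Generate candidates for each label, then filter them through the reviewer."""
    kept: List[Example] = []
    for label in LABELS:
        raw = generator(build_generation_prompt(label, n_per_label))
        kept.extend(review(parse_generated(raw, label), reviewer))
    return kept


# Offline stand-ins for the two models, used only to make the sketch runnable.
def fake_generator(prompt: str) -> str:
    if "non_logistics" in prompt:
        return '["Tell me a joke."]'
    # The second query is deliberately off-topic so the review step has work to do.
    return '["Where is my parcel?", "What is the weather today?"]'


def fake_reviewer(query: str) -> str:
    keywords = ("parcel", "shipment", "delivery")
    return "logistics" if any(k in query.lower() for k in keywords) else "non_logistics"


if __name__ == "__main__":
    for ex in run_pipeline(fake_generator, fake_reviewer):
        print(ex.label, "|", ex.query)
```

In a real setup, `generator` and `reviewer` would wrap API calls to the two models. Keeping them as plain callables lets the filtering logic be tested without network access.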

Benefits
  • Automation: Reduces manual data collection and labeling by automating data generation.
  • Quality Assurance: A separate review step checks that the generated datasets are relevant and accurately labeled.

Highlights
  • Specifically designed for logistics-related query classification.
  • Simplifies the data preparation phase for building AI applications in logistics.
  • Open-source project encouraging contributions and community support.

Information

  • Publisher: AISecKit
  • Website: github.com
  • Published date: 2025/04/28
