NiuTrans/Classical-Modern

A comprehensive parallel corpus of Classical Chinese and Modern Chinese texts.

Introduction

NiuTrans/Classical-Modern is a parallel corpus that pairs Classical Chinese (古文) texts with Modern Chinese counterparts. It is intended as a resource for researchers and developers working on Chinese language processing.

Key Features:
  • Extensive Corpus: Covers 327 classical works.
  • Parallel Data: Offers sentence-level aligned bilingual data, with a total of 972,467 sentence pairs.
  • Structured Organization: Texts are organized by chapters and sections, making it easy to navigate and access specific works.
  • Data Processing Scripts: Provides scripts for data processing, ensuring reproducibility and ease of use.
Benefits:
  • Research Resource: Ideal for linguists, historians, and AI researchers interested in Chinese language studies.
  • Open Source: Freely available for contributions and improvements from the community.
  • Detailed Documentation: Includes comprehensive documentation on data sources and processing methods.
Highlights:
  • The corpus is sourced from the internet, which gives it a wide range of texts.
  • Data is organized so that the original order of the Classical texts is preserved.
  • Contributions from community members enhance the quality and breadth of the corpus.
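
The sentence-level aligned data described under Key Features can be consumed with a small reader. The following is a minimal sketch only: it assumes each chapter ships as two line-aligned UTF-8 files, one Classical and one Modern, where line N of one corresponds to line N of the other. The file names `source.txt` and `target.txt` and the function `read_sentence_pairs` are hypothetical and are not confirmed by this page, so check the repository's documentation for the actual layout.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_sentence_pairs(source_path: Path, target_path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (classical, modern) pairs from two line-aligned text files.

    Assumption: line N of source_path is aligned with line N of target_path.
    """
    with open(source_path, encoding="utf-8") as src, open(target_path, encoding="utf-8") as tgt:
        # zip_longest pads the shorter file with None, so a length mismatch is detectable.
        for line_no, (classical, modern) in enumerate(zip_longest(src, tgt), start=1):
            if classical is None or modern is None:
                raise ValueError(f"files differ in length at line {line_no}")
            classical, modern = classical.strip(), modern.strip()
            # Skip lines that are blank on both sides (e.g. spacing between sections).
            if not classical and not modern:
                continue
            yield classical, modern


if __name__ == "__main__":
    # Hypothetical chapter directory; replace with a real path from the corpus.
    chapter = Path("some_work/some_chapter")
    for classical, modern in read_sentence_pairs(chapter / "source.txt", chapter / "target.txt"):
        print(classical, "=>", modern)
```

Raising on a length mismatch, rather than silently truncating, is a deliberate choice here: a missing line would shift every later pair out of alignment.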

Information

  • Publisher
    AISecKit
  • Website
    github.com
  • Published date
    2025/04/28
