pdfdeal

A Python wrapper for the Doc2X API that enhances PDF processing and recall in RAG applications.

Introduction

pdfdeal is a Python wrapper for the Doc2X API, designed to simplify PDF processing and enhance recall in Retrieval-Augmented Generation (RAG) applications. It provides native text processing capabilities, allowing users to convert and manage PDF documents efficiently.

Key Features:
  • Doc2X API Integration: Wraps the Doc2X API so its document conversion can be called directly from Python.
  • PDF Processing: Processes either a single PDF file or every PDF file in a specified folder.
  • Markdown Document Handling: Offers tools for converting HTML tables to Markdown, uploading images, and managing document structure.
  • Enhanced Recognition Rates: Improves recognition rates when used with knowledge base applications like graphrag, Dify, and FastGPT.
  • Installation: Easily installable via pip with options for additional document processing tools.
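
The folder-or-single-file behaviour listed under PDF Processing can be illustrated with a short sketch. `collect_pdfs` below is a hypothetical helper written for this page, not part of pdfdeal's API. It assumes that only files directly inside the folder are collected and that a PDF is identified by its `.pdf` extension in any letter case. pdfdeal's own entry points are documented in its repository.

```python
from pathlib import Path
from typing import List, Union


def collect_pdfs(target: Union[str, Path]) -> List[Path]:
    """Return the PDF files to process for a single file or a folder.

    Hypothetical helper for illustration only; not part of pdfdeal.
    Assumption: only files directly inside the folder are collected.
    """
    path = Path(target)
    if path.is_file():
        # A single file is accepted only if its extension is .pdf.
        if path.suffix.lower() != ".pdf":
            raise ValueError(f"not a PDF file: {path}")
        return [path]
    if path.is_dir():
        # Keep every top-level *.pdf (any letter case), sorted so the
        # processing order is deterministic.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    raise FileNotFoundError(f"no such file or folder: {path}")
```

For example, `collect_pdfs("reports/")` returns the sorted PDF paths directly inside `reports/`, and `collect_pdfs("reports/q1.pdf")` returns a one-element list, so the same downstream conversion loop can handle both cases.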
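
The HTML-table-to-Markdown conversion listed under Markdown Document Handling can be sketched with Python's standard `html.parser` module. This is not pdfdeal's implementation: `html_table_to_markdown` is a hypothetical name, the first row is assumed to be the header, and `colspan`/`rowspan` and nested tables are not handled.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._cell: Optional[List[str]] = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace and escape pipes so they don't split columns.
            text = " ".join("".join(self._cell).split())
            if not self.rows:  # tolerate a cell that appears outside any <tr>
                self.rows.append([])
            self.rows[-1].append(text.replace("|", r"\|"))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_markdown(html: str) -> str:
    """Convert a simple HTML table to a Markdown pipe table (sketch)."""
    parser = _TableParser()
    parser.feed(html)
    parser.close()
    rows = [r for r in parser.rows if r]
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad short rows so every line has the same number of columns.
    rows = [r + [""] * (width - len(r)) for r in rows]
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)
```

For instance, a two-row table with header cells `Name` and `Score` becomes a three-line pipe table: the header, a `| --- | --- |` separator, and one data row. Entities such as `&amp;` are decoded by `HTMLParser` before the text reaches the Markdown output.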

Benefits:
  • User-Friendly: Simplifies complex PDF processing tasks with a straightforward API.
  • Versatile: Supports various output formats and document enhancements, making it suitable for diverse applications.
  • Open Source: Available on GitHub, allowing for community contributions and improvements.

For detailed usage instructions and documentation, visit the pdfdeal documentation repository.

Information

  • Publisher
    AISecKit
  • Website: github.com
  • Published date: 2025/04/28
