Kreuzberg

Introduction

Kreuzberg is a Python library for extracting text from a range of document formats, including PDFs, images, and office documents. It exposes one interface for both synchronous and asynchronous extraction, so the same library can be used from simple scripts and from async applications.

Key Features:

Wide Format Support: Extract text from PDFs, DOCX, RTF, TXT, EPUB, and more.
Multiple OCR Engines: Supports Tesseract, EasyOCR, and PaddleOCR for text recognition.
Local Processing: No need for external API calls or cloud dependencies, ensuring privacy and speed.
Resource Efficient: Lightweight processing that does not require a GPU.
Metadata Extraction: Retrieve document metadata alongside the extracted text.
Table Extraction: Extract tables from documents using the GMFT library.
Modern Python: Built with async/await, type hints, and a functional-first approach.
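
The pairing of synchronous and asynchronous entry points mentioned above can be illustrated with a minimal sketch that uses only the standard library. This is not Kreuzberg's implementation or API: the names extract_text, extract_text_sync, and extract_many are hypothetical, and the sketch handles only plain-text files. It shows the general pattern of writing the async function first and offering a thin synchronous wrapper around it.

```python
from __future__ import annotations

import asyncio
from pathlib import Path


async def extract_text(path: str | Path) -> str:
    """Hypothetical async extractor (illustration only, plain-text files)."""
    # Run the blocking file read in a worker thread so the event loop
    # stays free to service other coroutines.
    return await asyncio.to_thread(Path(path).read_text, encoding="utf-8")


def extract_text_sync(path: str | Path) -> str:
    """Hypothetical sync wrapper for callers that have no running event loop."""
    return asyncio.run(extract_text(path))


async def extract_many(paths: list[Path]) -> list[str]:
    """Extract several files concurrently; results keep the input order."""
    return await asyncio.gather(*(extract_text(p) for p in paths))
```

A script would call extract_text_sync("notes.txt") directly, while an async application would await extract_text or extract_many inside its own event loop. For the function names Kreuzberg actually exports, consult the project documentation.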

Benefits:

Simple and Hassle-Free: Clean API that just works without complex configuration.
Open Source: Released under the MIT license, encouraging contributions and community involvement.

Getting Started:

To install Kreuzberg, use the following command:

pip install kreuzberg

For comprehensive documentation, visit the project's GitHub Pages site.
