EuroBERT

EuroBERT is a multilingual encoder model designed for European languages, trained using the Optimus training library.

Introduction

EuroBERT: Scaling Multilingual Encoders for European Languages

EuroBERT is a multilingual encoder model designed specifically for European languages. It is trained with the Optimus training library, which supports efficient training across hardware configurations including CPUs and both AMD and NVIDIA GPUs.
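Released checkpoints follow the standard Hugging Face transformers interface. As a rough illustration, a masked-token prediction call might look like the sketch below; the checkpoint name "EuroBERT/EuroBERT-210m", the trust_remote_code flag, and the presence of a mask token in the tokenizer are assumptions about the public release, not details taken from this page.

# Minimal sketch, assuming the EuroBERT/EuroBERT-210m checkpoint on Hugging Face
# and that it ships custom modeling code (hence trust_remote_code=True).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "EuroBERT/EuroBERT-210m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Build a masked sentence using whatever mask token the tokenizer defines.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the most likely token at the masked position.
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_idx].argmax(dim=-1)))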

Key Features:
  • Hardware Agnostic: Seamlessly train on CPU, AMD, or NVIDIA hardware.
  • Resumable Training: Continue training regardless of hardware or environment changes.
  • Scalable Distributed Training: Supports Fully Sharded Data Parallel (FSDP), Distributed Data Parallel (DDP), and other parallelism strategies (see the sketch after this list).
  • Comprehensive Data Processing: Includes utilities for tokenization, packing, subsampling, and dataset inspection.
  • Highly Customizable: Fine-tune model architecture, training, and data processing with extensive configuration options.
  • Performance Optimizations: Implements mixed-precision training, fused operations, and kernel-level optimizations such as Liger Kernel and Flash Attention.
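The sketch below illustrates the kind of sharded distributed setup such a library wraps, using plain PyTorch FSDP. This is not the Optimus API; every call here comes from torch.distributed and transformers, and the checkpoint name is again an assumption.

# Generic PyTorch FSDP sketch (not the Optimus API); launch with torchrun,
# one process per GPU.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForMaskedLM

def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = AutoModelForMaskedLM.from_pretrained(
        "EuroBERT/EuroBERT-210m",  # assumed checkpoint name
        trust_remote_code=True,
    )
    # FSDP shards parameters, gradients, and optimizer state across ranks.
    model = FSDP(model.cuda())
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # ... standard training loop: forward pass, loss, backward, optimizer.step() ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()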
Benefits:
  • Efficiently process and train multilingual datasets.
  • Flexible installation options for developers.
  • Extensive documentation and tutorials available for users.
Highlights:
  • Supports a wide range of configurations for different training scenarios.
  • Provides a fair and consistent framework for evaluating and comparing encoder models (a generic illustration follows below).
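As a rough illustration of how encoders are commonly compared, the sketch below scores mean-pooled sentence embeddings by cosine similarity. It is a generic recipe, not the EuroBERT evaluation framework, and the checkpoint name is an assumption.

# Generic embedding-comparison sketch (not the EuroBERT evaluation framework).
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "EuroBERT/EuroBERT-210m"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch.attention_mask.unsqueeze(-1)
    # Mean-pool over real (non-padding) tokens.
    return (hidden * mask).sum(1) / mask.sum(1)

emb = torch.nn.functional.normalize(
    embed(["Wie spät ist es?", "What time is it?", "The weather is nice."]), dim=-1
)
print(emb @ emb.T)  # cosine similarities; the parallel sentences should score highest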
