Creek
Creek is a GitHub repository focused on training tokenizers for generative models, model initialization, pre-training, and instruction fine-tuning. It supports models like Llama and provides scripts for various stages of model training.
Key Features
- Tokenizer Training: Efficiently train tokenizers for generative models.
- Model Initialization: Initialize models with ease using provided scripts.
- Pre-training: Pre-train models with customizable parameters to suit different hardware capabilities.
- Instruction Fine-tuning: Fine-tune models based on specific instructions to enhance performance.
Benefits
- Open Source: Contribute and collaborate with the community.
- Comprehensive Documentation: Detailed instructions and scripts for each process.
- Scalable: Adaptable to various hardware configurations, ensuring accessibility for different users.
Highlights
- Supports advanced models like Llama.
- Provides scripts for training, evaluation, and fine-tuning.
- Active community contributions and updates.