Introduction
The Awesome LLM Pre-training repository is a curated collection of resources for pre-training large language models (LLMs), aimed at developers and researchers in natural language processing. It gathers the frameworks, datasets, training strategies, and data-processing methods central to LLM pre-training.
Key Features:
- Technical Reports: A curated list of technical reports and papers on LLM pre-training.
- Training Strategies: Overview of different training frameworks, strategies, and improvements in model architecture.
- Open-source Datasets: A collection of datasets that are freely available for use in LLM training.
- Data Methods: Insights into tokenization techniques and data augmentation methods that enhance training effectiveness.
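As a concrete illustration of the tokenization techniques listed above, the sketch below implements a minimal byte-pair encoding (BPE) trainer, the subword method behind many LLM tokenizers. This is a simplified, educational sketch in plain Python, not the implementation used by any particular repository listed here; the `</w>` end-of-word marker and function names are illustrative choices.

```python
from collections import Counter

def get_pair_counts(words):
    # words: dict mapping a tuple of symbols -> word frequency
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    # Replace every adjacent occurrence of `pair` with the fused symbol.
    a, b = pair
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def train_bpe(corpus, num_merges):
    # Start from character-level symbols, with an end-of-word marker.
    words = dict(Counter(tuple(w) + ("</w>",) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        words = merge_pair(words, best)
        merges.append(best)
    return merges
```

For example, on the toy corpus `"low low low lower lowest"`, the first merges fuse `l+o`, then `lo+w`, progressively building the frequent subword `low`. Production tokenizers (e.g. SentencePiece or Hugging Face tokenizers) follow the same idea with far more engineering around normalization, byte fallback, and speed.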
Benefits:
- Community-driven: Contributions from the open-source community keep the repository relevant and up to date.
- Streamlined Resources: Organizes the diverse resources needed for LLM development into a single, structured index.
- Educational Value: Helps newcomers to the field understand complex concepts and methodologies through curated content.
Highlights:
- Covers a wide range of models, including LLaMA, Baichuan, and more.
- Detailed discussions on the importance of training strategies and data quality for model performance.
- Encourages community contributions for continuous enhancement of available materials.