UniTok: A Unified Tokenizer for Visual Generation and Understanding
Key Features:
- Unified visual tokenizer compatible with both autoregressive generation models and multimodal understanding models.
- Includes an MLLM built on the Liquid framework that reports state-of-the-art performance on both generation and understanding tasks.
- The repository provides installation instructions, pretrained model weights, and inference code.
Benefits:
- Improves the performance of unified MLLMs through architectural enhancements such as improved attention mechanisms.
- Open to community contributions, feedback, and continuous improvements.
- Published research results support its effectiveness on visual understanding benchmarks.
Highlights:
- Gradio demo available on Hugging Face.
- Training setups for a range of tasks, along with instructions for preparing evaluation data.
- MIT licensed for open-source collaboration.