TripoSG
TripoSG is an image-to-3D generation foundation model built for high fidelity, high quality, and strong generalization. It combines a large-scale rectified flow transformer, hybrid supervised training, and a high-quality dataset to achieve state-of-the-art performance in 3D shape generation.
Key Features:
- High-Fidelity Generation: Produces meshes with sharp geometric features, fine surface details, and complex structures.
- Semantic Consistency: Generated shapes accurately reflect input image semantics and appearance.
- Strong Generalization: Handles diverse input styles, including photorealistic images, cartoons, and sketches.
- Robust Performance: Creates coherent shapes even for challenging inputs with complex topology.
- Advanced VAE Architecture: Uses Signed Distance Functions (SDFs) with hybrid supervision combining SDF loss, surface normal guidance, and eikonal loss.
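To make the hybrid supervision in the last bullet concrete, here is a minimal sketch of how an SDF term, a surface normal term, and an eikonal term might be combined into one loss. It is an illustration under stated assumptions, not TripoSG's training code: the function names, the loss weights, the L1 form of the SDF term, the cosine form of the normal term, and the use of finite differences in place of automatic differentiation are all hypothetical choices. For simplicity, all three terms are evaluated at the same sample points.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def numerical_gradient(f, p, eps=1e-4):
    """Central-difference gradient of the scalar field f at point p."""
    grad = []
    for i in range(len(p)):
        hi = list(p)
        lo = list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def hybrid_loss(pred_sdf, true_sdf, points, true_normals,
                w_sdf=1.0, w_normal=1.0, w_eik=0.1):
    """Weighted sum of SDF, normal, and eikonal terms (weights are assumed)."""
    sdf_term = normal_term = eik_term = 0.0
    for p, n in zip(points, true_normals):
        g = numerical_gradient(pred_sdf, p)
        g_norm = math.sqrt(sum(c * c for c in g))
        # SDF loss: predicted distance should match the reference distance.
        sdf_term += abs(pred_sdf(p) - true_sdf(p))
        # Normal guidance: the SDF gradient direction should align with the
        # reference unit normal n (1 - cosine similarity).
        cos = sum(a * b for a, b in zip(g, n)) / max(g_norm, 1e-12)
        normal_term += 1.0 - cos
        # Eikonal loss: a valid SDF has a gradient of unit length everywhere.
        eik_term += (g_norm - 1.0) ** 2
    k = len(points)
    return (w_sdf * sdf_term + w_normal * normal_term + w_eik * eik_term) / k
```

With points on the unit sphere and their outward normals, passing `sphere_sdf` as both prediction and reference gives a loss near zero. A prediction such as `lambda p: 2 * sphere_sdf(p)` still has the right zero level set and normal direction, so only the eikonal term penalizes it, which is the kind of error that term is meant to catch.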
Technical Highlights:
- Large-Scale Rectified Flow Transformer: Combines RF's linear trajectory modeling with transformer architecture for stable, efficient training.
- High-Quality Dataset: Trained on 2 million carefully curated Image-SDF pairs.
- Efficient Scaling: Implements architecture optimizations for high performance even at smaller model scales.
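The "linear trajectory modeling" of rectified flow mentioned above can be sketched in a few lines. This is a generic illustration of the technique, not TripoSG's implementation: the function names are hypothetical, latents are plain Python lists rather than token tensors, image conditioning is omitted, and the convention that `x0` is noise and `x1` is data is an assumption.

```python
def interpolate(x0, x1, t):
    """Point on the straight line from x0 (noise) to x1 (data) at time t."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t because the path is linear."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)

def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = predict_velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In training, `predict_velocity` would be the transformer and `t` would be sampled at random for each example. Because the target velocity does not depend on `t`, a model that learns it well can be sampled with relatively few Euler steps, which is one reason the straight-line formulation is often described as stable and efficient.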
Community & Support:
Contributions are welcome! Use GitHub Issues for bug reports and feature requests, and feel free to contribute to this open-source project.