Wan2.1-T2V-14B-CausVid

Overview

Wan2.1-T2V-14B-CausVid is a text-to-video generation model built on the Wan2.1-T2V-14B foundation model and enhanced with CausVid's causal diffusion approach. It generates high-quality, temporally consistent videos from text prompts. Because generation is autoregressive, the model can produce coherent long-form videos, addressing the temporal consistency limitations commonly found in traditional diffusion models. The approach also needs significantly fewer inference steps, which substantially reduces generation time while maintaining high-quality output.
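
The autoregressive process described above can be pictured as a loop over chunks of frames: each chunk starts from noise and is denoised in a small number of steps, conditioned only on frames that were already generated. The Python sketch below is a toy illustration of that control flow, not the model's actual code. The names `toy_denoise_step` and `generate_autoregressive`, the scalar "frames", and all parameter values are hypothetical stand-ins.

```python
import random


def toy_denoise_step(chunk, context, step, num_steps):
    """Hypothetical stand-in for one denoising step of the distilled model.

    It blends the noisy chunk toward a target derived from the previously
    generated frames (the causal context). The blend weight reaches 1.0 on
    the final step, so the chunk then equals the target.
    """
    target = context[-1] + 1.0 if context else 0.0
    weight = (step + 1) / num_steps
    return [(1.0 - weight) * x + weight * target for x in chunk]


def generate_autoregressive(num_chunks, frames_per_chunk, num_steps, seed=0):
    """Generate a toy 'video' chunk by chunk, each chunk seeing only the past."""
    rng = random.Random(seed)
    video = []
    for _ in range(num_chunks):
        # Every chunk starts from fresh Gaussian noise.
        chunk = [rng.gauss(0.0, 1.0) for _ in range(frames_per_chunk)]
        # A few denoising steps, conditioned on frames generated so far.
        for step in range(num_steps):
            chunk = toy_denoise_step(chunk, video, step, num_steps)
        # Finished chunks become context for the next chunk.
        video.extend(chunk)
    return video


if __name__ == "__main__":
    frames = generate_autoregressive(num_chunks=3, frames_per_chunk=4, num_steps=4)
    print(len(frames), "toy frames generated")
```

In this sketch the cost of generation grows linearly with the number of chunks, and the step count per chunk stays fixed, which is the property that makes a few-step distilled model attractive for long videos.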

Training

Our training code is adapted from the CausVid repository. We added support for the Wan2.1-T2V-14B model and performed a 9-step distillation process. The training data consists of approximately 15,000 prompt-video pairs with 81 frames each, selected from the OpenSora dataset. The modified code is available at CausVid-Plus.

Inference

Inference uses lightx2v, an efficient inference engine that supports multiple models. It significantly accelerates video generation while maintaining high-quality output.

The number of video segments (approximately 5 seconds each) can be adjusted by changing the "num_fragments" parameter in the configuration file. To run autoregressive inference, set "lightx2v_path" and "model_path" in the run script, then execute it:

bash scripts/run_wan_t2v_causal.sh
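
If you switch between video lengths often, the "num_fragments" value can be updated programmatically before launching the script. The Python sketch below assumes the configuration file is JSON with "num_fragments" as a top-level key. Both the file format and the helper name `set_num_fragments` are assumptions for illustration, so check them against your lightx2v checkout.

```python
import json
import tempfile
from pathlib import Path


def set_num_fragments(config_path, num_fragments):
    """Set "num_fragments" in a JSON config file and return the updated config.

    Assumes a JSON file with "num_fragments" as a top-level key
    (hypothetical layout, not confirmed by the lightx2v documentation).
    """
    if num_fragments < 1:
        raise ValueError("num_fragments must be at least 1")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["num_fragments"] = num_fragments
    # Write the file back, leaving all other keys unchanged.
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demonstration on a throwaway file; point this at your real config instead.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo_config.json"
        demo.write_text('{"num_fragments": 1}', encoding="utf-8")
        print(set_num_fragments(demo, 3))
```

With segments of about 5 seconds each, setting the value to 3 would request roughly 15 seconds of video.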

License Agreement

The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations. For a complete list of restrictions and details regarding your rights, please refer to the full text of the license.

Acknowledgements

We would like to thank the contributors to the Wan2.1, CausVid, and OpenSora repositories for their open research.