Method and Device for Generating Synthetic Video Data from a Text Prompt

Abstract
A method for generating synthetic video data from a text prompt, particularly for providing video data for training and/or testing and/or verifying and/or validating a machine learning model. The method includes: providing an input text prompt descriptive of the content of the video data to be generated; decomposing the provided text prompt into at least two text sub-prompts by a large language model; generating a text embedding for each of the at least two text sub-prompts; and generating synthetic video data by a Video Diffusion Model based on the generated text embeddings.
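
The four steps of the method can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the claimed implementation: `toy_llm_decompose` stands in for the large language model by splitting on the word "then", `toy_embed` stands in for a text encoder by hashing the text into a small vector, and `toy_video_model` stands in for the Video Diffusion Model by emitting placeholder frame records instead of pixels. All function names and parameters are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List

Embedding = List[float]
Frame = Dict[str, object]


def toy_llm_decompose(prompt: str) -> List[str]:
    """Hypothetical stand-in for the large language model.

    Splits the prompt on the word 'then'; a real system would query an LLM.
    """
    return [part.strip() for part in prompt.split(" then ") if part.strip()]


def toy_embed(text: str, dim: int = 8) -> Embedding:
    """Hypothetical stand-in for a text encoder.

    Maps text to a deterministic vector of `dim` values in [0, 1].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def toy_video_model(embeddings: List[Embedding],
                    frames_per_segment: int = 4) -> List[Frame]:
    """Hypothetical stand-in for the Video Diffusion Model.

    Emits placeholder frame records, each tagged with the sub-prompt
    embedding that conditions it, instead of real image data.
    """
    frames: List[Frame] = []
    for segment, embedding in enumerate(embeddings):
        for _ in range(frames_per_segment):
            frames.append({"segment": segment, "conditioning": embedding})
    return frames


def generate_video(
    prompt: str,
    decompose: Callable[[str], List[str]] = toy_llm_decompose,
    embed: Callable[[str], Embedding] = toy_embed,
    video_model: Callable[[List[Embedding]], List[Frame]] = toy_video_model,
) -> List[Frame]:
    """Run the pipeline: prompt -> sub-prompts -> embeddings -> video."""
    sub_prompts = decompose(prompt)
    if len(sub_prompts) < 2:
        raise ValueError("the method requires at least two text sub-prompts")
    embeddings = [embed(sub_prompt) for sub_prompt in sub_prompts]
    return video_model(embeddings)


if __name__ == "__main__":
    video = generate_video(
        "a car stops at a red light then a pedestrian crosses the road"
    )
    print(len(video), "placeholder frames generated")
```

Passing the three stages as callables keeps the sketch independent of any particular model: each stand-in could be swapped for a real component without changing `generate_video`.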

Yumeng Li
Applied Scientist

🤗 Dedicated to making Generative Models greater for real-life applications.