School of Information Systems

Introducing Google’s Lumiere

In this era of technology, heavily shaped by advances such as Artificial Intelligence, many emerging platforms promise to change how we live and work. A few examples are ChatGPT, Google Bard/Gemini, and DALL-E, which show that AI platforms have spread into many different fields. Emerging as a new competitor in the generative AI field, Google recently introduced its new platform under development, Lumiere. Google’s Lumiere is designed to serve as a creative partner for content creation. Lumiere provides multiple features, including Text-to-Video, Image-to-Video, Stylized Generation, Video Stylization, Cinemagraphs, and Inpainting. These features help Lumiere users generate new videos easily and effectively.

Google’s Lumiere is built on Diffusion Probabilistic Models, which are trained to estimate the distribution of videos through a series of denoising steps. The model learns this distribution conditioned on supplementary guiding signals, such as embedded text and spatial conditioning. For its framework, Lumiere combines a base model with a spatial super-resolution (SSR) model. To ensure that the generated videos are realistic and have coherent motion, Google uses a Space-Time U-Net architecture, which generates the entire duration of the video at once by processing it in both space and time.
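Since Lumiere’s code has not been released, the following Python snippet is only a minimal, illustrative sketch of the denoising loop that Diffusion Probabilistic Models use in general. Every name here is an assumption for illustration: predict_noise stands in for the trained network (in Lumiere’s case, the Space-Time U-Net with its guiding signals), and the noise schedule and video shape are placeholders.

import numpy as np

rng = np.random.default_rng(0)

T = 50                                  # number of denoising steps (assumed)
betas = np.linspace(1e-4, 0.02, T)      # noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t, text_embedding):
    """Hypothetical stand-in for the trained denoising network,
    conditioned on guiding signals such as a text embedding."""
    return np.zeros_like(x)             # a real model would predict the noise

# Start from pure Gaussian noise shaped like a tiny video:
# (frames, height, width, channels)
x = rng.standard_normal((16, 8, 8, 3))
text_embedding = rng.standard_normal(64)  # assumed conditioning vector

for t in reversed(range(T)):
    eps = predict_noise(x, t, text_embedding)
    # Standard DDPM update: estimate the previous, less noisy step
    x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)  # re-inject noise

# x now approximates a sample from the learned video distribution.

Through this combination of model, framework, and architecture, Google’s Lumiere defines its features as follows: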

1. Image-to-Video 

This feature uses a model that generates a video beginning with the user’s input as the first frame. For example, if a user inputs a picture of a flower, the model generates a video that starts with that picture.  

2. Stylized Generation 

The model for this feature uses the user’s image input as a reference and generates video in that target style. Stylized Generation makes use of a fine-tuned text-to-image model. 

3. Inpainting 

This feature introduces Lumiere’s identification of a masked area in the user’s input video. Inpainting helps the user replace an object or insert a new one in the masked area, and Lumiere ensures that the completed masked region blends seamlessly and naturally with the rest of the generated video.  

4. Cinemagraphs 

The Cinemagraphs feature helps users animate the content of their image input within a specific region of the picture. For example, if a user inputs a picture of a butterfly on top of a flower and masks the butterfly region, Cinemagraphs animates the masked region while the rest of the image remains static, as the sketch below illustrates.  
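The mask-based idea behind Inpainting and Cinemagraphs can be sketched in a few lines of Python. This is not Lumiere’s actual method (which is not public); it only shows the compositing intuition: a binary mask decides which pixels come from generated frames and which stay fixed from the original still image. All shapes and values are assumptions for illustration.

import numpy as np

frames, height, width = 16, 8, 8

still = np.full((height, width, 3), 0.5)           # the user's input image (assumed)
mask = np.zeros((height, width, 1))                # 1 = region to animate or replace
mask[2:6, 2:6] = 1.0                               # e.g. the butterfly region

rng = np.random.default_rng(0)
generated = rng.random((frames, height, width, 3)) # stand-in for the model's output

# Composite: inside the mask use generated content, outside keep the still image.
video = mask * generated + (1.0 - mask) * still    # broadcasts over all frames

assert video.shape == (frames, height, width, 3)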

With all of these features, Lumiere could serve as our 24/7 personal video editor. However, Lumiere is still under development: it has reached the research stage, but no code has been released yet. So, we may need to wait a little longer to use Lumiere at its full potential.  

 

References: 

https://www.linkedin.com/pulse/google-research-releases-lumiere-new-ai-video-model-david-cronshaw-abs4c/ 

https://lumiere-video.github.io/ 

 

Aristia Utari Putri