Google answers Meta’s video-generating AI with its own, dubbed Imagen Video • TechCrunch


Not to be outdone by Meta’s Make-A-Video, Google today detailed its work on Imagen Video, an AI system that can generate video clips from a text prompt (e.g. “a teddy bear is washing dishes”). While the results aren’t perfect (the looping clips the system produces tend to have artifacts and noise), Google claims that Imagen Video is a step toward a system with a “high degree of control” and world knowledge, including the ability to generate footage in a wide range of art styles.

As my colleague Devin Coldewey noted in his piece about Make-A-Video, text-to-video systems are not new. Earlier this year, a team of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence released CogVideo, which can translate text into short clips of reasonably high fidelity. But Imagen Video appears to be a significant leap over the previous state of the art, showing an aptitude for animating captions that existing systems would have trouble understanding.

“It’s definitely an improvement,” Matthew Guzdial, an assistant professor at the University of Alberta who studies AI and machine learning, told TechCrunch via email. “As you can see from the video examples, even though the comms team is choosing the best output, there is still blur and weird visuals. So this is definitely not going to be used directly in animation or TV any time soon. But it, or something like it, can certainly be embedded in tools to help speed things up.”

Google Imagen Video

Image credits: Google

Imagen Video is built on Google’s Imagen, an image-generating system comparable to OpenAI’s DALL-E 2 and Stable Diffusion. Imagen is what’s known as a “diffusion” model, which generates new data (e.g. video) by learning to “destroy” and “restore” many existing data samples. As it is fed existing samples, the model gets better at recovering the data it previously destroyed, and that ability is what lets it create new compositions.
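
The “destroy” and “restore” idea can be made concrete with a toy sketch. What follows is a minimal illustration under stated assumptions, not Google’s implementation: it uses the common textbook formulation in which a clean sample is blended with Gaussian noise according to a linear schedule, and the function names, schedule values and four-number “image” are all hypothetical. In place of a trained network it feeds the true noise back in, which shows only that the blend can be inverted once the noise is known; a real model has to learn to estimate that noise.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after t noising steps
    (linear beta schedule; the values are illustrative assumptions)."""
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def destroy(x0, t, num_steps, rng):
    """Forward process: blend the clean sample with Gaussian noise."""
    a = alpha_bar(t, num_steps)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]
    return xt, noise


def restore(xt, noise_estimate, t, num_steps):
    """Undo the blend given an estimate of the noise.
    A trained network would normally supply noise_estimate."""
    a = alpha_bar(t, num_steps)
    return [
        (x - math.sqrt(1.0 - a) * n) / math.sqrt(a)
        for x, n in zip(xt, noise_estimate)
    ]


rng = random.Random(0)
clean = [0.2, -0.5, 0.9, 0.0]  # stand-in for pixel values
noisy, true_noise = destroy(clean, t=500, num_steps=1000, rng=rng)
recovered = restore(noisy, true_noise, t=500, num_steps=1000)  # oracle noise
```

With the true noise supplied, `recovered` matches `clean` up to floating-point error. Training a diffusion model amounts to replacing that oracle with a network’s prediction.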

As the Google team behind Imagen Video explains in a paper, the system takes a text description and generates a 16-frame, three-frames-per-second video at 24 x 48 pixel resolution. The system then upscales and “predicts” additional frames, producing a final 128-frame, 24-frames-per-second video at 720p (1280×768).
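
A quick arithmetic check of those figures, assuming the final clip is 128 frames played at 24 frames per second: the frame count and the frame rate both grow by the same factor, so the clip gets smoother and sharper without getting longer. The `ClipSpec` class below is a hypothetical helper written for this sketch, and reading “24 x 48” as height by width is an assumption.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ClipSpec:
    """Hypothetical container for the clip dimensions quoted in the article."""
    frames: int
    fps: int
    width: int
    height: int

    @property
    def duration_s(self) -> Fraction:
        # Exact clip length in seconds: frames divided by frames per second.
        return Fraction(self.frames, self.fps)

    @property
    def pixels_per_frame(self) -> int:
        return self.width * self.height


# Assumption: "24 x 48" is height x width.
base = ClipSpec(frames=16, fps=3, width=48, height=24)
final = ClipSpec(frames=128, fps=24, width=1280, height=768)

frame_factor = Fraction(final.frames, base.frames)  # how many times more frames
fps_factor = Fraction(final.fps, base.fps)          # how many times faster playback
same_length = base.duration_s == final.duration_s   # does the duration change?

print(f"frames x{frame_factor}, fps x{fps_factor}, same length: {same_length}")
print(f"duration: {float(final.duration_s):.2f} s")
print(f"pixels per frame: {base.pixels_per_frame} -> {final.pixels_per_frame}")
```

Both factors come out to 8, and both clips last 16/3 of a second, or about 5.3 seconds.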

Google says Imagen Video was trained on 14 million video-text pairs and 60 million image-text pairs, as well as the publicly available LAION-400M image-text dataset, which enabled it to generalize to a variety of aesthetics. In experiments, the researchers found that Imagen Video could create videos in the style of Van Gogh paintings and watercolors. Perhaps more impressively, they claim that Imagen Video demonstrated an understanding of depth and three-dimensional space, allowing it to create drone-flythrough-style videos that rotate around objects and capture them from different angles without distorting them.

In a major improvement over existing image-generating systems, Imagen Video can also render text properly. While both Stable Diffusion and DALL-E 2 struggle to translate prompts like “a logo for ‘Diffusion’” into readable type, Imagen Video renders it without a problem, at least judging by the paper.

That’s not to say Imagen Video is without limitations. As is the case with Make-A-Video, even the hand-picked clips from Imagen Video are jerky and distorted in parts, as Guzdial alludes to, with objects blending together in physically unnatural, even impossible, ways. The researchers also note that the data used to train the system contains problematic content, which could lead to Imagen Video creating sexually explicit or graphically violent clips; Google said it would not release the Imagen Video model or its source code “until these concerns are mitigated”.

However, with text-to-video technology developing rapidly, it may not be long before an open source model emerges, both fostering creativity and creating an intractable challenge where misinformation is concerned.
