Stability AI introduces Stable Video Diffusion models in research preview

While OpenAI is absorbed in the return of Sam Altman, its rivals keep raising the bar in the AI race. Just after Anthropic’s release of Claude 2.1 and Adobe’s reported acquisition of Rephrase.ai, Stability AI has announced the release of Stable Video Diffusion, marking its entry into the in-demand video generation space.

Available for research purposes only, Stable Video Diffusion (SVD) comprises two state-of-the-art AI models – SVD and SVD-XT – that produce short video clips from still images. The company says both deliver high-quality output that matches or even exceeds the performance of other AI video generators.

Stability AI has open-sourced the image-to-video models as part of its research preview and plans to leverage user feedback to further refine them, ultimately paving the way for their commercial application.

Understanding Stable Video Diffusion

According to a blog post from the company, SVD and SVD-XT are latent diffusion models that take a still image as a conditioning frame and generate 576 × 1024 video from it. Both models produce content at between three and thirty frames per second, but the output is quite short: four seconds at most. The SVD model is trained to produce 14 frames from a photo, while SVD-XT goes up to 25, Stability AI noted.
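The short clip lengths follow directly from those numbers: a clip’s duration is simply its frame count divided by the playback rate. A minimal sketch (the frame counts and fps range come from Stability AI’s figures; the specific fps values chosen here are purely illustrative):

```python
def clip_duration_seconds(frames: int, fps: float) -> float:
    """Length of a generated clip: frame count divided by playback rate."""
    return frames / fps

# SVD generates 14 frames per clip, SVD-XT up to 25 (per Stability AI's post),
# with playback anywhere in the 3-30 fps range.
print(clip_duration_seconds(25, 6))   # 25 frames played back at 6 fps
print(clip_duration_seconds(25, 30))  # the same 25 frames at 30 fps
```

At the low end of the fps range, 25 frames stretch to roughly four seconds; at 30 fps the same clip plays in under a second, which is why every output stays short.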

To create Stable Video Diffusion, the company trained a base model on a large, systematically curated video dataset of approximately 600 million samples. This model was then fine-tuned on a smaller, high-quality dataset (of up to one million clips) for downstream tasks such as text-to-video and image-to-video, where a sequence of frames is predicted from a single conditioning image.

Stability AI said the data for training and refining the model came from publicly available research datasets, although the exact source remains unclear.


More importantly, in a white paper detailing SVD, the authors write that the model can also serve as a basis for fine-tuning a diffusion model capable of multi-view synthesis. This would make it possible to generate multiple consistent views of an object from just one still image.

All this could eventually culminate in a wide range of applications in sectors such as advertising, education and entertainment, the company added in its blog post.

High-quality output, but limitations remain

An external evaluation by human raters found that SVD’s output was of high quality, easily outperforming the leading closed text-to-video models from Runway and Pika Labs. However, the company notes that this is just the beginning of its work and that the models are far from perfect at this stage. In many cases, they fail to deliver photorealism, generate videos with no motion or with very slow camera movements, and fail to render faces and people as users would expect.

Ultimately, the company plans to use this research preview to refine both models, close current gaps, and introduce new features, such as support for text prompts or text rendering in videos, for commercial applications. It emphasized that the current release is mainly intended to invite open research into the models, which could surface further problems (such as biases) and support safe deployment later.

“We plan a variety of models that build on and extend this foundation, similar to the ecosystem built around Stable Diffusion,” the company wrote. It has also begun inviting users to sign up for an upcoming web experience that will let them generate videos from text.

That said, it remains unclear when exactly the experience will be available.

A glimpse into Stable Video Diffusion’s text-to-video experience

How to use the models?

To get started with the new open-source Stable Video Diffusion models, users can find the code in the company’s GitHub repository and the weights required to run the models locally on its Hugging Face page. The company notes that use will only be permitted upon acceptance of its terms, which outline both permitted and excluded uses.
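As a rough sketch of what getting set up might look like, assuming Stability AI’s published `generative-models` repository and the gated `stable-video-diffusion-img2vid-xt` weights on Hugging Face (check the linked pages for the current layout and file names):

```shell
# Fetch the reference code from Stability AI's repository.
git clone https://github.com/Stability-AI/generative-models
cd generative-models

# Install dependencies into a fresh virtual environment.
python -m venv .venv && source .venv/bin/activate
pip install -r requirements/pt2.txt

# The weights are gated: accept the terms on the Hugging Face model page
# first, then download the 25-frame SVD-XT checkpoint.
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt \
    svd_xt.safetensors --local-dir checkpoints/
```

Because acceptance of the terms is enforced on the Hugging Face side, the download step fails until the account used has agreed to them on the model page.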

For now, the permitted use cases include, in addition to examining and researching the models, generating artworks for design and other artistic processes, and applications in educational or creative tools.

Generating factual or “true representations of people or events” is out of scope, according to Stability AI.
