(CNN) — The Mona Lisa can do more than just smile thanks to new AI technology from Microsoft.
Last week, Microsoft researchers unveiled a new AI model that can take a still image of a face and an audio clip of a person speaking and automatically create a realistic video of that person speaking. The videos—which can be created from real-life faces, animations, or illustrations—are complete with convincing lip-syncs and natural facial and head movements.
In a demonstration video, the researchers showed how they animated Mona Lisa to recite a comedic rap song by actress Anne Hathaway.
The result, an artificial intelligence model called VASA-1, is as fun as it is a bit shocking in its realism. According to Microsoft, the technology could be used in education, "to improve accessibility for people with communication challenges," or even to create virtual companions for humans. But it's also easy to see how the tool could be abused to impersonate real people.
It's a concern that goes beyond Microsoft: as more tools emerge for creating convincing AI-generated images, videos, and audio, experts worry that their misuse could lead to new forms of misinformation. Some also worry that the technology could further disrupt creative industries, from film to advertising.
For now, Microsoft does not plan to release the VASA-1 model to the public. The approach is similar to how Microsoft partner OpenAI is handling concerns around its own AI-generated video tool, Sora: OpenAI introduced Sora in February but has so far made it available only to a small number of professional users and cybersecurity experts for testing purposes.
"We are opposed to any behavior to create misleading or harmful content of real persons," Microsoft researchers said in a blog post. They added that the company has "no plans to release" the product publicly "until we are certain that the technology will be used responsibly and in accordance with proper regulations."
Making faces move
The researchers explained that Microsoft's new AI model was trained on numerous videos of people's faces while speaking, and is designed to recognize natural facial and head movements, including "lip motion, (non-lip) expression, eye gaze and blinking, among others." The result is a more realistic video when VASA-1 animates a still image.
For example, in one demonstration clip set to audio of someone sounding agitated, apparently while playing video games, the speaking face has furrowed eyebrows and pursed lips.
The AI tool can also produce a video in which the subject looks in a certain direction or expresses a certain emotion.
If you look closely, there are still signs that the videos are machine-generated, such as infrequent blinking and exaggerated eyebrow movements. But Microsoft believes its model is “far superior” to other similar tools and “paves the way for real-time interaction with realistic avatars that mimic human conversational behaviors.”