East Africa News Post

Microsoft's new AI makes Mona Lisa rap. How does it work?

(CNN) — The Mona Lisa can do more than just smile thanks to new AI technology from Microsoft.

Last week, Microsoft researchers unveiled a new AI model that can take a still image of a face and an audio clip of a person speaking and automatically create a realistic video of that person speaking. The videos—which can be created from real-life faces, animations, or illustrations—are complete with convincing lip-syncs and natural facial and head movements.

In a demonstration video, the researchers showed how they animated Mona Lisa to recite a comedic rap song by actress Anne Hathaway.

The results of the artificial intelligence model, called VASA-1, are as fun as they are a bit shocking in their realism. According to Microsoft, this technology could be used in education, “to improve accessibility for people with connectivity issues,” or even to create virtual companions for humans. But it's also easy to see how the tool could be abused and used to impersonate real people.

It's a concern that goes beyond Microsoft: As more tools emerge to create convincing AI-generated images, videos, and audio, experts are concerned that their misuse could lead to new forms of misinformation. Some also worry that the technology may further disrupt creative industries, from film to advertising.

Microsoft does not plan to release the VASA-1 model immediately. The move is similar to how Microsoft partner OpenAI is handling concerns surrounding its AI-generated video tool, Sora. OpenAI introduced Sora in February, but so far has only made it available to a small number of professional users and cybersecurity experts for testing purposes.

“We oppose any behavior to create misleading or harmful content from real people,” Microsoft researchers said in a blog post. But they added that the company “has no plans to release” the product publicly “until we are confident that the technology will be used responsibly and in accordance with appropriate regulations.”

The faces are moving

The researchers explained that Microsoft's new AI model was trained on numerous videos of people's faces speaking, and is designed to recognize natural facial and head movements, including “lip movement, (non-lip) expression, gaze and blinking, among others.” The result is a more realistic video when VASA-1 animates a still image.

For example, in one demo video set to a clip of someone who sounds agitated, apparently while playing video games, the speaking face has furrowed eyebrows and pursed lips.

The AI tool can also produce a video in which the subject looks in a certain direction or expresses a certain emotion.

If you look closely, there are still signs that the videos are machine-generated, such as infrequent blinking and exaggerated eyebrow movements. But Microsoft believes its model is “far superior” to other similar tools and “paves the way for real-time interaction with realistic avatars that mimic human conversational behaviors.”