OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

TL;DR: We propose an end-to-end multimodality-conditioned human video generation framework named OmniHuman, which can generate human videos based on a single human image and motion signals (e.g., audio only, video only, or a combination of audio and video).

OmniHuman Overview

In OmniHuman, we introduce a multimodality motion conditioning mixed training strategy, which allows the model to benefit from scaling up data through mixed conditioning. This overcomes the scarcity of high-quality data that limited previous end-to-end approaches. OmniHuman significantly outperforms existing methods and generates extremely realistic human videos from weak signal inputs, especially audio. It supports image inputs of any aspect ratio, whether portrait, half-body, or full-body, and delivers lifelike, high-quality results across a wide range of scenarios.

We currently do not offer services or downloads anywhere, and the project has no SNS accounts. Please be cautious of fraudulent information. We will provide timely updates on future developments.

Generated Videos

OmniHuman supports various visual and audio styles. It can generate realistic human videos at any aspect ratio and body proportion (portrait, half-body, or full-body, all within a single model), with realism in motion, lighting, and texture details.

Talking

For speech-driven talking videos, OmniHuman supports input images of any aspect ratio. It significantly improves the handling of gestures, which remains a challenge for existing methods, and produces highly realistic results.

Diversity

OmniHuman supports diverse inputs, including cartoons, artificial objects, animals, and challenging poses, and generates motion that matches the distinctive characteristics of each style.

More Half-Body Cases with Hands

Here we provide additional examples that specifically showcase gesture movements. Some input images and audio come from TED, Pexels, and AIGC (AI-generated content).

More Portrait Cases

This section shows results at a portrait aspect ratio, derived from test samples in the CelebV-HQ dataset.

FAQ

OmniHuman is an AI-based framework for generating realistic human videos from a single image and motion signals.

OmniHuman is currently not available for public download or use. Please be cautious of fraudulent information.

Official updates about OmniHuman will come only from the research team. The project has no official SNS accounts.