Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance
We present DreamActor-M1, a framework that animates still images with lifelike motion, producing expressive and realistic human videos ranging from portrait to full-body animation. The resulting videos are temporally consistent, identity-preserving, and high-fidelity.
DreamActor-M1 is a diffusion transformer (DiT) based framework that overcomes limitations in current human animation methods. Using hybrid control signals, it achieves fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence. Our approach integrates implicit facial representations with 3D head spheres and body skeletons, delivering robust results for portraits, upper-body, and full-body generation with exceptional long-term consistency.
DreamActor-M1 delivers state-of-the-art performance through three key technical contributions:
Integrates implicit facial representations with 3D head spheres and body skeletons for comprehensive control over both facial expressions and body movements.
Progressive training with varying data resolutions enables seamless handling of different scales, from close-up portraits to full-body animations.
Advanced appearance guidance combines sequential frame patterns with complementary visual references for exceptional consistency during complex movements.
While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, limiting their expressiveness and robustness. We propose a diffusion transformer (DiT) based framework, DreamActor-M1, with hybrid guidance to overcome these limitations.
For motion guidance, our hybrid control signals integrate implicit facial representations, 3D head spheres, and 3D body skeletons to achieve robust control of facial expressions and body movements, while producing expressive and identity-preserving animations. For scale adaptation, we employ a progressive training strategy using data with varying resolutions and scales to handle various body poses and image scales ranging from portraits to full-body views. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements.
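The hybrid guidance described above can be pictured as stacking the three control signals into a single conditioning tensor fed to the DiT alongside the noisy video latents. The sketch below is a minimal illustration of that idea; the shapes, the fusion-by-stacking, and the `assemble_hybrid_guidance` interface are assumptions, not the paper's actual implementation.

```python
import numpy as np

def assemble_hybrid_guidance(face_token, head_sphere, skeleton, frame_hw=(64, 64)):
    """Combine the three hybrid control signals into one conditioning stack.

    Hypothetical interface: the paper integrates an implicit facial
    representation (a latent vector), a rendered 3D head sphere, and a
    rendered 3D body skeleton; exact shapes and fusion are assumed here.

    face_token : (d,) implicit facial latent for this frame
    head_sphere: (H, W) rendered head-sphere map (head position/scale cue)
    skeleton   : (H, W) rendered body-skeleton map
    """
    h, w = frame_hw
    # Broadcast a summary of the facial latent onto the spatial grid so
    # all three signals share the same (H, W) layout before stacking.
    face_map = np.tile(face_token.mean(), (h, w))
    # (3, H, W) conditioning tensor, one channel per control signal.
    return np.stack([face_map, head_sphere, skeleton], axis=0)

cond = assemble_hybrid_guidance(
    face_token=np.random.randn(128),
    head_sphere=np.zeros((64, 64)),
    skeleton=np.zeros((64, 64)),
)
print(cond.shape)  # (3, 64, 64)
```

In a real pipeline each signal would typically be encoded separately (e.g. the facial latent via cross-attention rather than a spatial map), but the per-channel stacking conveys how heterogeneous controls can share one conditioning pathway.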
Experiments demonstrate that our method outperforms state-of-the-art approaches, delivering expressive results for portraits, upper-body, and full-body generation with robust long-term consistency.
Integrates implicit facial representations, 3D head spheres, and 3D body skeletons for holistic control of facial expressions and body movements.
Progressive training strategy using data with varying resolutions and scales to handle different body poses and image views.
Combines motion patterns from sequential frames with complementary visual references for long-term temporal coherence during complex movements.
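The progressive training strategy above can be sketched as a staged curriculum over resolutions and body scales. The stage names, resolutions, and step counts below are illustrative assumptions only; the paper does not publish this exact schedule.

```python
# Minimal sketch of a progressive training schedule over data scales.
# Stage boundaries and resolutions are assumptions for illustration.
STAGES = [
    {"name": "portrait",   "resolution": (512, 512),  "scale": "close-up"},
    {"name": "upper-body", "resolution": (768, 512),  "scale": "half-body"},
    {"name": "full-body",  "resolution": (1024, 576), "scale": "full-body"},
]

def stage_for_step(step, steps_per_stage=10_000):
    """Return the training stage active at a given global optimizer step."""
    idx = min(step // steps_per_stage, len(STAGES) - 1)
    return STAGES[idx]

print(stage_for_step(0)["name"])       # portrait
print(stage_for_step(25_000)["name"])  # full-body
```

The point of such a curriculum is that the model first learns fine facial detail at portrait scale, then adapts to coarser, pose-heavy full-body data without forgetting the earlier scale.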
Our framework is robust to various character styles and motion types
DreamActor-M1 provides unprecedented fine-grained control over motion transfer while maintaining identity and temporal consistency
Our DiT-based framework enables selective motion transfer, allowing independent control of facial expressions, head movements, and body posture. This granular control preserves the source identity while precisely transferring desired motion elements.
Unlike previous methods that struggle with partial animation, our hybrid motion guidance system can isolate and transfer specific movement components with minimal artifacts.
DreamActor-M1 provides explicit control over head pose directions through our 3D head sphere representation. This enables generating animations with precise viewing angle adjustments while maintaining facial expressions and identity.
The explicit disentanglement of pose from appearance allows for consistent identity preservation even during extreme head rotations, outperforming previous approaches that struggle with maintaining identity during significant pose changes.
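To make the pose/appearance disentanglement concrete, the sketch below shows how an explicit head pose (yaw, pitch, roll) could be turned into the rotation applied to a unit sphere before rendering the head-sphere control map. This is a generic rotation-matrix construction, not the paper's renderer; the angle convention is an assumption.

```python
import numpy as np

def head_rotation(yaw, pitch, roll):
    """Rotation matrix for an explicit head pose (radians, Z-Y-X order).

    Illustrative only: the 3D head sphere represents head pose separately
    from appearance; here we show one way a yaw/pitch/roll pose could
    parameterize the sphere's orientation.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])   # yaw about z
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])   # pitch about y
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])   # roll about x
    return Rz @ Ry @ Rx

R = head_rotation(np.pi / 6, 0.0, 0.0)
# A valid rotation matrix is orthonormal with determinant 1.
print(np.allclose(R @ R.T, np.eye(3)))  # True
```

Because the pose is an explicit rotation rather than something entangled with the reference image, the same identity can be rendered under arbitrary viewing angles, which is what enables consistent identity through extreme head turns.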
DreamActor-M1 outperforms existing solutions with superior fine-grained motion control, identity preservation, temporal consistency, and visual fidelity across various scales
@misc{luo2025dreamactor-m1holisticexpressiverobust,
  title={DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance},
  author={Yuxuan Luo and Zhengkun Rong and Lizhen Wang and Longhao Zhang and Tianshu Hu and Yongming Zhu},
  year={2025},
  eprint={2504.01724},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.01724},
}