DreamActor-M1

Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance

We present DreamActor-M1, an AI framework that animates still images with lifelike motion, producing expressive, realistic human videos at scales ranging from portraits to full-body animations. The resulting videos are temporally consistent, identity-preserving, and of high fidelity.

DreamActor-M1 is a diffusion transformer (DiT) based framework that overcomes limitations in current human animation methods. Using hybrid control signals, it achieves fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence. Our approach integrates implicit facial representations with 3D head spheres and body skeletons, delivering robust results for portraits, upper-body, and full-body generation with exceptional long-term consistency.

See DreamActor-M1 in Action

Key Technical Advantages

DreamActor-M1 delivers state-of-the-art performance through innovative technical approaches

Holistic Control

Integrates implicit facial representations with 3D head spheres and body skeletons for comprehensive control over both facial expressions and body movements.

Multi-Scale Adaptation

Progressive training with varying data resolutions enables seamless handling of different scales, from close-up portraits to full-body animations.

Temporal Coherence

Advanced appearance guidance combines sequential frame patterns with complementary visual references for exceptional consistency during complex movements.

Abstract

While recent image-based human animation methods achieve realistic body and facial motion synthesis, critical gaps remain in fine-grained holistic controllability, multi-scale adaptability, and long-term temporal coherence, which reduce their expressiveness and robustness. We propose DreamActor-M1, a diffusion transformer (DiT) based framework with hybrid guidance, to overcome these limitations.

For motion guidance, our hybrid control signals integrate implicit facial representations, 3D head spheres, and 3D body skeletons to achieve robust control of facial expressions and body movements, while producing expressive and identity-preserving animations. For scale adaptation, we employ a progressive training strategy using data with varying resolutions and scales to handle various body poses and image scales ranging from portraits to full-body views. For appearance guidance, we integrate motion patterns from sequential frames with complementary visual references, ensuring long-term temporal coherence for unseen regions during complex movements.

Experiments demonstrate that our method outperforms state-of-the-art approaches, delivering expressive results for portraits, upper-body, and full-body generation with robust long-term consistency.

Method Overview

How It Works

1

Hybrid Motion Guidance

Integrates implicit facial representations, 3D head spheres, and 3D body skeletons for holistic control of facial expressions and body movements.

2

Scale Adaptation

Progressive training strategy using data with varying resolutions and scales to handle different body poses and image views.

3

Appearance Guidance

Combines motion patterns from sequential frames with complementary visual references for long-term temporal coherence during complex movements.
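The three guidance signals above are combined into a single conditioning input for the DiT backbone. The paper's page does not include code, so the following is only an illustrative sketch: every tensor shape, channel count, and function name here is an assumption, not the authors' implementation.

```python
import numpy as np

def build_hybrid_guidance(face_latent, head_sphere_map, skeleton_map):
    """Stack the three hybrid control signals along the channel axis.

    Assumed (hypothetical) shapes, per video clip:
      face_latent:     (T, C_f, H, W) implicit facial representation
      head_sphere_map: (T, 1,   H, W) rendered 3D head-sphere mask
      skeleton_map:    (T, 3,   H, W) rasterized 3D body skeleton

    Returns a (T, C_f + 4, H, W) conditioning tensor that a DiT could
    consume alongside the noised video latents.
    """
    return np.concatenate([face_latent, head_sphere_map, skeleton_map], axis=1)

# Toy usage with random stand-in tensors (T=4 frames, 32x32 spatial grid):
face = np.random.rand(4, 8, 32, 32)
head = np.random.rand(4, 1, 32, 32)
body = np.random.rand(4, 3, 32, 32)
guidance = build_hybrid_guidance(face, head, body)  # shape (4, 12, 32, 32)
```

Channel-wise concatenation is just one plausible fusion strategy; the actual model may inject each signal through separate attention or adapter pathways.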

Diversity

Our framework is robust to various character styles and motion types

Controllability and Robustness

DreamActor-M1 provides fine-grained control over motion transfer while preserving identity and temporal consistency

Partial Motion Transfer

Our DiT-based framework enables selective motion transfer, allowing independent control of facial expressions, head movements, and body posture. This granular control preserves the source identity while precisely transferring desired motion elements.

Unlike previous methods that struggle with partial animation, our hybrid motion guidance system can isolate and transfer specific movement components with minimal artifacts.
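Because the control signals are separate, selective transfer can be framed as choosing, per component, whether the driving video or the reference image supplies that signal. This is a minimal sketch of that idea; the component names and dict-based interface are assumptions for illustration, not the paper's API.

```python
def partial_transfer(driving_controls, reference_controls, transfer):
    """Select, per motion component, whether to take the control signal
    from the driving video or keep the reference's own signal.

    driving_controls / reference_controls: dicts keyed by component name
      ('face', 'head', 'body') mapping to that component's control signal.
    transfer: dict of booleans, e.g. {'face': True} transfers only
      facial expressions and leaves head pose and body posture untouched.
    """
    out = {}
    for key in ("face", "head", "body"):
        # Take the driving signal only for components flagged for transfer.
        out[key] = driving_controls[key] if transfer.get(key) else reference_controls[key]
    return out

# Example: transfer facial expressions only.
driving = {"face": "drv_face", "head": "drv_head", "body": "drv_body"}
reference = {"face": "ref_face", "head": "ref_head", "body": "ref_body"}
mixed = partial_transfer(driving, reference, {"face": True})
# mixed keeps ref_head and ref_body, but uses drv_face
```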

Head Pose Direction Control

DreamActor-M1 provides explicit control over head pose directions through our 3D head sphere representation. This enables generating animations with precise viewing angle adjustments while maintaining facial expressions and identity.

The explicit disentanglement of pose from appearance allows for consistent identity preservation even during extreme head rotations, outperforming previous approaches that struggle with maintaining identity during significant pose changes.
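A head-sphere control signal of this kind can be thought of as the 2D silhouette of a sphere placed at the head's 3D position. The sketch below rasterizes such a silhouette under a pinhole camera; the function name, simplified geometry (circle radius approximated as focal length times sphere radius over depth), and default parameters are all assumptions, not the paper's renderer.

```python
import numpy as np

def render_head_sphere(center, radius, K, size=(256, 256)):
    """Rasterize the silhouette of a 3D head sphere as a binary mask.

    center: (x, y, z) sphere center in camera coordinates (z > 0).
    radius: sphere radius in the same units as center.
    K:      3x3 pinhole intrinsics matrix.
    size:   (height, width) of the output mask.
    """
    h, w = size
    x, y, z = center
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Project the center and scale the radius by perspective division.
    u = fx * x / z + cx
    v = fy * y / z + cy
    r_px = fx * radius / z
    yy, xx = np.mgrid[0:h, 0:w]
    return ((xx - u) ** 2 + (yy - v) ** 2 <= r_px ** 2).astype(np.float32)

# A head sphere 2 m in front of the camera, centered on the optical axis:
K = np.array([[500.0, 0.0, 128.0],
              [0.0, 500.0, 128.0],
              [0.0, 0.0, 1.0]])
mask = render_head_sphere((0.0, 0.0, 2.0), 0.1, K)
```

Because the mask encodes only position and apparent size, head pose direction stays disentangled from appearance: rotating or translating the head moves the sphere without carrying any identity information.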

Comparing to SOTA Methods

DreamActor-M1 outperforms existing solutions with superior fine-grained motion control, identity preservation, temporal consistency, and visual fidelity across various scales

Pose Transfer

Portrait Animation

Citation

@misc{luo2025dreamactor-m1holisticexpressiverobust,
  title={DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance}, 
  author={Yuxuan Luo and Zhengkun Rong and Lizhen Wang and Longhao Zhang and Tianshu Hu and Yongming Zhu},
  year={2025},
  eprint={2504.01724},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.01724}, 
}