Official PyTorch Implementation for "Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer"
This is the official implementation of the paper:
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Danah Yatim*,
Rafail Fridman*,
Omer Bar-Tal,
Yoni Kasten,
Tali Dekel
(*equal contribution)
Introducing a zero-shot method for transferring motion across objects and scenes, without any training or fine-tuning.
We present a new method for text-driven motion transfer -- synthesizing a video that complies with an input text prompt describing the target objects and scene while maintaining an input video's motion and scene layout. Prior methods are confined to transferring motion across two subjects within the same or closely related object categories and are applicable only to limited domains (e.g., humans). In this work, we consider a significantly more challenging setting in which the target and source objects differ drastically in shape and fine-grained motion characteristics (e.g., translating a jumping dog into a dolphin). To this end, we leverage a pre-trained and fixed text-to-video diffusion model, which provides us with generative and motion priors. The pillar of our method is a new space-time feature loss derived directly from the model. This loss guides the generation process to preserve the overall motion of the input video while complying with the target object in terms of shape and fine-grained motion traits.
For more, visit the project webpage.
Clone the repo and create a new environment:
```
git clone https://github.com/diffusion-motion-transfer/diffusion-motion-transfer.git
cd diffusion-motion-transfer
conda create --name dmt python=3.9
conda activate dmt
```
Install our environment requirements:
```
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```
To preprocess a video, update the configuration file `configs/preprocess_config.yaml`:
Arguments to update:
- `video_path`: the input video frames should be located in this path
- `save_dir`: the latents will be saved in this path
- `prompt`: an empty string, or a string describing the video content

Optional arguments to update:
- `save_ddim_reconstruction`: if True, the reconstructed video will also be saved in `save_dir`
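For reference, a filled-in `configs/preprocess_config.yaml` might look like the sketch below. The paths and values are illustrative placeholders, not the repository's defaults; keep any other keys the shipped config file already contains.

```yaml
video_path: data/my_video            # directory containing the input video frames (placeholder path)
save_dir: outputs/my_video_latents   # inverted latents are written here (placeholder path)
prompt: ""                           # empty, or a short description of the video content
save_ddim_reconstruction: True       # optional: also save the DDIM reconstruction to save_dir
```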
After updating config file, run the following command:
```
python preprocess_video_ddim.py --config_path configs/preprocess_config.yaml
```
Once the preprocessing is done, the latents will be saved under the `save_dir` path.
To edit the video, update the configuration file `configs/guidance_config.yaml`.
Arguments to update:
- `data_path`: the input video frames should be located in this path
- `output_path`: the edited video will be saved in this path
- `latents_path`: the latents of the input video should be located in this path
- `source_prompt`: prompt used for inversion
- `target_prompt`: prompt used for editing

Optional arguments to update:
- `negative_prompt`: prompt used for unconditional classifier-free guidance
- `seed`: randomly chosen by default; to specify a seed, change this value
- `optimization_step`: number of optimization steps for each denoising step
- `optim_lr`: learning rate
- `with_lr_decay`: if True, overrides `optim_lr`, and the learning rate will decay during the optimization process in the range of `scale_range`
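For reference, a filled-in `configs/guidance_config.yaml` might look like the sketch below. The prompts, paths, and values are illustrative placeholders (the dog-to-dolphin example mirrors the paper's), not the repository's defaults; keep any other keys the shipped config file already contains.

```yaml
data_path: data/my_video             # placeholder path to the input frames
output_path: outputs/my_video_edit   # placeholder path for the edited result
latents_path: outputs/my_video_latents
source_prompt: "a dog jumping"       # illustrative prompt used for inversion
target_prompt: "a dolphin jumping"   # illustrative prompt used for editing
negative_prompt: ""
seed: 42                             # remove to use a randomly chosen seed
optimization_step: 30                # optimization steps per denoising step
with_lr_decay: True                  # learning rate decays within scale_range
scale_range: [0.005, 0.002]
```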
After updating the config file, run the following command:
```
python run.py --config_path configs/guidance_config.yaml
```
Once the method is done, the video will be saved to the `output_path` under `result.mp4`.
"Amazing quality, masterpiece, "
for inversion and edits.optimization_step: 30
.scale_range:[0.005, 0.002]
,We also provide the code for calculating the motion fidelity metric introduced in the paper (Section 5.1). To calculate the motion fidelity metric, first follow the instructions here to install Co-Tracker and download their checkpoint. Then, run the following command:
```
python motion_fidelity_score.py --config_path configs/motion_fidelity_config.yaml
```
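The idea behind the metric can be sketched roughly as follows: compare the per-frame displacement directions of point tracks (e.g., from Co-Tracker) in the source and edited videos, and score each edited track by its best-matching source track. This is a minimal NumPy sketch of that idea, not the repository's implementation; the function names and the exact matching/normalization scheme are assumptions.

```python
import numpy as np

def track_similarity(tracks_a, tracks_b):
    """Mean cosine similarity between per-frame displacement vectors of two
    sets of point tracks, each of shape (num_tracks, num_frames, 2)."""
    va = np.diff(tracks_a, axis=1)  # displacements, shape (Na, T-1, 2)
    vb = np.diff(tracks_b, axis=1)  # displacements, shape (Nb, T-1, 2)
    # normalize to unit vectors; guard against zero-length (static) steps
    va = va / np.maximum(np.linalg.norm(va, axis=-1, keepdims=True), 1e-8)
    vb = vb / np.maximum(np.linalg.norm(vb, axis=-1, keepdims=True), 1e-8)
    # pairwise similarity matrix of shape (Na, Nb), averaged over frames
    return np.einsum("atd,btd->ab", va, vb) / va.shape[1]

def motion_fidelity(source_tracks, edited_tracks):
    """Average, over edited tracks, of the similarity to the best-matching
    source track (a rough sketch of the Section 5.1 metric)."""
    sim = track_similarity(edited_tracks, source_tracks)
    return float(sim.max(axis=1).mean())
```

Under this sketch, a perfectly preserved motion scores 1, since every edited track matches its own source track exactly.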
```
@article{yatim2023spacetime,
  title   = {Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer},
  author  = {Yatim, Danah and Fridman, Rafail and Bar-Tal, Omer and Kasten, Yoni and Dekel, Tali},
  journal = {arXiv preprint arXiv:2311.17009},
  year    = {2023}
}
```