🎶 Music-Driven Conducting Motion Generation (IEEE ICME'21 Best Demo)
This repository is the official implementation of “Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation” by Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, and Feng Xu. It also provides access to the ConductorMotion100 dataset, which consists of 100 hours of orchestral conductor motion with aligned Mel spectrograms of the music.
The figure above gives a high-level illustration of the proposed two-stage approach. The contrastive learning stage and the generative learning stage are bridged by transferring the learned music and motion encoders, as indicated by the dotted lines. Our approach generates plausible, diverse, and music-synchronized conducting motion.
Updates🔔
Clone this repo:

```bash
git clone https://github.com/ChenDelong1999/VirtualConductor.git
cd VirtualConductor
```
Create a conda virtual environment and activate it:

```bash
conda create -n VirtualConductor python=3.6 -y
conda activate VirtualConductor
```
Install CUDA Toolkit 11.3 (link) and cuDNN 8.2.1 (link), then install PyTorch 1.10.1:

```bash
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
# if you prefer another CUDA version, please choose a suitable PyTorch version
# see: https://pytorch.org/get-started/locally/
```
Install the other requirements:

```bash
conda install ffmpeg -c conda-forge -y
pip install librosa matplotlib scipy tqdm moviepy opencv-python tensorboard
```
Place the music files you want to test in the `/test/test_samples/` folder (we have prepared some for you). Then, using the pretrained checkpoint `checkpoints/M2SGAN/M2SGAN_official_pretrained.pt`, the script `test_unseen.py` will do the following:
1. enumerate all samples in the `/test/test_samples/` folder,
2. extract the Mel spectrogram from each piece of music,
3. generate conducting motions, and
4. save the result videos to `/test/result/`.
```bash
python test_unseen.py --model 'checkpoints/M2SGAN/M2SGAN_official_pretrained.pt'
```
The ConductorMotion100 dataset can be downloaded in the following ways:

- via Google Drive
There are 3 splits of ConductorMotion100: train, val, and test. They correspond to 3 `.rar` files. After extracting them to a `<Your Dataset Dir>` folder, the file structure will be:
```
tree <Your Dataset Dir>

<Your Dataset Dir>
├───train
│   ├───0
│   │       mel.npy
│   │       motion.npy
│   ...
│   └───5268
│           mel.npy
│           motion.npy
├───val
│   ├───0
│   │       mel.npy
│   │       motion.npy
│   ...
│   └───290
│           mel.npy
│           motion.npy
└───test
    ├───0
    │       mel.npy
    │       motion.npy
    ...
    └───293
            mel.npy
            motion.npy
```
Each `mel.npy` and `motion.npy` pair corresponds to 60 seconds of Mel spectrogram and motion data, sampled at 90 Hz and 30 Hz respectively. The Mel spectrogram has 128 frequency bins, so `mel.shape = (5400, 128)`; the motion data contains 13 2D keypoints, so `motion.shape = (1800, 13, 2)`.
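These shapes and the 3:1 rate ratio (90 Hz vs. 30 Hz) can be sanity-checked with a few lines of NumPy; dummy arrays stand in for real `mel.npy` / `motion.npy` files here:

```python
import numpy as np

# Dummy arrays with the dataset's documented shapes; for real data use
# np.load('<Your Dataset Dir>/train/0/mel.npy') and so on.
mel = np.random.rand(5400, 128).astype(np.float32)       # 90 Hz, 128 mel bins
motion = np.random.rand(1800, 13, 2).astype(np.float32)  # 30 Hz, 13 2D keypoints

# Both cover the same 60 seconds, so the rates differ by an integer factor
assert mel.shape[0] // motion.shape[0] == 3  # 90 Hz / 30 Hz

# Align the modalities: take the same 10-second window from both
start_sec, length_sec = 20, 10
mel_clip = mel[start_sec * 90:(start_sec + length_sec) * 90]        # (900, 128)
motion_clip = motion[start_sec * 30:(start_sec + length_sec) * 30]  # (300, 13, 2)
print(mel_clip.shape, motion_clip.shape)
```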
We provide code to load and visualize the dataset in `utils/dataset.py`. You can run it with:

```bash
python utils/dataset.py --dataset_dir <Your Dataset Dir>
```
Then the script will enumerate all the samples in the dataset. You will get:
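As a minimal stand-in for `utils/dataset.py` (which is the canonical loader), enumerating the samples only requires walking the folder layout shown above. The helper name `iter_samples` is hypothetical; the demo builds a tiny fake dataset rather than assuming real files:

```python
import os
import tempfile
import numpy as np

def iter_samples(dataset_dir, split='train'):
    """Yield (mel, motion) pairs from a ConductorMotion100-style folder layout."""
    split_dir = os.path.join(dataset_dir, split)
    for sample_id in sorted(os.listdir(split_dir), key=int):
        sample_dir = os.path.join(split_dir, sample_id)
        yield (np.load(os.path.join(sample_dir, 'mel.npy')),
               np.load(os.path.join(sample_dir, 'motion.npy')))

# Demo on a tiny fake dataset with the documented shapes
with tempfile.TemporaryDirectory() as root:
    for i in range(2):
        d = os.path.join(root, 'train', str(i))
        os.makedirs(d)
        np.save(os.path.join(d, 'mel.npy'), np.zeros((5400, 128), np.float32))
        np.save(os.path.join(d, 'motion.npy'), np.zeros((1800, 13, 2), np.float32))
    shapes = [(m.shape, p.shape) for m, p in iter_samples(root, 'train')]

print(shapes)  # [((5400, 128), (1800, 13, 2)), ((5400, 128), (1800, 13, 2))]
```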
During training, run `tensorboard --logdir runs` to set up TensorBoard logging. Model checkpoints will be saved to the `/checkpoints/` folder.
Step 1
Start the contrastive learning stage and train the M2S-Net:

```bash
python M2SNet_train.py --dataset_dir <Your Dataset Dir>
```
Training takes ~36 hours on a Titan Xp GPU. With TensorBoard (`tensorboard --logdir runs`), you can visualize the training procedure:
We also provide a visualization of the features extracted by M2S-Net.
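The exact M2S-Net objective is described in the paper; purely as a rough illustration of synchronization learning, an InfoNCE-style contrastive loss over paired music/motion embeddings (all names and shapes below are hypothetical) can be written as:

```python
import numpy as np

def sync_contrastive_loss(music_emb, motion_emb, temperature=0.1):
    """InfoNCE-style loss: the i-th music clip should match the i-th motion clip.

    Generic illustration only, not the exact M2S-Net objective.
    """
    # L2-normalize, then cosine-similarity logits of shape (batch, batch)
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    p = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    logits = m @ p.T / temperature
    # Cross-entropy with the diagonal (synchronized pairs) as targets
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
aligned = sync_contrastive_loss(emb, emb)                  # synchronized pairs
shifted = sync_contrastive_loss(emb, np.roll(emb, 1, 0))   # broken synchronization
print(aligned, shifted)  # aligned pairs should score a lower loss
```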
Step 2 (optional)
Train an M2S-Net on the test set to calculate the 'sync error' (see our paper for more details):

```bash
python M2SNet_train.py --dataset_dir <Your Dataset Dir> --mode hard_test
```
The training takes ~2.5 hours.
Step 3
Start the generative learning stage and train the M2S-GAN:

```bash
python M2SGAN_train.py --dataset_dir <Your Dataset Dir>
```
The training takes ~28 hours with a Titan Xp GPU.
For more details of the "Prospective Cup" competition, please see here.
Copyright (c) 2022 Delong Chen. Contact me for commercial use (or rather any use that is not academic research) (email: [email protected]). Free for research use, as long as proper attribution is given and this copyright notice is retained.
Delong Chen, Fan Liu*, Zewen Li, Feng Xu. VirtualConductor: Music-driven Conducting Video Generation System. IEEE International Conference on Multimedia and Expo (ICME) 2021, Demo Track (Best Demo).
@article{chen2021virtualconductor,
author = {Delong Chen and
Fan Liu and
Zewen Li and
Feng Xu},
title = {VirtualConductor: Music-driven Conducting Video Generation System},
journal = {CoRR},
volume = {abs/2108.04350},
year = {2021},
url = {https://arxiv.org/abs/2108.04350},
eprinttype = {arXiv},
eprint = {2108.04350}
}
Fan Liu, Delong Chen*, Ruizhi Zhou, Sai Yang, and Feng Xu. Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology.
@article{liu2022self,
author = {Fan Liu and
Delong Chen and
Ruizhi Zhou and
Sai Yang and
Feng Xu},
title = {Self-Supervised Music Motion Synchronization Learning for Music-Driven
Conducting Motion Generation},
journal = {Journal of Computer Science and Technology},
volume = {37},
number = {3},
pages = {539--558},
year = {2022},
url = {https://doi.org/10.1007/s11390-022-2030-z},
doi = {10.1007/s11390-022-2030-z}
}