🎶 Music-Driven Conducting Motion Generation (IEEE ICME'21 Best Demo)
This repository is the official implementation of “Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation” by Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, and Feng Xu. It also provides access to the ConductorMotion100 dataset, which consists of 100 hours of orchestral conductor motion with aligned Mel spectrograms of the music.
The figure above gives a high-level illustration of the proposed two-stage approach. The contrastive learning stage and the generative learning stage are bridged by transferring the learned music and motion encoders, as indicated by the dotted lines. Our approach generates plausible, diverse, and music-synchronized conducting motion.
Updates🔔
Clone this repo:

```bash
git clone https://github.com/ChenDelong1999/VirtualConductor.git
cd VirtualConductor
```
Create a conda virtual environment and activate it:

```bash
conda create -n VirtualConductor python=3.6 -y
conda activate VirtualConductor
```
Install CUDA Toolkit 11.3 (link) and cuDNN 8.2.1 (link), then install PyTorch 1.10.1:

```bash
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
# if you prefer another CUDA version, please choose a suitable PyTorch version
# see: https://pytorch.org/get-started/locally/
```
Install the other requirements:

```bash
conda install ffmpeg -c conda-forge -y
pip install librosa matplotlib scipy tqdm moviepy opencv-python tensorboard
```
Place the music files you want to test in the `/test/test_samples/` folder (we have prepared some for you). Then, using the pretrained checkpoint `checkpoints/M2SGAN/M2SGAN_official_pretrained.pt`, the script `test_unseen.py` will do the following:
1. enumerate all samples in the `/test/test_samples/` folder,
2. extract the Mel spectrogram from each piece of music,
3. generate conducting motions, and
4. save the result videos to `/test/result/`.
```bash
python test_unseen.py --model 'checkpoints/M2SGAN/M2SGAN_official_pretrained.pt'
```
The ConductorMotion100 dataset can be downloaded in the following ways:

- via Google Drive
There are 3 splits of ConductorMotion100: train, val, and test. They correspond to 3 `.rar` files. After extracting them to a `<Your Dataset Dir>` folder, the file structure will be:
```
tree <Your Dataset Dir>

<Your Dataset Dir>
├───train
│   ├───0
│   │       mel.npy
│   │       motion.npy
│   ...
│   └───5268
│           mel.npy
│           motion.npy
├───val
│   ├───0
│   │       mel.npy
│   │       motion.npy
│   ...
│   └───290
│           mel.npy
│           motion.npy
└───test
    ├───0
    │       mel.npy
    │       motion.npy
    ...
    └───293
            mel.npy
            motion.npy
```
Each `mel.npy` and `motion.npy` pair corresponds to 60 seconds of Mel spectrogram and motion data, sampled at 90 Hz and 30 Hz respectively. The Mel spectrogram has 128 frequency bins, so `mel.shape = (5400, 128)`; the motion data contains 13 2D keypoints, so `motion.shape = (1800, 13, 2)`.
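These shapes and the 3:1 rate ratio (90 Hz vs. 30 Hz) can be sanity-checked with a few lines of NumPy; dummy arrays stand in for real `mel.npy` / `motion.npy` files here:

```python
import numpy as np

# Dummy arrays with the dataset's documented shapes; for real data use
# np.load('<Your Dataset Dir>/train/0/mel.npy') and so on.
mel = np.random.rand(5400, 128).astype(np.float32)       # 90 Hz, 128 mel bins
motion = np.random.rand(1800, 13, 2).astype(np.float32)  # 30 Hz, 13 2D keypoints

# Both cover the same 60 seconds, so the rates differ by an integer factor
assert mel.shape[0] // motion.shape[0] == 3  # 90 Hz / 30 Hz

# Align the modalities: take the same 10-second window from both
start_sec, length_sec = 20, 10
mel_clip = mel[start_sec * 90:(start_sec + length_sec) * 90]        # (900, 128)
motion_clip = motion[start_sec * 30:(start_sec + length_sec) * 30]  # (300, 13, 2)
print(mel_clip.shape, motion_clip.shape)
```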
We provide code to load and visualize the dataset in `utils/dataset.py`. You can run it with:

```bash
python utils/dataset.py --dataset_dir <Your Dataset Dir>
```
Then the script will enumerate all the samples in the dataset. You will get:
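As a minimal stand-in for `utils/dataset.py` (which is the canonical loader), enumerating the samples only requires walking the folder layout shown above. The helper name `iter_samples` is hypothetical; the demo builds a tiny fake dataset rather than assuming real files:

```python
import os
import tempfile
import numpy as np

def iter_samples(dataset_dir, split='train'):
    """Yield (mel, motion) pairs from a ConductorMotion100-style folder layout."""
    split_dir = os.path.join(dataset_dir, split)
    for sample_id in sorted(os.listdir(split_dir), key=int):
        sample_dir = os.path.join(split_dir, sample_id)
        yield (np.load(os.path.join(sample_dir, 'mel.npy')),
               np.load(os.path.join(sample_dir, 'motion.npy')))

# Demo on a tiny fake dataset with the documented shapes
with tempfile.TemporaryDirectory() as root:
    for i in range(2):
        d = os.path.join(root, 'train', str(i))
        os.makedirs(d)
        np.save(os.path.join(d, 'mel.npy'), np.zeros((5400, 128), np.float32))
        np.save(os.path.join(d, 'motion.npy'), np.zeros((1800, 13, 2), np.float32))
    shapes = [(m.shape, p.shape) for m, p in iter_samples(root, 'train')]

print(shapes)  # [((5400, 128), (1800, 13, 2)), ((5400, 128), (1800, 13, 2))]
```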
During training, run `tensorboard --logdir runs` to set up TensorBoard logging. Model checkpoints will be saved to the `/checkpoints/` folder.
Step 1
Start the contrastive learning stage and train the M2S-Net:

```bash
python M2SNet_train.py --dataset_dir <Your Dataset Dir>
```
Training takes ~36 hours on a Titan Xp GPU. With TensorBoard (`tensorboard --logdir runs`), you can visualize the training procedure:
We also provide a visualization of the features extracted by M2S-Net.
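The exact M2S-Net objective is described in the paper; purely as a rough illustration of synchronization learning, an InfoNCE-style contrastive loss over paired music/motion embeddings (all names and shapes below are hypothetical) can be written as:

```python
import numpy as np

def sync_contrastive_loss(music_emb, motion_emb, temperature=0.1):
    """InfoNCE-style loss: the i-th music clip should match the i-th motion clip.

    Generic illustration only, not the exact M2S-Net objective.
    """
    # L2-normalize, then cosine-similarity logits of shape (batch, batch)
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    p = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    logits = m @ p.T / temperature
    # Cross-entropy with the diagonal (synchronized pairs) as targets
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 32))
aligned = sync_contrastive_loss(emb, emb)                  # synchronized pairs
shifted = sync_contrastive_loss(emb, np.roll(emb, 1, 0))   # broken synchronization
print(aligned, shifted)  # aligned pairs should score a lower loss
```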
Step 2 (optional)
Train an M2S-Net on the test set to calculate the 'sync error' (see our paper for more details):

```bash
python M2SNet_train.py --dataset_dir <Your Dataset Dir> --mode hard_test
```
The training takes ~2.5 hours.
Step 3
Start the generative learning stage and train the M2S-GAN:

```bash
python M2SGAN_train.py --dataset_dir <Your Dataset Dir>
```
The training takes ~28 hours with a Titan Xp GPU.
For more details of the "Prospective Cup" competition, please see here.
Copyright (c) 2022 Delong Chen. Contact me for commercial use (or rather any use that is not academic research) (email: [email protected]). Free for research use, as long as proper attribution is given and this copyright notice is retained.
Delong Chen, Fan Liu*, Zewen Li, Feng Xu. VirtualConductor: Music-driven Conducting Video Generation System. IEEE International Conference on Multimedia and Expo (ICME) 2021, Demo Track (Best Demo).
@article{chen2021virtualconductor,
author = {Delong Chen and
Fan Liu and
Zewen Li and
Feng Xu},
title = {VirtualConductor: Music-driven Conducting Video Generation System},
journal = {CoRR},
volume = {abs/2108.04350},
year = {2021},
url = {https://arxiv.org/abs/2108.04350},
eprinttype = {arXiv},
eprint = {2108.04350}
}
Fan Liu, Delong Chen*, Ruizhi Zhou, Sai Yang, and Feng Xu. Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology.
@article{liu2022self,
author = {Fan Liu and
Delong Chen and
Ruizhi Zhou and
Sai Yang and
Feng Xu},
title = {Self-Supervised Music Motion Synchronization Learning for Music-Driven
Conducting Motion Generation},
journal = {Journal of Computer Science and Technology},
volume = {37},
number = {3},
pages = {539--558},
year = {2022},
url = {https://doi.org/10.1007/s11390-022-2030-z},
doi = {10.1007/s11390-022-2030-z}
}