Virtual Conductor

🎶 Music-Driven Conducting Motion Generation (IEEE ICME'21 Best Demo)

The first step towards deep-learning-based, music-driven conducting motion generation.

(Figure: model pipeline)

This repository is the official implementation of Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation, by Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, and Feng Xu. It also provides access to the ConductorMotion100 dataset, which consists of 100 hours of orchestral conducting motion aligned with music Mel spectrograms.

The above figure gives a high-level illustration of the proposed two-stage approach. The contrastive learning and generative learning stages are bridged by transferring the learned music and motion encoders, as indicated by the dotted lines. Our approach generates plausible, diverse, and music-synchronized conducting motion.

Updates🔔

Getting Started

Install

  • Clone this repo:

    git clone https://github.com/ChenDelong1999/VirtualConductor.git
    cd VirtualConductor
    
  • Create a conda virtual environment and activate it:

    conda create -n VirtualConductor python=3.6 -y
    conda activate VirtualConductor
    
  • Install CUDA Toolkit 11.3 and cuDNN 8.2.1, then install PyTorch 1.10.1:

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y
    # if you prefer other cuda versions, please choose suitable pytorch versions
    # see: https://pytorch.org/get-started/locally/
    
  • Install other requirements:

    conda install ffmpeg -c conda-forge -y
    pip install librosa matplotlib scipy tqdm moviepy opencv-python tensorboard
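
  • (Optional) Verify the installation. The following is a minimal sanity-check sketch (not part of the repository); it only confirms that the core dependencies import and that PyTorch can see a GPU:

    # check_env.py - minimal environment sanity check (illustrative sketch)
    import torch
    import librosa
    import cv2

    print('PyTorch:', torch.__version__)
    print('CUDA available:', torch.cuda.is_available())
    if torch.cuda.is_available():
        print('GPU:', torch.cuda.get_device_name(0))
    print('librosa:', librosa.__version__, '| OpenCV:', cv2.__version__)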
    

Test on Your Own Music 🎶

  • Copy your music file to the /test/test_samples/ folder. We have prepared some samples for you.
  • You need the pretrained weights of an M2S-GAN to generate motions. We provide a pretrained checkpoint at checkpoints/M2SGAN/M2SGAN_official_pretrained.pt.
  • Now run the command below, and test_unseen.py will:
    1. enumerate all samples in the /test/test_samples/ folder,
    2. extract the Mel spectrogram from the music,
    3. generate conducting motions, and
    4. save the result videos to /test/result/ (a rough sketch of this pipeline follows the command).

      python test_unseen.py --model 'checkpoints/M2SGAN/M2SGAN_official_pretrained.pt'
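
      For orientation, the steps above correspond roughly to the sketch below. This is not the actual test_unseen.py; the Mel-spectrogram parameters are assumptions, and generate_motion() / render_video() are hypothetical placeholders standing in for the model inference and video rendering code:

      # Conceptual sketch of the test pipeline (illustration only).
      import glob
      import librosa

      for path in sorted(glob.glob('test/test_samples/*')):
          # 1-2. load the audio and extract a Mel spectrogram (parameters assumed)
          y, sr = librosa.load(path, sr=None)
          mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
          mel = librosa.power_to_db(mel).T  # (frames, 128)

          # 3. generate conducting motion with the pretrained M2S-GAN (placeholder)
          # motion = generate_motion(mel, checkpoint='checkpoints/M2SGAN/M2SGAN_official_pretrained.pt')

          # 4. render and save the result video to test/result/ (placeholder)
          # render_video(motion, out_dir='test/result/')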
      

Data Preparation (ConductorMotion100)

The ConductorMotion100 dataset can be downloaded from the following links:

  • The training set: https://pan.baidu.com/s/1Pmtr7V7-9ChJqQp04NOyZg?pwd=3209
  • The validation set: https://pan.baidu.com/s/1B5JrZnFCFvI9ABkuJeWoFQ?pwd=3209
  • The test set: https://pan.baidu.com/s/18ecHYk9b4YM5YTcBNn37qQ?pwd=3209

You can also access the dataset via Google Drive.

There are 3 splits of ConductorMotion100: train, val, and test. They correspond to three .rar files. After extracting them to the <Your Dataset Dir> folder, the file structure will be:

tree <Your Dataset Dir>
<Your Dataset Dir>
    ├───train
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───5268
    │           mel.npy
    │           motion.npy
    ├───val
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───290
    │           mel.npy
    │           motion.npy
    └───test
        ├───0
        │       mel.npy
        │       motion.npy
       ...
        └───293
                mel.npy
                motion.npy

Each mel.npy and motion.npy pair corresponds to 60 seconds of Mel spectrogram and motion data, sampled at 90 Hz and 30 Hz respectively. The Mel spectrogram has 128 frequency bins, so mel.shape = (5400, 128). The motion data contains 13 2D keypoints, so motion.shape = (1800, 13, 2).
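
As a quick sanity check, each sample can be loaded directly with NumPy (a minimal sketch; it assumes sample 0 of the training split exists under your dataset directory):

    # Load one sample and verify the documented shapes (minimal sketch).
    import os
    import numpy as np

    sample_dir = os.path.join('<Your Dataset Dir>', 'train', '0')
    mel = np.load(os.path.join(sample_dir, 'mel.npy'))        # 60 s at 90 Hz, 128 Mel bins
    motion = np.load(os.path.join(sample_dir, 'motion.npy'))  # 60 s at 30 Hz, 13 keypoints (x, y)

    assert mel.shape == (5400, 128)
    assert motion.shape == (1800, 13, 2)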

We provide code to load and visualize the dataset in utils/dataset.py. You can run it with:

python utils/dataset.py --dataset_dir <Your Dataset Dir>

The script will then enumerate all samples in the dataset, producing visualizations like the following:

(Figure: Mel spectrogram, visualized with matshow)

(Figure: conducting motion keypoint plot)
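
utils/dataset.py is the reference loader; for orientation only, a split directory with the layout shown above could be wrapped into a minimal PyTorch Dataset along these lines (a sketch, not the repository's actual class):

    # Minimal PyTorch Dataset over the ConductorMotion100 layout (sketch only).
    import os
    import numpy as np
    import torch
    from torch.utils.data import Dataset, DataLoader

    class ConductorMotionSamples(Dataset):
        def __init__(self, split_dir):
            # each numbered sub-folder holds one 60-second mel/motion pair
            self.sample_dirs = sorted(
                os.path.join(split_dir, d) for d in os.listdir(split_dir)
                if os.path.isdir(os.path.join(split_dir, d)))

        def __len__(self):
            return len(self.sample_dirs)

        def __getitem__(self, idx):
            d = self.sample_dirs[idx]
            mel = np.load(os.path.join(d, 'mel.npy')).astype(np.float32)        # (5400, 128)
            motion = np.load(os.path.join(d, 'motion.npy')).astype(np.float32)  # (1800, 13, 2)
            return torch.from_numpy(mel), torch.from_numpy(motion)

    # usage (path is a placeholder):
    # loader = DataLoader(ConductorMotionSamples('<Your Dataset Dir>/train'), batch_size=4, shuffle=True)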

Training

During training, run tensorboard --logdir runs to monitor the training with TensorBoard. Model checkpoints will be saved to the /checkpoints/ folder.

  • Step 1

    • Start the contrastive learning stage and train the M2S-Net:

      python M2SNet_train.py --dataset_dir <Your Dataset Dir>
      

      It takes ~36 hours with a Titan Xp GPU. With TensorBoard (tensorboard --logdir runs), you can visualize the training procedure:

      (Figure: M2S-Net training curves in TensorBoard)

      We also provide a visualization of the features extracted by M2S-Net:

      (Figure: M2S-Net feature visualization)

  • Step 2 (optional)

    • Train an M2S-Net on the test set to calculate the 'sync error' (see our paper for more details):

      python M2SNet_train.py --dataset_dir <Your Dataset Dir> --mode hard_test
      

      The training takes ~2.5 hours.

  • Step 3

    • Start the generative learning stage and train the M2S-GAN:

      python M2SGAN_train.py --dataset_dir <Your Dataset Dir>
      

      The training takes ~28 hours with a Titan Xp GPU.
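
Checkpoints written to the /checkpoints/ folder (including the provided M2SGAN_official_pretrained.pt) can be inspected with plain PyTorch. A minimal sketch, making no assumptions about the checkpoint's internal structure:

    # Inspect a saved checkpoint (minimal sketch); run from the repository root.
    import torch

    ckpt = torch.load('checkpoints/M2SGAN/M2SGAN_official_pretrained.pt', map_location='cpu')
    print(type(ckpt))
    if isinstance(ckpt, dict):
        # print whatever top-level entries are stored (state dicts, epoch, etc.)
        for key in ckpt:
            print(key)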

Prospective Cup (首届国际“远见杯”元智能数据挑战大赛, i.e., the 1st International "Prospective Cup" Meta-Intelligence Data Challenge)

For more details of the "Prospective Cup" competition, please see here.

License

Copyright (c) 2022 Delong Chen. Contact me for commercial use (or rather any use that is not academic research) (email: [email protected]). Free for research use, as long as proper attribution is given and this copyright notice is retained.

Papers

  1. Delong Chen, Fan Liu*, Zewen Li, Feng Xu. VirtualConductor: Music-driven Conducting Video Generation System. IEEE International Conference on Multimedia and Expo (ICME) 2021, Demo Track (Best Demo).

    @article{chen2021virtualconductor,
      author    = {Delong Chen and
                   Fan Liu and
                   Zewen Li and
                   Feng Xu},
      title     = {VirtualConductor: Music-driven Conducting Video Generation System},
      journal   = {CoRR},
      volume    = {abs/2108.04350},
      year      = {2021},
      url       = {https://arxiv.org/abs/2108.04350},
      eprinttype = {arXiv},
      eprint    = {2108.04350}
    }
    
  2. Fan Liu, Delong Chen*, Ruizhi Zhou, Sai Yang, and Feng Xu. Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation. Journal of Computer Science and Technology, 37(3): 539–558, 2022.

     @article{liu2022self,
       author    = {Fan Liu and
                    Delong Chen and
                    Ruizhi Zhou and
                    Sai Yang and
                    Feng Xu},
       title     = {Self-Supervised Music Motion Synchronization Learning for Music-Driven
                    Conducting Motion Generation},
       journal   = {Journal of Computer Science and Technology},
       volume    = {37},
       number    = {3},
       pages     = {539--558},
       year      = {2022},
       url       = {https://doi.org/10.1007/s11390-022-2030-z},
       doi       = {10.1007/s11390-022-2030-z}
     }
    