Haihuangcode CMG

The official implementation of Achieving Cross Modal Generalization with Multimodal Unified Representation (NeurIPS '23)


Achieving Cross Modal Generalization with Multimodal Unified Representation, NeurIPS 2023


This is the PyTorch implementation of our paper:

Achieving Cross Modal Generalization with Multimodal Unified Representation

Yan Xia, Hai Huang, Jieming Zhu, Zhou Zhao

In NeurIPS 2023

📝Requirements and Installation

Getting Started

git clone https://github.com/haihuangcode/CMG
cd CMG
# You do not have to install every library listed in requirements.txt; install them as needed.
# Python 3.7 is recommended, because some of the libraries used do not support higher versions of Python.
conda create -n your_env_name python=3.7
# Activate the new environment so that pip installs into it.
conda activate your_env_name
pip install -r requirements.txt

Pretrain

cd CMG/code/src
./pretrain.sh

AVE_downstream

cd CMG/code/src
./ave.sh

AVVP_downstream

cd CMG/code/src
./avvp.sh

AVE_AVVP_downstream

cd CMG/code/src
./ave_avvp.sh

UCF_VGGSOUND_downstream

cd CMG/code/src
./ucf_vggsound.sh

AVS_downstream

cd CMG/code/AVSBench_downstream/avs_scripts/avs_s4
./train.sh
./test.sh

🎓Cite

If you find this work useful, please consider citing it.

@article{xia2024achieving,
  title={Achieving Cross Modal Generalization with Multimodal Unified Representation},
  author={Xia, Yan and Huang, Hai and Zhu, Jieming and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={36},
  year={2024}
}

✏Model Checkpoints and Data Features

Baidu Disk (pwd: 1234)

2023.11.07 Update https://github.com/haihuangcode/CMG/issues/1

✏Directory

CMG
├── checkpoint
├── cnt.pkl
├── code
├── data
├── figs
├── paper
├── README.md
└── requirements.txt
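
As a quick sanity check before running the scripts, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_entries` is hypothetical and not part of the repository, and it assumes the checkpoints and features have been downloaded into a local clone named `CMG`.

```python
from pathlib import Path

# Entries copied from the directory tree above.
EXPECTED_ENTRIES = [
    "checkpoint",
    "cnt.pkl",
    "code",
    "data",
    "figs",
    "paper",
    "README.md",
    "requirements.txt",
]


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # "CMG" is assumed to be the path of the local clone.
    missing = missing_entries("CMG")
    print("missing:", missing if missing else "none")
```

An empty result means every entry in the tree is present; any names returned are the ones still to be downloaded or created.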

✏Note

For the video and audio feature extraction method, please refer to AVE. The text is generated from the label as a description-focused statement of approximately 10 words.
The pre-training process has no validation set. In the paper, each pretrained model is tested on the AVE downstream task, and the best-performing model is then used for the remaining downstream tasks, so AVE can be regarded as a validation set. The best pretrained model appears within the first 5 epochs.
Pretraining can be performed on a single GPU, such as a 4090 or A100. The experimental results in the paper were obtained on a 4090 or A100. Multi-GPU parallel training yielded poorer model performance, possibly due to an interaction between the mutual information minimization design in DCID and PyTorch. This was an early experimental observation and was not re-verified after the code was finalized, since single-GPU pretraining was sufficient.
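
The note above on using AVE as a validation set amounts to a simple selection rule: score the checkpoint from each pretraining epoch on AVE and keep the best one. The sketch below illustrates that rule only. The function name `select_best_epoch` is hypothetical, the scores are placeholder numbers rather than results from the paper, and the repository's scripts may organize this step differently.

```python
def select_best_epoch(ave_scores):
    """Pick the pretraining epoch with the highest AVE downstream score.

    `ave_scores` maps an epoch number to the score obtained by running the
    AVE downstream task with that epoch's checkpoint. Ties are broken in
    favor of the earliest epoch.
    """
    if not ave_scores:
        raise ValueError("ave_scores must contain at least one epoch")
    best_epoch = min(ave_scores, key=lambda epoch: (-ave_scores[epoch], epoch))
    return best_epoch, ave_scores[best_epoch]


# Placeholder scores for the first 5 epochs, for illustration only.
example_scores = {1: 0.50, 2: 0.62, 3: 0.58, 4: 0.62, 5: 0.55}
best_epoch, best_score = select_best_epoch(example_scores)
print(f"use the epoch-{best_epoch} checkpoint for the remaining downstream tasks")
```

The selected checkpoint is then the one to evaluate on the other downstream tasks.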

👍Acknowledgments

Our code is based on AVE, AVVP, PSP, CPSP, VGGSOUND, AVS.
