GeoSSL: Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching, ICLR'23 (https://openreview.net/forum?id=CjTHVo1dvR)
ICLR 2023
Authors: Shengchao Liu, Hongyu Guo, Jian Tang
[Project Page] [OpenReview] [ArXiv]
This repository provides the source code for the ICLR'23 paper Molecular Geometry Pretraining with SE(3)-Invariant Denoising Distance Matching, with the following task:
conda create -n Geom3D python=3.7
conda activate Geom3D
conda install -y -c rdkit rdkit
conda install -y numpy networkx scikit-learn
conda install -y -c conda-forge -c pytorch pytorch=1.9.1
conda install -y -c pyg -c conda-forge pyg=2.0.2
pip install ogb==1.2.1
pip install sympy
pip install ase # for SchNet
pip install atom3d # for Atom3D
pip install cffi # for Atom3D
pip install biopython # for Atom3D
pip install -e .
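After installation, a quick sanity check can confirm that the pinned packages are the ones actually importable. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and it assumes each package exposes a `__version__` attribute under the import names `torch`, `torch_geometric`, and `ogb`.

```python
import importlib

# Version pins copied from the install commands above; keys are import names.
EXPECTED = {"torch": "1.9.1", "torch_geometric": "2.0.2", "ogb": "1.2.1"}


def installed_version(module_name):
    """Return module.__version__, or None if the module is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import failure (missing package, broken binary) counts as unavailable.
        return None
    return getattr(module, "__version__", None)


def check_environment(expected=EXPECTED, lookup=installed_version):
    """Map each module name to a tuple (expected, found, ok)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        # Some builds append a local suffix such as "+cu111"; ignore it when comparing.
        ok = found is not None and found.split("+")[0] == want
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_environment().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found} -> {status}")
```

The `lookup` argument is injectable so the comparison logic can be exercised without importing the heavy packages.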
Run examples/generate_Molecule3D.py to generate the Molecule3D dataset. The default path is data/Molecule3D/Molecule3D_1000000.
The default path for QM9 is data/molecule_datasets/qm9.
The default path for MD17 is data/md17.
cd data
mkdir -p lba/raw
mkdir -p lba/processed
cd lba/raw
wget http://www.pdbbind.org.cn/download/PDBbind_v2020_refined.tar.gz
tar -xzvf PDBbind_v2020_refined.tar.gz
wget https://zenodo.org/record/4914718/files/LBA-split-by-sequence-identity-30.tar.gz
tar -xzvf LBA-split-by-sequence-identity-30.tar.gz
mv split-by-sequence-identity-30/indices ../processed/
mv split-by-sequence-identity-30/targets ../processed/
cd data  # LEP: start again from the repository root
mkdir -p lep/raw
mkdir -p lep/processed
cd lep/raw
wget https://zenodo.org/record/4914734/files/LEP-raw.tar.gz
tar -xzvf LEP-raw.tar.gz
wget https://zenodo.org/record/4914734/files/LEP-split-by-protein.tar.gz
tar -xzvf LEP-split-by-protein.tar.gz
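Once the steps above are done, it can be handy to check which dataset directories are in place before launching a run. This is a minimal sketch, not a script from the repository: `missing_dataset_dirs` is a hypothetical helper, and the directory list is simply collected from the paths and `mkdir` commands above. Depending on how each dataset is obtained, a directory may legitimately be absent until its script has run.

```python
from pathlib import Path

# Directories named in this README, relative to the repository root.
DATASET_DIRS = [
    "data/Molecule3D/Molecule3D_1000000",
    "data/molecule_datasets/qm9",
    "data/md17",
    "data/lba/raw",
    "data/lba/processed",
    "data/lep/raw",
    "data/lep/processed",
]


def missing_dataset_dirs(root=".", expected=DATASET_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All dataset directories found.")
```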
For pretraining, we provide implementations of eight pretraining baselines and our proposed GeoSSL-DDM under the examples folder:
pretrain_Supervised.py
pretrain_ChargePrediction.py
pretrain_DistancePrediction.py
pretrain_TorsionAnglePreddiction.py
pretrain_3DInfoGraph.py
pretrain_GeoSSL.py, with one of the following options:
--GeoSSL_option=RR
--GeoSSL_option=InfoNCE
--GeoSSL_option=EBM-NCE
--GeoSSL_option=DDM
The running scripts and corresponding hyper-parameters can be found in scripts/pretrain_baselines and scripts/pretrain_GeoSSL_DDM.
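To illustrate how the four GeoSSL variants map onto a single script, the sketch below assembles one command line per `--GeoSSL_option` value. It is only an illustration: `geossl_command` is a hypothetical helper, the script location `examples/pretrain_GeoSSL.py` is assumed from the folder named above, and the actual runs under scripts/pretrain_GeoSSL_DDM pass further hyper-parameters that are not reproduced here.

```python
import shlex

# The four values of --GeoSSL_option listed in this README.
GEOSSL_OPTIONS = ["RR", "InfoNCE", "EBM-NCE", "DDM"]


def geossl_command(option, script="examples/pretrain_GeoSSL.py", extra_args=()):
    """Build the argument list for one GeoSSL pretraining variant."""
    if option not in GEOSSL_OPTIONS:
        raise ValueError(f"unknown GeoSSL option: {option!r}")
    # extra_args is a placeholder for the hyper-parameters kept in the scripts folder.
    return ["python", script, f"--GeoSSL_option={option}", *extra_args]


if __name__ == "__main__":
    for option in GEOSSL_OPTIONS:
        # Quote each argument so the printed line can be pasted into a shell.
        print(" ".join(shlex.quote(arg) for arg in geossl_command(option)))
```

The printed lines are meant as a starting point; the argument list can also be handed to `subprocess.run` once the remaining arguments are filled in from the provided scripts.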
The downstream scripts can be found under the examples folder:
finetune_qm9.py
finetune_md17.py
finetune_lba.py
finetune_lep.py
The running scripts and corresponding hyper-parameters can be found in scripts/finetune. Note that, for a fair comparison, we keep a fixed hyper-parameter set for each downstream task; the only difference between runs is the pretrained checkpoint.
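The comparison protocol described above can be sketched as a small run plan in which everything except the checkpoint is held constant. This is an illustrative sketch only: `comparison_runs`, the hyper-parameter names, and the checkpoint file names are all hypothetical, and the real values live in scripts/finetune.

```python
# Downstream scripts listed in this README, keyed by task name.
FINETUNE_SCRIPTS = {
    "qm9": "examples/finetune_qm9.py",
    "md17": "examples/finetune_md17.py",
    "lba": "examples/finetune_lba.py",
    "lep": "examples/finetune_lep.py",
}


def comparison_runs(task, fixed_hparams, checkpoints):
    """Return one run description per checkpoint, sharing the same hyper-parameters."""
    script = FINETUNE_SCRIPTS[task]
    return [
        # Copy the hyper-parameters so each run owns an identical, independent set.
        {"script": script, "hparams": dict(fixed_hparams), "checkpoint": ckpt}
        for ckpt in checkpoints
    ]


if __name__ == "__main__":
    # Hypothetical hyper-parameters and checkpoint names, for illustration only.
    runs = comparison_runs(
        "qm9",
        {"lr": 1e-4, "epochs": 100},
        ["baseline.pth", "geossl_ddm.pth"],
    )
    for run in runs:
        print(run)
```

Keeping the hyper-parameters in one dictionary per task makes it harder to tune one pretraining method more than another by accident.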
We provide both the log files and checkpoints for GeoSSL-DDM here. The log files and checkpoints for other baselines will be released in the next version.
Feel free to cite this work if you find it useful!
@inproceedings{
liu2023molecular,
title={Molecular Geometry Pretraining with {SE}(3)-Invariant Denoising Distance Matching},
author={Shengchao Liu and Hongyu Guo and Jian Tang},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=CjTHVo1dvR}
}