Collection of molecular data sets for validating property inference
From Moleculenet.ai. Each entry below gives a short description of the data set, followed by its inference task in square brackets (for the regularized data sets reported here):
QM9: Geometric, energetic, electronic and thermodynamic properties of DFT-modelled small molecules [regression]
ESOL: Water solubility data (log solubility in mols per litre) for common organic small molecules [regression]
FreeSolv: Experimental and calculated hydration free energy of small molecules in water [regression]
Lipophilicity: Experimental results of octanol/water distribution coefficient (logD at pH 7.4) [regression]
PCBA: Selected from PubChem BioAssay, consisting of measured biological activities of small molecules generated by high-throughput screening [classification]
HIV: Experimentally measured abilities to inhibit HIV replication [classification]
BACE: Quantitative (IC50) and qualitative (binary label) binding results for a set of inhibitors of human β-secretase 1 (BACE-1) [classification/regression]
BBBP: Binary labels of blood-brain barrier penetration (permeability) [classification]
Tox21: Qualitative toxicity measurements on 12 biological targets, including nuclear receptors and stress response pathways [classification]
ToxCast: Toxicology data for a large library of compounds based on in vitro high-throughput screening, including experiments on over 600 tasks [classification]
SIDER: Database of marketed drugs and adverse drug reactions (ADR), grouped into 27 system organ classes [classification]
ClinTox: Qualitative data of drugs approved by the FDA and those that have failed clinical trials for toxicity reasons [classification]
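
The entries above share one layout: a name, a colon, a description, and the task in square brackets, with a slash separating multiple tasks. As a minimal sketch, assuming that layout holds for every entry, a line can be split into its parts as follows. The names `Entry`, `ENTRY_RE` and `parse_entry` are hypothetical helpers introduced for illustration and are not part of MoleculeNet or of any library.

```python
import re
from typing import NamedTuple, Tuple


class Entry(NamedTuple):
    """One data set line: name, free-text description, task type(s)."""
    name: str
    description: str
    tasks: Tuple[str, ...]


# Assumed layout: "<NAME>: <description> [<task>[/<task>...]]"
ENTRY_RE = re.compile(
    r"^(?P<name>[^:]+):\s*(?P<desc>.*?)\s*\[(?P<tasks>[^\]]+)\]\s*$"
)


def parse_entry(line: str) -> Entry:
    """Parse a single entry line; raise ValueError if it does not fit the layout."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"line does not match the expected layout: {line!r}")
    # A slash inside the brackets separates several task types.
    tasks = tuple(task.strip() for task in match.group("tasks").split("/"))
    return Entry(match.group("name").strip(), match.group("desc"), tasks)


if __name__ == "__main__":
    hiv = parse_entry(
        "HIV: Experimentally measured abilities to inhibit HIV replication "
        "[classification]"
    )
    print(hiv.name, hiv.tasks)
```

A data set with two bracketed tasks, such as BACE, yields a two-element `tasks` tuple, which makes it easy to group the list by task type.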
Source: Moleculenet.ai
Paper: Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande, MoleculeNet: A Benchmark for Molecular Machine Learning, arXiv:1703.00564 [cs.LG], 2017