PyTorch/Python 3 implementation of DeepAccNet, a protein model accuracy evaluator.
The method is described in https://www.biorxiv.org/content/10.1101/2020.07.17.209643v2
It estimates how good your protein models are using a metric called lDDT (local distance difference test).
usage: DeepAccNet.py [-h] [--modelpath MODELPATH] [--pdb] [--csv] [--leaveTempFile] [--process PROCESS] [--featurize]
[--reprocess] [--verbose] [--bert] [--ensemble]
input ...
Error predictor network
positional arguments:
input path to input folder or input pdb file
output path to output (folder path, npz, or csv)
optional arguments:
-h, --help show this help message and exit
--pdb, -pdb Running on a single pdb file instead of a folder (Default: False)
--csv, -csv Writing results to a csv file (Default: False)
--per_res_only, -pr Writing per-residue accuracy only (Default: False)
--leaveTempFile, -lt Leaving temporary files (Default: False)
--process PROCESS, -p PROCESS
Specifying # of cpus to use for featurization (Default: 1)
--featurize, -f Running only the featurization part (Default: False)
--reprocess, -r Reprocessing all feature files (Default: False)
--verbose, -v Activating verbose flag (Default: False)
--bert, -bert Run with bert features. Use extractBert.py to generate them. (Default: False)
--ensemble, -e Running with ensembling of 4 models. This adds 4x computational time with some overheads
(Default: False)
v0.0.1
(For IPD users, please use the tensorflow conda environment)
Running on a folder of pdbs (foldername: samples)
python DeepAccNet.py -r -v samples outputs
Running on a silentfile (filename: sample.silent)
python DeepAccNet-SILENT.py sample.silent output.csv
The output of the network is written to [input_file_name].npz, unless the --csv flag is set.
You can extract the predictions as follows.
import numpy as np
x = np.load("testoutput.npz")
lddt = x["lddt"]          # per-residue lDDT
estogram = x["estogram"]  # e-stogram for each pairwise distance
mask = x["mask"]          # predicted mask: native distance < 15
lddt is perhaps the easiest place to start, as it is a per-residue quality score. If you want a global score per protein structure, you can simply take its average.
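The averaging step can be sketched as below. This is a minimal illustration, not part of the DeepAccNet code: the helper name global_lddt and the file example.npz are hypothetical, and the array is synthetic stand-in data written only so the snippet runs on its own. The only thing taken from the output format is the "lddt" key.

```python
import os
import tempfile

import numpy as np


def global_lddt(npz_path):
    """Return the mean of the per-residue array stored under the "lddt" key.

    Hypothetical helper for illustration; not part of DeepAccNet.
    """
    with np.load(npz_path) as data:
        return float(np.mean(data["lddt"]))


# Synthetic stand-in for a real output file (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npz")
    np.savez(path, lddt=np.array([0.8, 0.6, 0.7, 0.9]))
    score = global_lddt(path)
    print(f"global lDDT estimate: {score:.3f}")
```

With a real run, pass the path of an .npz file written by DeepAccNet.py instead of the synthetic one. Applying the same function to every file in an output folder gives a simple way to rank models against each other.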
If you want to do something more involved, check.ipynb is a good place to start.