Scripts to run and benchmark scRNA-seq cell cluster labeling methods
This repository contains scripts to run and benchmark scRNA-seq cell cluster labeling methods. It is a companion to our paper 'Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data' (Diaz-Mejia JJ et al., 2019, https://f1000research.com/articles/8-296).
Script name | Task(s) |
---|---|
subsamples_gene_classes_and_runs_enrichment_scripts.R | Main wrapper to run and benchmark cell cluster labeling methods |
Script name | Task(s) |
---|---|
obtains_CIBERSORT_for_MatrixColumns.pl | Runs CIBERSORT using gene expression signatures and a matrix with average gene expressions per gene, per cell cluster |
obtains_GSEA_for_MatrixColumns.pl | Runs GSEA using gene expression signatures and a matrix with average gene expressions per cell cluster |
obtains_GSVA_for_MatrixColumns.R | Runs GSVA using gene expression signatures and a matrix with average gene expressions per cell cluster |
obtains_METANEIGHBOR_for_MatrixColumns.R | Runs MetaNeighborUS using gene expression signatures, a matrix with average gene expressions per cell cluster, and reference cell types. Note: modifications were made to the R library(metaneighbor) source code. Check bin/r_programs/obtains_METANEIGHBOR_for_MatrixColumns.R for details |
obtains_ORA_for_MatrixColumns.pl | Runs ORA using gene expression signatures and a matrix with average gene expressions per cell cluster |
Script name | Task(s) |
---|---|
obtains_performance_plots_from_cluster_labelings.pl | Compiles results from cell type labeling methods and obtains ROC and PR curve plots and AUCs |
obtains_ROC_and_PR_curves_from_matrix_with_gold_standards.R | Obtains ROC and PR curve plots, and ROC AUC and PR AUC values, from a matrix with reference labels in column 2 and predictions in columns 3 to N |
Script name | Task(s) |
---|---|
obtains_permuted_samples_from_gmt.R | Subsamples genes from gene expression signatures in the form of gene sets |
propagates_permuted_gmt_files_to_profile.R | Propagates subsampling from signatures in the form of gene sets to those in the form of gene expression profiles |
Script name | Task(s) |
---|---|
obtains_average_gene_expression_per_cluster.R | Obtains a matrix with average gene expressions per cell cluster from scRNA-seq data and cell cluster assignments |
To see the help of R scripts, run them like:
`Rscript ~/path_to_script/script.R -h`
To see the help of Perl scripts, make the files executable with:
`chmod +x script.pl`
and run them without arguments, like:
`~/path_to_script/script.pl`
Check each script code for dependencies and further documentation.
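The `chmod` step above can be applied to every Perl script in a directory at once. The following is a minimal sketch, not part of this repository: the function name `make_perl_scripts_executable` is hypothetical, and `~/path_to_script` is the same placeholder path used above.

```shell
# Hypothetical helper (not shipped with this repository):
# mark every *.pl file in the given directory as executable.
make_perl_scripts_executable() {
    dir="$1"
    for script in "$dir"/*.pl; do
        # When no .pl file exists the glob stays unexpanded, so skip it
        [ -e "$script" ] || continue
        chmod +x "$script"
    done
}

# Example usage (placeholder path):
# make_perl_scripts_executable ~/path_to_script
# ~/path_to_script/script.pl
```

Files with other extensions are left untouched, so R scripts in the same directory keep their current permissions.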
To install the required R packages from CRAN, use:
`install.packages(c("optparse", "vioplot", "GSA", "data.table", "precrec", "ROCR", "Seurat", "dplyr", "Rserve", "e1071", "colorRamps", "stats"))`
Note that `stats` ships with base R, so it does not need to be installed separately.
To install the required packages from Bioconductor, use:
`install.packages("BiocManager")`
`BiocManager::install(c("preprocessCore", "GSVA", "qvalue"))`
Used R version 3.5.1
To install the Perl script dependencies, download the `perl_modules` directory from this repository and add it to your `PERL5LIB` environment variable.
The other required Perl module is `Date::Calc`, which can be installed from CPAN.
Used Perl version 5
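Adding the `perl_modules` directory to `PERL5LIB` might look like the sketch below. The function name `add_to_perl5lib` and the path `/path/to/perl_modules` are placeholders for illustration; use the location where you saved the directory.

```shell
# Hypothetical helper: prepend a directory to PERL5LIB.
# The ${VAR:+...} expansion avoids a trailing colon when PERL5LIB is empty or unset.
add_to_perl5lib() {
    new_dir="$1"
    PERL5LIB="${new_dir}${PERL5LIB:+:${PERL5LIB}}"
    export PERL5LIB
}

# Example usage (placeholder path):
# add_to_perl5lib /path/to/perl_modules
# Check that Date::Calc can be loaded:
# perl -MDate::Calc -e 'print "Date::Calc is available\n"'
```

To make the setting persistent, the same call (or an equivalent `export` line) can be placed in your shell startup file.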
The following Java programs (JAR files) are needed:
`CIBERSORT.jar`, which can be obtained from https://cibersort.stanford.edu/download.php
`gsea-3.0.jar`, which can be obtained from http://software.broadinstitute.org/gsea/downloads.jsp
Used Java version 1.8.0_162
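Before running the CIBERSORT and GSEA wrappers, it can help to confirm that both JAR files were downloaded. The sketch below assumes they were saved to a single directory; the function name `check_required_jars` and the path `/path/to/jars` are hypothetical. How each script locates the JAR files is documented in the script code itself.

```shell
# Hypothetical helper: report which of the required JAR files
# are missing from a directory. Returns 0 if both are present, 1 otherwise.
check_required_jars() {
    jar_dir="$1"
    missing=0
    for jar in CIBERSORT.jar gsea-3.0.jar; do
        if [ ! -f "$jar_dir/$jar" ]; then
            echo "Missing: $jar_dir/$jar" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (placeholder path); also prints the installed Java version:
# check_required_jars /path/to/jars && java -version
```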
Example inputs and outputs can be found here:
https://github.com/jdime/scRNAseq_cell_cluster_labeling/tree/master/examples
Version 1.0 http://doi.org/10.5281/zenodo.2583161
Version 2.0 http://doi.org/10.5281/zenodo.3350461
To report bugs or request features, please open a 'New Issue' at https://github.com/jdime/scRNAseq_cell_cluster_labeling/issues
Javier Diaz (https://github.com/jdime)