Souporcell Versions

Clustering scRNAseq by genotypes

v2.5

11 months ago

New Singularity build that supports hisat2 and updates several of the bundled tools, such as minimap2 and pystan. Singularity Hub is now defunct, so I have to host this build on my Google Drive.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://drive.google.com/file/d/1_KIevXI1MvkoXtuiMFv8amlWlC0hTP7p' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1_KIevXI1MvkoXtuiMFv8amlWlC0hTP7p" -O souporcell2.5.sif && rm -rf /tmp/cookies.txt

v2.0

4 years ago
singularity pull shub://wheaton5/souporcell

Clustering method completely reimplemented in Rust. Together with the algorithm changes, this improves speed and memory usage and adds ragged array support, which lets me use all of the data without a large increase in memory.

--max_loci option is no longer used (though I have not yet removed it as an option, TODO).

New clustering method using expectation maximization with deterministic annealing greatly improves the ability to overcome local optima with a high number of donors. This method also allows us to use the binomial pdf loss function instead of the sum of squared differences loss function, so all values are now log likelihoods and not log loss.

Doublet detection improved by using the same statistical urn problem setup, but then iteratively removing detected doublet cells' alleles from the singlet cluster urns and looking for doublets again until no new doublets are found.

--known_genotypes now available in 2.0.

shared_samples.py script now available to show which clusters correspond to which other clusters in experimental designs involving multiple experiments with overlapping samples.

Various bug fixes.
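To make the expectation maximization with deterministic annealing described above more concrete, here is a minimal Python sketch. It is not the Rust implementation shipped in this release. The function names, the cooling schedule, the pseudocount, and the random initialization range are all assumptions chosen for illustration. Each cell is a ragged record holding only the loci it covers, each cluster has an alt allele fraction per locus, and the posterior over clusters is flattened by a temperature that is lowered step by step toward 1.

```python
import math
import random


def binom_logpmf(alt, total, p):
    """Log binomial probability of seeing `alt` alt reads out of `total` reads."""
    log_coeff = (math.lgamma(total + 1) - math.lgamma(alt + 1)
                 - math.lgamma(total - alt + 1))
    return log_coeff + alt * math.log(p) + (total - alt) * math.log(1.0 - p)


def cell_loglik(cell, theta_k):
    """Sum binomial log likelihoods over only the loci this cell covers (ragged)."""
    return sum(binom_logpmf(alt, ref + alt, theta_k[locus])
               for locus, (ref, alt) in cell.items())


def tempered_posteriors(logliks, temperature):
    """Softmax of loglik / T; a high T flattens the posterior over clusters."""
    scaled = [ll / temperature for ll in logliks]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    norm = sum(weights)
    return [w / norm for w in weights]


def cluster_cells(cells, n_loci, n_clusters,
                  temperatures=(8.0, 4.0, 2.0, 1.0),  # assumed cooling schedule
                  iters_per_temp=20, pseudocount=0.1, seed=0):
    """Return (labels, theta) where theta[k][l] is cluster k's alt fraction at locus l."""
    rng = random.Random(seed)
    # Assumed initialization: random alt allele fractions near 0.5.
    theta = [[rng.uniform(0.3, 0.7) for _ in range(n_loci)]
             for _ in range(n_clusters)]
    for temperature in temperatures:
        for _ in range(iters_per_temp):
            # E-step: tempered cluster posteriors for every cell.
            post = [tempered_posteriors(
                        [cell_loglik(c, theta[k]) for k in range(n_clusters)],
                        temperature)
                    for c in cells]
            # M-step: posterior-weighted alt fraction per cluster and locus.
            # The pseudocount keeps every fraction strictly inside (0, 1).
            for k in range(n_clusters):
                alt_sum = [pseudocount] * n_loci
                tot_sum = [2.0 * pseudocount] * n_loci
                for cell, p in zip(cells, post):
                    for locus, (ref, alt) in cell.items():
                        alt_sum[locus] += p[k] * alt
                        tot_sum[locus] += p[k] * (ref + alt)
                theta[k] = [a / t for a, t in zip(alt_sum, tot_sum)]
    # Final hard assignment: the cluster with the highest log likelihood.
    labels = []
    for c in cells:
        ll = [cell_loglik(c, theta[k]) for k in range(n_clusters)]
        labels.append(ll.index(max(ll)))
    return labels, theta
```

In this sketch a cell is a dict mapping a locus index to a `(ref, alt)` read count pair, for example `{0: (2, 0), 3: (0, 2)}`, so loci with no coverage take up no memory. Starting at a high temperature keeps the early assignments soft, which is the mechanism that helps avoid committing to a poor local optimum before the clusters have separated.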

v1.0

4 years ago

This release includes new features such as

Dynamic restarting of pipelines that have been partially completed.

--common_variants option to allow input of common variants or known variant loci in the form of a vcf, such as the one provided through 1kgenomes in our README.

--skip_remap option to skip fastq generation, remapping, and retagging (not recommended without --common_variants, but otherwise use at your own risk).

Singularity build here

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=15znQ43q-_R3k04DmbGs3pka2FXvao4TS' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=15znQ43q-_R3k04DmbGs3pka2FXvao4TS" -O souporcell.sif && rm -rf /tmp/cookies.txt

Common variant files here for GRCh38

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=15s8zvIit2UO-2lnL2DnsL0YFoR3AWWRF' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=15s8zvIit2UO-2lnL2DnsL0YFoR3AWWRF" -O filtered_2p_1kgenomes_GRCh38.vcf && rm -rf /tmp/cookies.txt

and for hg19

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1ICfIhpA4iGPEz_lAZf6RLMFQlrfgaskL' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1ICfIhpA4iGPEz_lAZf6RLMFQlrfgaskL" -O filtered_2p_1kgenomes_hg19.vcf && rm -rf /tmp/cookies.txt