Plass Versions

sensitive and precise assembly of short sequencing reads

5-cf8933

1 month ago

Plass & Penguin Release Notes

First release of Penguin, a metagenomic assembler that assembles DNA/RNA through a novel greedy AA/DNA-hybrid Bayesian overlap extension strategy.

New Features and Enhancements

  • First release of Penguin: We now generate two binaries, plass and penguin. Plass assembles protein sequences from DNA, while Penguin assembles DNA contigs. Penguin comes in two variants: penguin guided_nuclassemble, which first assembles using six-frame-translated AA overlaps and then further assembles the contigs using nucleotide information, and penguin nuclassemble, a pure nucleotide assembler.
  • Compatibility and Portability: Thanks to SIMDe, Plass and Penguin now run on ARM (including Apple Silicon) and PowerPC.

4-687d7

3 years ago

Changes since Release 3-764a3:

At a glance: Significant further development of the nucleotide/hybrid assembler. Updated MMseqs2 submodule and adjusted Plass to multiple MMseqs2 changes.

Features

  • Plass can extend one contig multiple times within one iteration
  • Hybrid assembly is progressing nicely, stay tuned for updates!
  • Plass works on many more architectures (e.g. PPC64LE, ARM64 and x64 with SSE2 only)

3-764a3

4 years ago

Changes since Release 2-c7e35:

At a glance: Significant further development of the nucleotide assembler. Reduced hard disk requirements for protein assembler and many bug fixes.

Updated MMseqs2 submodule and adjusted Plass to multiple MMseqs2 changes.

Breaking Changes

  • added reverse complement treatment for nucleotide sequences (plass nuclassemble)
  • introduced --kmer-per-seq-scale parameter so that good hits of long sequences are not missed. The number of extracted k-mers can now be scaled by a user-defined factor multiplied by the length of the sequence.
  • changed scoring mode for alignment calculation (--rescore-mode 3)
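The length-dependent scaling behind --kmer-per-seq-scale can be illustrated with a small sketch. It assumes the k-mer budget is a fixed base count plus the scale factor times the sequence length; the function name, its parameters, and the exact formula are hypothetical and are not taken from the Plass source.

```python
def kmer_budget(seq_len: int, base_kmers: int, scale: float) -> int:
    """Hypothetical k-mer budget for one sequence.

    Assumption: budget = base_kmers + scale * seq_len, rounded down.
    Plass/MMseqs2 may compute this differently.
    """
    if seq_len < 0 or base_kmers < 0 or scale < 0:
        raise ValueError("seq_len, base_kmers and scale must be non-negative")
    return base_kmers + int(scale * seq_len)


if __name__ == "__main__":
    # With scale 0 every sequence gets the same budget; with a positive
    # scale, longer sequences contribute proportionally more k-mers.
    for length in (150, 1000, 10000):
        print(length, kmer_budget(length, base_kmers=60, scale=0.5))
```

With a scale of 0 the budget is independent of length, which is the situation in which long sequences are most likely to lose good hits.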

Features

  • added stdin support, e.g. cat reads.fas | plass assemble stdin asm tmp
  • reduced hard disk requirements by roughly a factor of 12 (--delete-tmp-inc)
  • added a first raw version of a cycle detector (still experimental) to avoid over-extension in nucleotide assembly
  • introduced a new header format, which is now consistent for the protein and nucleotide assemblers: <uniq ID> len:<len> cycle:<0|1>. The cycle field is optional (used in the nucleotide case).
  • introduced a new logic to handle sequences with N repeated k-mers: sequences with more than N repeated k-mers are no longer ignored completely in the assembly process; instead, repeated k-mers are only ignored in the kmermatcher phase. The --skip-n-repeat parameter was replaced by --ignore-multi-kmer.
  • overlaps are still sorted by ScorePerColumn but the bit score was replaced by the raw score to scale correctly with the overlap length
  • introduced --min-contig-len parameter to set minimum length of assembled contig to output (for nucleotide assembly)
  • added redundancy reduction (for nucleotide assembly) by clustering sequences based on a user-defined threshold (--clust-thr, default 0.97)
  • Dockerfile now uses Debian slim instead of alpine
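For downstream scripts, the header format listed above (<uniq ID> len:<len> cycle:<0|1>) could be parsed roughly as follows. This is a sketch under assumptions: fields are taken to be whitespace-separated, the ID is taken to contain no whitespace, and a leading FASTA ">" is tolerated. The names parse_header and ContigHeader are hypothetical and not part of Plass.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "<uniq ID> len:<len>" with an optional " cycle:<0|1>".
HEADER_RE = re.compile(
    r"^(?P<id>\S+)\s+len:(?P<len>\d+)(?:\s+cycle:(?P<cycle>[01]))?\s*$"
)


class ContigHeader(NamedTuple):
    uniq_id: str
    length: int
    cycle: Optional[bool]  # None when the optional cycle field is absent


def parse_header(line: str) -> ContigHeader:
    """Parse one header line; a leading '>' is stripped if present."""
    match = HEADER_RE.match(line.strip().lstrip(">"))
    if match is None:
        raise ValueError(f"unrecognised header: {line!r}")
    cycle = match.group("cycle")
    return ContigHeader(
        uniq_id=match.group("id"),
        length=int(match.group("len")),
        cycle=None if cycle is None else cycle == "1",
    )


if __name__ == "__main__":
    print(parse_header(">contig_1 len:1500 cycle:1"))
    print(parse_header("protein_7 len:42"))
```

Keeping the cycle field as None when it is missing lets the same parser serve protein output, where the field is not expected, and nucleotide output.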

Bugs

  • fixed problems in the first iteration of the protein assembler
  • fixed problems with start and stop codons occurring in the transition from protein alignments to nucleotide alignments and alignment offset calculation
  • split file existence check in workflows to individual checks to avoid repeated linking problems
  • fixed bug in the reverse complement calculation for N's in nucleotide sequences
  • fixed several problems with long sequences in the k-mer matching phase
  • fixed broken compilation without zlib
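The behaviour expected from a reverse complement on sequences containing N can be shown in a few lines. This is an illustrative sketch only, not the Plass implementation: it assumes the plain A/C/G/T/N alphabet and maps N to N so that ambiguous positions stay ambiguous.

```python
# Complement table for the assumed alphabet; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide string, preserving N."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ACGTN"))
    print(reverse_complement("AANNC"))
```

Applying the function twice returns the original sequence, which makes a convenient sanity check for any reverse complement routine.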

2-c7e35

5 years ago

Changes since Release 1-2e0ef:

  • overlaps are now sorted by score per column instead of sequence identity
  • new flag --protein-filter-threshold to change the threshold of the neural network that filters proteins
  • improved the neural network by retraining with cleaner training data
  • added support for merging paired-end files with different lengths
  • fix a bug in start codon correction
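Sorting overlaps by score per column, as mentioned in the list above, can be sketched as follows. The sketch assumes that "score per column" means the alignment score divided by the number of alignment columns; the Overlap class, its fields, and rank_overlaps are hypothetical names for illustration and do not come from Plass.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Overlap:
    target: str   # hypothetical identifier of the overlapping sequence
    score: float  # alignment score
    columns: int  # number of alignment columns (overlap length)

    @property
    def score_per_column(self) -> float:
        if self.columns <= 0:
            raise ValueError("an overlap needs at least one column")
        return self.score / self.columns


def rank_overlaps(overlaps: Iterable[Overlap]) -> List[Overlap]:
    """Return overlaps ordered from best to worst score per column."""
    return sorted(overlaps, key=lambda o: o.score_per_column, reverse=True)


if __name__ == "__main__":
    candidates = [
        Overlap("read_a", score=90.0, columns=60),  # 1.5 per column
        Overlap("read_b", score=80.0, columns=40),  # 2.0 per column
        Overlap("read_c", score=30.0, columns=30),  # 1.0 per column
    ]
    for overlap in rank_overlaps(candidates):
        print(overlap.target, overlap.score_per_column)
```

Note that read_b ranks first despite having a lower total score than read_a, because normalising by length rewards the denser alignment.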

1-2e0ef

5 years ago

Plass Release 1-2e0ef

Plass (Protein-Level ASSembler) is software for assembling short-read sequencing data at the protein level. Its main purpose is the assembly of complex metagenomic datasets.

Features

  • support for assembling on multiple compute nodes using MPI
  • added --min-length flag to adjust codon extraction length

1-9c6a8

5 years ago

First Plass release