memory efficient, fast & precise taxonomic classification system for metagenomic read mapping
The default setting is now a bit smarter: it first tries to find NCBI-style accession or accession.version identifiers, then GenBank identifiers, and finally falls back to the filename (without path and extension).
The new command line option -sequence-id-format <type> allows the user to select a preferred method for sequence ID extraction.
Available values for <type> are:
smart: (default) works as described above
ncbi: only use NCBI-style accession or accession.version identifiers
genbank: only use GenBank identifiers
filename: only use the filename (without path and extension)
leadingword: only use the first contiguous stretch of non-whitespace characters

fixed abundance table formatting
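The fallback order behind the -sequence-id-format values listed above can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the function name `extract_sequence_id`, both regular expressions, the reading of "GenBank identifier" as a `gi|<number>` field, and returning `None` when a non-default mode finds nothing are all assumptions made for illustration.

```python
import os
import re

# Assumed patterns, for illustration only; the tool's real matching rules may differ.
# Accession with optional version, e.g. NC_000913.3, NZ_CP012345.1, AB123456.1
NCBI_ACCESSION = re.compile(r"\b(?:[A-Z]{2}_[A-Z]{0,6}|[A-Z]{1,6})\d{5,}(?:\.\d+)?")
# Assumption: a "GenBank identifier" is the number in a gi|<number> field
GENBANK_GI = re.compile(r"gi\|(\d+)")


def extract_sequence_id(header, filename, fmt="smart"):
    """Return a sequence ID from a FASTA header line, or None if nothing matches."""
    text = header.lstrip(">")
    ncbi = NCBI_ACCESSION.search(text)
    gi = GENBANK_GI.search(text)
    words = text.split()
    candidates = {
        "ncbi": ncbi.group(0) if ncbi else None,
        "genbank": gi.group(1) if gi else None,
        # filename without path and extension
        "filename": os.path.splitext(os.path.basename(filename))[0],
        # first contiguous stretch of non-whitespace characters
        "leadingword": words[0] if words else None,
    }
    if fmt == "smart":
        # Try accession first, then GenBank identifier, then the filename.
        return candidates["ncbi"] or candidates["genbank"] or candidates["filename"]
    return candidates[fmt]
```

Under these assumptions, a header such as `>NC_000913.3 Escherichia coli` resolves to its accession in smart mode, while a header without any recognizable identifier falls back to the name of the file it came from.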
If a reference sequence is inserted whose ID (e.g., NCBI accession) is already present in the database, the newer sequence will now be inserted with a modified ID (an exclamation mark plus a duplication counter is appended), and a warning will be printed to stderr.
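The duplicate handling described above could look roughly like the following Python sketch. The dictionary standing in for the database, the function name `insert_with_unique_id`, the counter starting at 1, and the wording of the warning are hypothetical; the release note only states that an exclamation mark and a duplication counter are appended.

```python
import sys


def insert_with_unique_id(database, seq_id, sequence):
    """Insert a sequence into a dict-based stand-in database; return the ID used."""
    if seq_id not in database:
        database[seq_id] = sequence
        return seq_id

    # Assumption: the counter starts at 1 and increases until the ID is unused.
    counter = 1
    new_id = f"{seq_id}!{counter}"
    while new_id in database:
        counter += 1
        new_id = f"{seq_id}!{counter}"

    # Hypothetical warning text; the real message may differ.
    print(f"warning: ID '{seq_id}' already present, inserting as '{new_id}'",
          file=sys.stderr)
    database[new_id] = sequence
    return new_id
```

The point of the design is that a later sequence never silently overwrites or gets dropped in favor of an earlier one with the same ID; both remain addressable.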
A minimum and maximum length for reads can now be set with -min-readlen <#> and -max-readlen <#>. Reads with lengths outside of this range will not be processed, i.e., they are treated as if they were not present in the input file. The number of reads discarded and the number of reads processed are printed to stderr. The default behavior, in which all reads are processed, remains unchanged.
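As a rough illustration of the read length filter, the Python sketch below drops reads outside a length range and reports the counts on stderr. The function name `filter_reads_by_length`, the inclusive bounds, and the message format are assumptions and are not taken from the tool itself.

```python
import sys


def filter_reads_by_length(reads, min_len=0, max_len=None):
    """Return (kept_reads, discarded_count) for a list of read strings."""
    kept = []
    discarded = 0
    for read in reads:
        # Assumption: both bounds are inclusive; max_len=None means no upper limit.
        too_short = len(read) < min_len
        too_long = max_len is not None and len(read) > max_len
        if too_short or too_long:
            discarded += 1  # treated as if the read were not in the input
        else:
            kept.append(read)
    # Hypothetical report format.
    print(f"{len(kept)} reads processed, {discarded} reads discarded",
          file=sys.stderr)
    return kept, discarded
```

With the default arguments every read is kept, which mirrors the unchanged default behavior.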
assembly_summary files with inconsistent headers
uint64_t for MC_TARGET_ID_TYPE / MC_WINDOW_ID_TYPE / DMC_KMER_TYPE
Improved merge mode: