Scalable long read self-correction and assembly polishing with multiple sequence alignment
Fixes compatibility issues between GNU sort and C++ string comparisons
Adds a script for cleaning the current build
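A common cause of such sort-order mismatches is locale-aware collation. This is an assumption here, since the note above does not state the cause. GNU sort orders lines according to the current locale unless run with LC_ALL=C, whereas std::string comparison is always byte-wise. The sketch below uses a hypothetical helper, isByteSorted, to check that keys sorted externally are in the order C++ code would expect.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not part of CONSENT.
// Returns true if `keys` is in the order defined by std::string's operator<,
// i.e. plain byte order. This is also the order `LC_ALL=C sort` produces.
bool isByteSorted(const std::vector<std::string>& keys) {
    return std::is_sorted(keys.begin(), keys.end());
}
```

In byte order "B" sorts before "a" and "read_10" before "read_2", while many locale-aware collations place "a" before "B". A file sorted under such a locale can therefore fail this check.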
Avoids the explode/merge pre-processing steps if the Minimap2 index was not split
Avoids the explode/merge pre-processing steps for assembly polishing, and directly sorts the overlaps file
Replaces the Python script with a C++ subprogram for PAF reformatting
Accepts both FASTA and FASTQ as input.
No longer requires "one sequence per line" format.
Fixes bug when running multiple instances in the same directory.
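As an illustration of what accepting both formats without the "one sequence per line" restriction involves, here is a minimal sketch of a reader. The function name readSequences and its behaviour are assumptions for illustration, not CONSENT's actual parser. It joins FASTA sequences that span several lines and assumes four-line FASTQ records.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader, not CONSENT's actual code.
// Returns (header, sequence) pairs from FASTA or FASTQ text.
std::vector<std::pair<std::string, std::string>> readSequences(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> records;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '>') {
            // FASTA header: start a new record with an empty sequence.
            records.emplace_back(line.substr(1), "");
        } else if (line[0] == '@') {
            // FASTQ header: assume the sequence, '+' and quality lines follow.
            std::string seq, plus, qual;
            if (!std::getline(in, seq) || !std::getline(in, plus) ||
                !std::getline(in, qual)) {
                break;  // truncated record: stop reading
            }
            records.emplace_back(line.substr(1), seq);
        } else if (!records.empty()) {
            // FASTA sequence line: append to the current record, so a
            // sequence may be wrapped over any number of lines.
            records.back().second += line;
        }
    }
    return records;
}
```

Passing a std::istringstream holding ">r1\nAC\nGT\n" yields one record whose sequence is "ACGT". Quality values are discarded in this sketch.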
Replaces POA with SPOA to compute the multiple sequence alignment. This yields a major speedup while producing slightly more accurate results.
Replaces classical maps with robin_hood maps. This also yields a major speedup.
Overall, CONSENT is now about 2 times faster.
Removes PAF-index and manually explodes/merges the overlaps file instead. This drastically reduces the runtime of the correction / polishing step.
Improves the selection of the MAX best overlaps used to correct each read. This also reduces the runtime of the correction / polishing step.
Performs multithreading differently for read correction and assembly polishing. This, once again, greatly reduces the runtime of the correction / polishing process.
Fixes various bugs, updates default parameters, and provides a major code clean-up.
Fixes bugs in the step that reanchors the corrected windows to the reads. This allows larger chunks of the long reads to be corrected.
Now only processes the MAX (150 by default) best overlaps for each read. This drastically reduces the runtime of the whole (correction AND polishing) pipeline, while keeping high-quality results.
Now processes alignment windows in parallel instead of whole reads. This drastically reduces the runtime of the assembly polishing pipeline.
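Keeping only the MAX best overlaps per read, as described above, can be sketched with std::partial_sort. The Overlap record, the keepBestOverlaps name and the ranking by number of matching bases are all assumptions for illustration. The notes do not say how CONSENT actually ranks overlaps.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical overlap record; fields and scoring are assumptions,
// not CONSENT's actual data structures.
struct Overlap {
    std::string target;  // name of the overlapping read
    int matches;         // assumed score: number of matching bases
};

// Keeps at most `maxOverlaps` overlaps with the highest score.
// When truncation happens, the survivors are ordered best first;
// otherwise the input is returned unchanged (and unsorted).
std::vector<Overlap> keepBestOverlaps(std::vector<Overlap> overlaps,
                                      std::size_t maxOverlaps) {
    if (overlaps.size() > maxOverlaps) {
        auto middle = overlaps.begin() + static_cast<std::ptrdiff_t>(maxOverlaps);
        // Only the first maxOverlaps positions need to be fully ordered.
        std::partial_sort(overlaps.begin(), middle, overlaps.end(),
                          [](const Overlap& a, const Overlap& b) {
                              return a.matches > b.matches;
                          });
        overlaps.erase(middle, overlaps.end());
    }
    return overlaps;
}
```

With n overlaps and a limit of k, std::partial_sort runs in roughly O(n log k), which avoids fully sorting large piles when only the best 150 are used.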
Fixes a bug with headers containing spaces.
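A typical source of trouble with such headers is that overlap files usually carry only the part of the header before the first whitespace, so a full header no longer matches the name in the overlaps. Whether this was the actual cause of the bug above is an assumption. The hypothetical helper below, readName, reduces a header to that leading token.

```cpp
#include <string>

// Hypothetical helper, not CONSENT's actual code.
// Returns the part of `header` before the first space or tab,
// or the whole header if it contains neither.
std::string readName(const std::string& header) {
    return header.substr(0, header.find_first_of(" \t"));
}
```

For example, readName("read_1 length=1200") gives "read_1", which can then be compared against the names found in the overlaps file.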
Fixes a bug with alignment pile retrieval in large alignment files.
Adds support for PAF-index.
Allows splitting the Minimap2 index, which drastically reduces the memory consumption of the overlapping step.
First release of CONSENT!
CONSENT is a scalable self-correction method for long reads.