Automatically exported from code.google.com/p/cdhit
This version supports .gz input files
CD-HIT-OTU-MiSeq is included as a use case for clustering 16S rDNA MiSeq paired-end reads. A minor fix was made to CD-HIT-OTU-MiSeq in cd-hit-v4.6.8-2017-1208-source.tar.gz.
cd-hit-est and cd-hit-est-2d can now cluster paired-end (PE) reads. Users can select a sub-sequence from the beginning of each sequence for clustering. psi-cd-hit.pl can work with BLAST+. The output cluster file can be sorted by cluster size, in addition to cluster length, which is still the default. The output FASTA file can also be sorted by cluster size.
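CD-HIT performs the size sorting itself when asked. The following is only a minimal Python sketch of what ordering clusters by size (number of members) means, rather than by representative length. It assumes the usual .clstr layout: a `>Cluster N` header line followed by one line per member. The helper names `parse_clstr` and `sort_clusters_by_size` are hypothetical and not part of CD-HIT.

```python
from typing import List, Tuple

# One cluster = (header line, member lines); layout assumed from typical .clstr files.
Cluster = Tuple[str, List[str]]


def parse_clstr(text: str) -> List[Cluster]:
    """Split .clstr text into clusters (hypothetical helper, not part of CD-HIT)."""
    clusters: List[Cluster] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">Cluster"):
            clusters.append((line, []))
        elif clusters:
            clusters[-1][1].append(line)  # member line of the current cluster
    return clusters


def sort_clusters_by_size(clusters: List[Cluster]) -> List[Cluster]:
    """Order clusters by member count, largest first.

    sorted() is stable even with reverse=True, so equal-size clusters
    keep their original (length-based) order.
    """
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)


if __name__ == "__main__":
    example = (
        ">Cluster 0\n"
        "0\t500nt, >readA... *\n"
        ">Cluster 1\n"
        "0\t450nt, >readB... *\n"
        "1\t440nt, >readC... at +/97.50%\n"
    )
    for header, members in sort_clusters_by_size(parse_clstr(example)):
        print(header, len(members))
    # prints:
    # >Cluster 1 2
    # >Cluster 0 1
```

Under the default length ordering, `>Cluster 0` would come first because its representative is longer. Size ordering puts the two-member cluster first instead.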
Bug fix for cd-hit-dup when writing out R1 reads from variable-length input.
Added a filter for the -aL option so that short sequences are skipped if they cannot satisfy the representative sequence's -aL requirement. This speeds up clustering in settings where sequences in the same cluster are required to have similar lengths via the -aL and -AL options (e.g. -aL 0.9).
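The reasoning behind such a filter can be sketched as follows. An alignment can never be longer than the shorter of the two sequences. A candidate whose length is below `aL` times the representative's length can therefore be rejected before any alignment is attempted. The function below is a hypothetical Python illustration, not CD-HIT's own C++ code. It assumes `-aL` means the alignment must cover at least that fraction of the longer sequence. It also assumes `-AL` is the maximum number of unaligned residues allowed in the longer sequence.

```python
def passes_length_prefilter(rep_len: int, seq_len: int,
                            aL: float = 0.0, AL: int = 99_999_999) -> bool:
    """Hypothetical pre-alignment check for the -aL / -AL coverage rules.

    Returns False when seq can never satisfy the coverage requirement of
    the representative, so the (expensive) alignment can be skipped.
    """
    longer = max(rep_len, seq_len)
    shorter = min(rep_len, seq_len)
    # An alignment cannot be longer than the shorter sequence, so the
    # best achievable coverage of the longer sequence is shorter/longer.
    if shorter < aL * longer:
        return False
    # -AL caps the unaligned residues of the longer sequence; at least
    # longer - shorter residues are always left unaligned.
    if longer - shorter > AL:
        return False
    return True


if __name__ == "__main__":
    rep = 400  # representative length
    candidates = [390, 380, 350, 300]
    kept = [n for n in candidates if passes_length_prefilter(rep, n, aL=0.9)]
    print(kept)  # [390, 380]
```

Take `aL=0.9` and a 400-residue representative. Under these assumptions, any candidate shorter than 360 residues is rejected without an alignment, which is presumably where the speed-up for tight-length clustering comes from.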
Major update to the cd-hit-dup output format: previous versions of cd-hit-dup output trimmed reads or, for paired-end (PE) reads, merged R1 and R2 reads. That output was not very useful for downstream analysis. In this version, full-length reads are written to the output; for PE reads, both full-length reads are written to the output files.
A few bug fixes; updated documentation and Makefile, with OpenMP as the default; support for negative values of the -T option; a minor improvement for supporting long sequences; the new psi-cd-hit-2013-0525, which was developed separately, was added back to cd-hit; cd-hit-auxtools-v0.5-2012-03-07-fix was added to cd-hit.
A few bug fixes; support for negative values of the -T option; a minor improvement for supporting long sequences; the new psi-cd-hit-2013-0525, which was developed separately, was added back to cd-hit; cd-hit-auxtools-v0.5-2012-03-07-fix was added to cd-hit.
CD-HIT-V4.6.1 (2012-08-27): Fix: a minor bug in handling masking letters; Add: a few minor changes and fixes to conform to Debian packaging rules.