A cross-platform and ultrafast toolkit for FASTA/Q file manipulation
Notice: I forgot to update the version number, so `seqkit version` will return 2.8.0.
OS | Arch | File, China mirror | Download Count |
---|---|---|---|
Linux | 32-bit | seqkit_linux_386.tar.gz, China mirror | |
Linux | 64-bit | seqkit_linux_amd64.tar.gz, China mirror | |
Linux | arm64 | seqkit_linux_arm64.tar.gz, China mirror | |
macOS | 64-bit | seqkit_darwin_amd64.tar.gz, China mirror | |
macOS | arm64 | seqkit_darwin_arm64.tar.gz, China mirror | |
Windows | 32-bit | seqkit_windows_386.exe.tar.gz, China mirror | |
Windows | 64-bit | seqkit_windows_amd64.exe.tar.gz, China mirror | |
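
If you script the download, the archive name can be derived from the current platform. The following is a minimal sketch, not part of SeqKit: the file names are copied from the table above, while `pick_archive` and the architecture aliases are hypothetical helpers introduced here for illustration.

```python
import platform

# File names copied from the download table; the lookup logic is an assumption.
ARCHIVES = {
    ("linux", "386"): "seqkit_linux_386.tar.gz",
    ("linux", "amd64"): "seqkit_linux_amd64.tar.gz",
    ("linux", "arm64"): "seqkit_linux_arm64.tar.gz",
    ("darwin", "amd64"): "seqkit_darwin_amd64.tar.gz",
    ("darwin", "arm64"): "seqkit_darwin_arm64.tar.gz",
    ("windows", "386"): "seqkit_windows_386.exe.tar.gz",
    ("windows", "amd64"): "seqkit_windows_amd64.exe.tar.gz",
}

# Common spellings reported by platform.machine(), mapped to the table's names.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
    "i386": "386",
    "i686": "386",
    "x86": "386",
}


def pick_archive(system=None, machine=None):
    """Return the archive name for an OS/architecture pair.

    Both arguments default to the values reported for the running machine.
    """
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None or (system, arch) not in ARCHIVES:
        raise ValueError(f"no archive listed for {system}/{machine}")
    return ARCHIVES[(system, arch)]
```

Calling `pick_archive()` with no arguments uses the running machine; passing explicit values, such as `pick_archive("Darwin", "arm64")`, is handy when preparing downloads for another host.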
Notes

- Run `seqkit version` to check for updates.
- Run `seqkit genautocomplete` to update the shell autocompletion script.

Please cite:
- `seqkit stats`:
    - `N50_num`, an alias of L50. #15
- `seqkit seq/locate/fish/watch`:
    - `-V/--validate-seq-length`. Now the whole sequence will be checked if `-v/--validate-seq` is given.
- `seqkit amplicon`:
- `seqkit seq`:
- `seqkit`:
- `seqkit grep`:
    - `-D/--allow-duplicated-patterns` for outputting records multiple times when duplicated patterns are given. #427
- `seqkit subseq`:
    - `--id-regexp` to create FASTA index file. This fixes the panic that occurred for sequences containing tabs in the headers. #432
- `seqkit split/sort/shuffle`:
    - `-2/--two-pass`), replace possible tabs in the sequence header.
- `seqkit rmdup`:
    - `-D/--dup-num-file`. #436
- `seqkit stats`:
    - `-S/--skip-file-check` to skip input file checking when given files or a file list. It's very useful if you run it with millions of files.
- `seqkit`:
- `seqkit seq`:
    - `-v` to checking sequences.
- `seqkit`:
    - `-X` for the flag `--infile-list`.
- `seqkit common`:
    - `-e/--check-embedded-seqs` for detecting embedded sequences.
- `seqkit stat`:
    - `AvgQual` for average quality score. #411
- `seqkit split2`:
- `seqkit subseq`:
    - `-R/--region-coord` for appending coordinates to sequence ID for `-r/--region`. #413
- `seqkit locate`:
    - `-s/--max-len-to-show` to show at most X characters for the search pattern or matched sequences.
- `seqkit seq`:
- `seqkit merge-slides`: merge sliding windows generated from `seqkit sliding`. #390
- `seqkit stats`:
    - `-N/--N` for appending other N50-like stats as new columns. #393
    - `-T/--tabular`.
- `seqkit translate`:
    - `-s/--out-subseqs` and `-m/--min-len` to write ORFs longer than `x` amino acids as individual records. #389
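
For readers unfamiliar with the N50-like statistics mentioned for `seqkit stats` (N50, its count counterpart L50 / `N50_num`, and variants such as N90), here is a minimal sketch of the usual textbook definition. It is not SeqKit's implementation; the function name `nx_lx` is hypothetical, and how SeqKit treats ties or empty input is not stated in these notes.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a collection of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least x% of all bases.
    Lx is the number of sequences in that set (L50 when x == 50).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")

    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Compare as running/total >= x/100 without floating-point division.
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable: the full set always covers 100%")


lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # 54 bases in total
n50, l50 = nx_lx(lengths)               # 10 + 9 + 8 = 27 covers half
n90, l90 = nx_lx(lengths, x=90)
```

With these lengths, the three longest sequences already cover half of the bases, so N50 is 8 and L50 is 3.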
- `seqkit sum`:
- `seqkit pair`:
- `seqkit seq`:
- `seqkit fxtab`:
    - `-a/--alphabet`) with a new data structure. Thanks to @elliotwutingfeng #388
- `seqkit subseq`:
- `seqkit`:
- `seqkit locate`:
- `seqkit amplicon`:
- `seqkit split`:
    - `--two-pass`. #332
- `seqkit stats`:
- `seqkit grep`:
    - `-C`) for matching with mismatch (`-m > 0`). #370
- `seqkit replace`:
    - `seqkit grep`. #348
- `seqkit faidx`:
- `seqkit faidx/sort/shuffle/split/subseq`:
- `seqkit seq`:
- `seqkit rename`:
    - `-s/--separator` for setting separator between original ID/name and the counter (default "_"). #360
    - `-N/--start-num` for setting starting count number for duplicated IDs/names (default 2). #360
    - `-1/--rename-1st-rec` for renaming the first record as well. #360
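
To make the interplay of the separator and the starting number for `seqkit rename` concrete, here is a rough sketch of the renaming scheme as these notes describe it. It is an illustration under assumptions, not SeqKit's code: `rename_duplicates` is a hypothetical helper, it leaves the first occurrence untouched (so `-1/--rename-1st-rec` is not modelled), and it does not guard against a new name colliding with an existing ID.

```python
from collections import Counter


def rename_duplicates(ids, separator="_", start_num=2):
    """Append a counter to repeated IDs, leaving first occurrences as-is.

    separator mirrors -s/--separator (default "_") and start_num mirrors
    -N/--start-num (default 2) as described in the release notes.
    """
    if start_num < 1:
        raise ValueError("start_num must be greater than zero")

    seen = Counter()
    renamed = []
    for seq_id in ids:
        seen[seq_id] += 1
        occurrence = seen[seq_id]
        if occurrence == 1:
            renamed.append(seq_id)  # first record keeps its name
        else:
            # The second occurrence gets start_num, the third start_num + 1, ...
            number = start_num + occurrence - 2
            renamed.append(f"{seq_id}{separator}{number}")
    return renamed


default_names = rename_duplicates(["a", "b", "a", "a"])
custom_names = rename_duplicates(["a", "b", "a", "a"], separator="-", start_num=1)
```

With the defaults, the IDs `a, b, a, a` become `a, b, a_2, a_3`; with a `-` separator and a starting number of 1 they become `a, b, a-1, a-2` under the assumptions above.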
- `seqkit sliding`:
    - `-S/--suffix` for changing the suffix added to the sequence ID (default: "_sliding").
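
As a rough illustration of what a sliding-window ID suffix does, the sketch below cuts a sequence into fixed-size windows and tags each window's ID. Only the default suffix `"_sliding"` comes from the notes above; the function `sliding_windows`, its parameters, and the 1-based `:start-end` coordinates appended after the suffix are assumptions made for this example, not a statement of SeqKit's exact output format.

```python
def sliding_windows(seq_id, seq, window, step, suffix="_sliding"):
    """Yield (window_id, subsequence) pairs for full-length windows only.

    The ID is built as <seq_id><suffix>:<start>-<end> with 1-based,
    inclusive coordinates (an assumed format for illustration).
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")

    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        yield f"{seq_id}{suffix}:{start + 1}-{end}", seq[start:end]


windows = list(sliding_windows("seq", "ACGTACGT", window=4, step=2))
custom = list(sliding_windows("seq", "ACGTACGT", window=4, step=4, suffix="_w"))
```

Here `windows` holds three overlapping 4-base windows, and `custom` shows how changing the suffix only affects the generated IDs, not the subsequences.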