A cross-platform command-line tool for executing jobs in parallel
rush is a tool similar to GNU parallel and gargs. rush borrows some ideas from them and has some unique features, e.g., supporting custom defined variables, resuming multi-line commands, and more advanced embedded replacement strings. These features make rush suitable for easily and flexibly parallelizing complex workflows in fields like Bioinformatics (see example 18).
Major:
- Output buffering. (--line-buffer in GNU parallel)
- Timeout (-t). (--timeout in GNU parallel)
- Retry (-r). (--retry-failed --joblog in GNU parallel)
- Continue (-c). (--resume --joblog in GNU parallel)
- awk -v like custom defined variables (-v). (Using Shell variable in GNU parallel)
- Keep output in order of input (-k). (Same -k/--keep-order in GNU parallel)
- Stop on first error (-e). (not perfect, you may stop it by typing ctrl-c or closing terminal) (--halt 2 in GNU parallel)
- Settable record delimiter (-D, default \n). (--recstart and --recend in GNU parallel)
- Settable number of records sent to every command (-n, default 1). (-n/--max-args in GNU parallel)
- Settable field delimiter (-d, default \s+). (Same -d/--delimiter in GNU parallel)
- Replacement strings:
  - {#}, job ID. (Same in GNU parallel)
  - {}, full data. (Same in GNU parallel)
  - {n}, nth field in delimiter-delimited data. (Same in GNU parallel)
  - {/}, dirname. ({//} in GNU parallel)
  - {%}, basename. ({/} in GNU parallel)
  - {.}, remove the last file extension. (Same in GNU parallel)
  - {:}, remove all file extensions. (Not directly supported in GNU parallel)
  - {^suffix}, remove suffix. (Not directly supported in GNU parallel)
  - {@regexp}, capture submatch using regular expression. (Not directly supported in GNU parallel)
  - Combinations: {%.}, {%:}, basename without extension; {2.}, {2/}, {2%.}, manipulate nth field
- Preset variable (macro), e.g., rush -v p={^suffix} 'echo {p}_new_suffix', where {p} is replaced with {^suffix}. (Using Shell variable in GNU parallel)
Minor:
- Dry run (--dry-run). (Same in GNU parallel)
- Trim input data (--trim). (Same in GNU parallel)
- Verbose output (--verbose). (Same in GNU parallel)
Differences between rush and GNU parallel on GNU parallel site.
Performance of rush is similar to gargs, and they are both slightly faster than parallel (Perl) and both slower than Rust parallel (discussion).
Note that speed is not the primary goal, especially for processes that run for a long time.
rush is implemented in the Go programming language. Executable binary files for most popular operating systems are freely available on the release page.
Tip: run rush -V to check for updates.
OS | Arch | File, (mirror in China) | Download Count |
---|---|---|---|
Linux | 32-bit | rush_linux_386.tar.gz, (mirror) | |
Linux | 64-bit | rush_linux_amd64.tar.gz, (mirror) | |
Linux | arm64 | rush_linux_arm64.tar.gz, (mirror) | |
OS X | 64-bit | rush_darwin_amd64.tar.gz, (mirror) | |
OS X | arm64 | rush_darwin_arm64.tar.gz, (mirror) | |
Windows | 32-bit | rush_windows_386.exe.tar.gz, (mirror) | |
Windows | 64-bit | rush_windows_amd64.exe.tar.gz, (mirror) | |
Just download the compressed executable file for your operating system and decompress it with tar -zxvf *.tar.gz or another tool. Then:
For Linux-like systems, if you have root privilege, simply copy it to /usr/local/bin:
sudo cp rush /usr/local/bin/
Or copy it to any directory listed in the environment variable PATH:
mkdir -p $HOME/bin/; cp rush $HOME/bin/
For Windows, just copy rush.exe to C:\WINDOWS\system32.
go install github.com/shenwei356/rush@latest
# download Go from https://go.dev/dl
wget https://go.dev/dl/go1.17.13.linux-amd64.tar.gz
tar -zxf go1.17.13.linux-amd64.tar.gz -C $HOME/
# or
# echo "export PATH=$PATH:$HOME/go/bin" >> ~/.bashrc
# source ~/.bashrc
export PATH=$PATH:$HOME/go/bin
git clone https://github.com/shenwei356/rush
cd rush
go build
# or statically-linked binary
CGO_ENABLED=0 go build -tags netgo -ldflags '-w -s'
# or cross compile for other operating systems and architectures
CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 go build -tags netgo -ldflags '-w -s'
rush -- a cross-platform command-line tool for executing jobs in parallel
Version: 0.5.4
Author: Wei Shen <[email protected]>
Homepage: https://github.com/shenwei356/rush
Input:
- Input could be a list of strings or numbers, e.g., file paths.
- Input can be given either from the STDIN or file(s) via the option -i/--infile.
- Some options could be used to define how the input records are parsed:
-d, --field-delimiter field delimiter in records (default "\s+")
-D, --record-delimiter record delimiter (default "\n")
-n, --nrecords number of records sent to a command (default 1)
-J, --records-join-sep record separator for joining multi-records (default "\n")
-T, --trim trim white space (" \t\r\n") in input
Output:
- Outputs of all commands are written to STDOUT by default,
you can also use -o/--out-file to specify an output file.
- Outputs of all commands are random, you can use the flag -k/--keep-order
to keep output in order of input.
- Outputs of all commands are buffered, you can use the flag -I/--immediate-output
to print output immediately and interleaved.
Replacement strings in commands:
{} full data
{#} job ID
{n} nth field in delimiter-delimited data
{/} dirname
{%} basename
{.} remove the last file extension
{:} remove all file extensions.
{^suffix} remove suffix
{@regexp} capture submatch using regular expression
Combinations:
{%.}, {%:} basename without extension
{2.}, {2/}, {2%.} manipulate nth field
Preset variable (macro):
1. You can pass variables to the command like awk via the option -v. E.g.,
$ seq 3 | rush -v p=prefix_ -v s=_suffix 'echo {p}{}{s}'
prefix_3_suffix
prefix_1_suffix
prefix_2_suffix
2. The value could also contain replacement strings.
# {p} will be replaced with {%:}, which computes the basename and removes all file extensions.
$ echo a/b/c.txt.gz | rush -v 'p={%:}' 'echo {p} {p}.csv'
c c.csv
Usage:
rush [flags] [command]
Examples:
1. simple run, quoting is not necessary
$ seq 1 10 | rush echo {}
2. keep order
$ seq 1 10 | rush 'echo {}' -k
3. timeout
$ seq 1 | rush 'sleep 2; echo {}' -t 1
4. retry
$ seq 1 | rush 'python script.py' -r 3
5. dirname & basename & remove suffix
$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
dir file_1.txt.gz dir/file
6. basename without the last or any extension
$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
dir.d/file.txt dir.d/file file.txt file
7. job ID, combine fields and other replacement strings
$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
job 1: file.txt file s
8. capture submatch using regular expression
$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
read
9. custom field delimiter
$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
a b c
10. custom record delimiter
$ echo a=b=c | rush -D "=" -k 'echo {}'
a
b
c
$ echo abc | rush -D "" -k 'echo {}'
a
b
c
11. assign value to variable, like "awk -v"
# seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
Hello, Wei Shen!
12. preset variable (Macro)
# equal to: echo sample_1.fq.gz | rush 'echo {:^_1} {} {:^_1}_2.fq.gz'
$ echo sample_1.fq.gz | rush -v p={:^_1} 'echo {p} {} {p}_2.fq.gz'
sample sample_1.fq.gz sample_2.fq.gz
13. save successful commands to continue in NEXT run
$ seq 1 3 | rush 'sleep {}; echo {}' -c -t 2
[INFO] ignore cmd #1: sleep 1; echo 1
[ERRO] run cmd #1: sleep 2; echo 2: time out
[ERRO] run cmd #2: sleep 3; echo 3: time out
14. escape special symbols
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
a
15. run a command with relative paths in Windows, please use backslash as the separator.
# "brename -l -R" is used to search paths recursively
$ brename -l -q -R -i -p "\.go$" | rush "bin\app.exe {}"
More examples: https://github.com/shenwei356/rush
Flags:
-v, --assign strings assign the value val to the variable var (format: var=val, val also
supports replacement strings)
--cleanup-time int time to allow child processes to clean up between stop / kill signals
(unit: seconds, 0 for no time) (default 3) (default 3)
-c, --continue continue jobs. NOTES: 1) successful commands are saved in file (given
by flag -C/--succ-cmd-file); 2) if the file does not exist, rush saves
data so we can continue jobs next time; 3) if the file exists, rush
ignores jobs in it and update the file
--dry-run print command but not run
-q, --escape escape special symbols like $ which you can customize by flag
-Q/--escape-symbols
-Q, --escape-symbols string symbols to escape (default "$#&`")
--eta show ETA progress bar
-d, --field-delimiter string field delimiter in records, support regular expression (default "\\s+")
-h, --help help for rush
-I, --immediate-output print output immediately and interleaved, to aid debugging
-i, --infile strings input data file, multi-values supported
-j, --jobs int run n jobs in parallel (default value depends on your device) (default 16)
-k, --keep-order keep output in order of input
--no-kill-exes strings exe names to exclude from kill signal, example: mspdbsrv.exe; or use
all for all exes (default none)
--no-stop-exes strings exe names to exclude from stop signal, example: mspdbsrv.exe; or use
all for all exes (default none)
-n, --nrecords int number of records sent to a command (default 1)
-o, --out-file string out file ("-" for stdout) (default "-")
--print-retry-output print output from retry commands (default true)
--propagate-exit-status propagate child exit status up to the exit status of rush (default true)
-D, --record-delimiter string record delimiter (default is "\n") (default "\n")
-J, --records-join-sep string record separator for joining multi-records (default is "\n") (default "\n")
-r, --retries int maximum retries (default 0)
--retry-interval int retry interval (unit: second) (default 0)
-e, --stop-on-error stop child processes on first error (not perfect, you may stop it by
typing ctrl-c or closing terminal)
-C, --succ-cmd-file string file for saving successful commands (default "successful_cmds.rush")
-t, --timeout int timeout of a command (unit: seconds, 0 for no timeout) (default 0)
-T, --trim string trim white space (" \t\r\n") in input (available values: "l" for left,
"r" for right, "lr", "rl", "b" for both side)
--verbose print verbose information
-V, --version print version information and check for update
Simple run, quoting is not necessary
# seq 1 3 | rush 'echo {}'
$ seq 1 3 | rush echo {}
3
1
2
Read data from file (-i)
$ rush echo {} -i data1.txt -i data2.txt
Keep output order (-k)
$ seq 1 3 | rush 'echo {}' -k
1
2
3
Timeout (-t)
$ time seq 1 | rush 'sleep 2; echo {}' -t 1
[ERRO] run command #1: sleep 2; echo 1: time out
real 0m1.010s
user 0m0.005s
sys 0m0.007s
Retry (-r)
$ seq 1 | rush 'python unexisted_script.py' -r 1
python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory
[WARN] wait command: python unexisted_script.py: exit status 2
python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory
[ERRO] wait command: python unexisted_script.py: exit status 2
Dirname ({/}), basename ({%}), and remove custom suffix ({^suffix})
$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
dir file_1.txt.gz dir/file
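For readers who want to sanity-check these three replacement strings, the result above can be approximated with standard shell tools. This is only a rough analogy and not how rush implements them; the variable names are made up for illustration.

```shell
# Plain-shell approximations (assumed equivalents, not rush internals).
path='dir/file_1.txt.gz'
dir=$(dirname "$path")      # like {/}: directory part
base=$(basename "$path")    # like {%}: last path component
stem=${path%_1.txt.gz}      # like {^_1.txt.gz}: strip a literal suffix
echo "$dir $base $stem"
```

The printed line should match the rush output shown above.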
Get basename, and remove last ({.}) or any ({:}) extension
$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
dir.d/file.txt dir.d/file file.txt file
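The four results above can be reproduced for this particular path with POSIX parameter expansion. This is a sketch for comparison only; the variable names are hypothetical and the expansions are not claimed to match rush on every input.

```shell
# Parameter-expansion approximations for the path used above.
path='dir.d/file.txt.gz'
base=${path##*/}                  # last path component, like {%}
parent=${path%/*}                 # directory part (this path contains a slash)
no_last_ext=${path%.*}            # like {.}
no_any_ext="$parent/${base%%.*}"  # like {:}
base_no_last=${base%.*}           # like {%.}
base_no_any=${base%%.*}           # like {%:}
echo "$no_last_ext $no_any_ext $base_no_last $base_no_any"
```

Note that ${path%.*} would cut into the directory name for an input such as dir.d/file, so the plain-shell version needs more care than the replacement strings do.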
Job ID, combine fields index and other replacement strings
$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
job 1: file.txt file s
Capture submatch using regular expression ({@regexp})
$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
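As a point of comparison, a similar capture can be written with sed. This is a sketch under the assumption that only the first capture group matters; it is not how rush evaluates {@regexp}.

```shell
# [0-9] stands in for \d, which POSIX extended regular expressions lack.
captured=$(printf '%s\n' 'read_1.fq.gz' | sed -E 's/^(.+)_[0-9].*$/\1/')
echo "$captured"
```

The sed version has to match and discard the rest of the line, which the replacement string avoids.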
Custom field delimiter (-d)
$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
a b c
Send multi-lines to every command (-n)
$ seq 5 | rush -n 2 -k 'echo "{}"; echo'
1
2
3
4
5
# Multiple records are joined with separator `"\n"` (`-J/--records-join-sep`)
$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
1 2
3 4
5
$ seq 5 | rush -n 2 -k -j 3 'echo {1}'
1
3
5
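Readers who know xargs may find it a useful reference point for -n. The sketch below groups the same five records in pairs with xargs; it is only an analogy, since xargs runs the commands one after another and splits input on whitespace.

```shell
# xargs -n 2 passes at most two input items to each echo invocation,
# similar in spirit to: rush -n 2 -k 'echo {}' -J ' '
grouped=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
printf '%s\n' "$grouped"
```

As in the rush example with -J ' ', the last group holds the single leftover record.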
Custom record delimiter (-D), note that empty records are not used.
$ echo a b c d | rush -D " " -k 'echo {}'
a
b
c
d
$ echo abcd | rush -D "" -k 'echo {}'
a
b
c
d
# FASTA format
$ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC"
>seq1
actg
>seq2
AAAA
>seq3
CCCC
$ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC" | rush -D ">" 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
FASTA record 1: name: seq1 sequence: actg
FASTA record 2: name: seq2 sequence: AAAA
FASTA record 3: name: seq3 sequence: CCCC
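The same record and field splitting can be sketched in awk, which may help clarify what -D ">" and -d "\n" do together. This is an illustration only; the variable names are hypothetical.

```shell
# RS=">" splits records at ">", FS="\n" makes the header line field 1
# and the sequence line field 2; the empty record before the first ">" is skipped.
fasta=$(printf '>seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC')
parsed=$(printf '%s' "$fasta" | awk 'BEGIN { RS = ">"; FS = "\n" }
  $0 != "" { n++; print "FASTA record " n ": name: " $1 " sequence: " $2 }')
printf '%s\n' "$parsed"
```

Skipping the empty leading record by hand corresponds to the note above that empty records are not used.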
Assign value to variable, like awk -v (-v)
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
Hello, Wei Shen!
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
Hello, Wei Shen!
$ for var in a b; do \
    seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
  done
var: a, data: 1
var: a, data: 2
var: a, data: 3
var: b, data: 1
var: b, data: 2
var: b, data: 3
Preset variable (-v), avoid repeatedly writing verbose replacement strings
# naive way
$ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
read read_2.fq.gz
# macro + removing suffix
$ echo read_1.fq.gz | rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
# macro + regular expression
$ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
Escape special symbols
$ seq 1 | rush 'echo "I have $100"'
I have 00
$ seq 1 | rush 'echo "I have $100"' -q
I have $100
$ seq 1 | rush 'echo "I have $100"' -q --dry-run
echo "I have \$100"
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"'
a b
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
a
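A plausible explanation for the first two outputs above, assuming rush hands the command string to a shell: the shell expands $1 (which is empty) before echo runs, and escaping the dollar sign prevents that. The sketch below reproduces both cases with sh -c and does not involve rush itself.

```shell
# Unescaped: the inner shell expands $1 (empty here), leaving "00".
unescaped=$(sh -c 'echo "I have $100"')
# Escaped, as printed by --dry-run above: the dollar sign is kept literally.
escaped=$(sh -c 'echo "I have \$100"')
printf '%s\n' "$unescaped" "$escaped"
```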
Interrupt jobs with Ctrl-C: rush will stop unfinished commands and exit.
$ seq 1 20 | rush 'sleep 1; echo {}'
^C[CRIT] received an interrupt, stopping unfinished commands...
[ERRO] wait cmd #7: sleep 1; echo 7: signal: interrupt
[ERRO] wait cmd #5: sleep 1; echo 5: signal: killed
[ERRO] wait cmd #6: sleep 1; echo 6: signal: killed
[ERRO] wait cmd #8: sleep 1; echo 8: signal: killed
[ERRO] wait cmd #9: sleep 1; echo 9: signal: killed
1
3
4
2
Continue/resume jobs (-c). When some jobs fail (by execution failure, timeout, or cancellation by the user with Ctrl + C), please switch the flag -c/--continue on and run again, so that rush can save successful commands and ignore them in the NEXT run.
$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1
2
[ERRO] run cmd #3: sleep 3; echo 3: time out
# successful commands:
$ cat successful_cmds.rush
sleep 1; echo 1__CMD__
sleep 2; echo 2__CMD__
# run again
$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
[INFO] ignore cmd #1: sleep 1; echo 1
[INFO] ignore cmd #2: sleep 2; echo 2
[ERRO] run cmd #1: sleep 3; echo 3: time out
Multi-line commands (not supported in GNU parallel)
$ seq 1 3 | rush 'sleep {}; echo {}; \
echo finish {}' -t 3 -c -C finished.rush
1
finish 1
2
finish 2
[ERRO] run cmd #3: sleep 3; echo 3; \
echo finish 3: time out
$ cat finished.rush
sleep 1; echo 1; \
echo finish 1__CMD__
sleep 2; echo 2; \
echo finish 2__CMD__
# run again
$ seq 1 3 | rush 'sleep {}; echo {}; \
echo finish {}' -t 3 -c -C finished.rush
[INFO] ignore cmd #1: sleep 1; echo 1; \
echo finish 1
[INFO] ignore cmd #2: sleep 2; echo 2; \
echo finish 2
[ERRO] run cmd #1: sleep 3; echo 3; \
echo finish 3: time out
Commands are saved to the file (-C) right after they finish, so we can check the number of finished jobs:
grep -c __CMD__ successful_cmds.rush
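To see why counting the __CMD__ marker works even for multi-line commands, the sketch below builds a small sample file in the format shown by the cat outputs above and counts it. The file contents are illustrative and written to a temporary file, not produced by rush.

```shell
# Each saved command ends with the marker __CMD__, even if it spans lines.
tmp=$(mktemp)
printf 'sleep 1; echo 1__CMD__\n' > "$tmp"
printf 'sleep 2; echo 2; \\\n  echo finish 2__CMD__\n' >> "$tmp"
finished=$(grep -c __CMD__ "$tmp")
echo "$finished"
rm -f "$tmp"
```

The sample file has three lines but two markers, so the count reflects commands and not lines.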
A comprehensive example: downloading 1K+ pages given by three URL list files using phantomjs save_page.js (some page contents are dynamically generated by JavaScript, so wget does not work). Here I set the max number of jobs (-j) to 20; each job has a max running time (-t) of 60 seconds and 3 retry chances (-r). The continue flag -c is also switched on, so we can continue unfinished jobs. Luckily, it's accomplished in one run :smile:
$ for f in $(seq 2014 2016); do \
    /bin/rm -rf $f; mkdir -p $f; \
    cat $f.html.txt | rush -v d=$f -d = 'phantomjs save_page.js "{}" > {d}/{3}.html' -j 20 -t 60 -r 3 -c; \
  done
A bioinformatics example: mapping with bwa, and processing the result with samtools:
$ tree raw.cluster.clean.mapping
raw.cluster.clean.mapping
├── M1
│ ├── M1_1.fq.gz -> ../../raw.cluster.clean/M1/M1_1.fq.gz
│ ├── M1_2.fq.gz -> ../../raw.cluster.clean/M1/M1_2.fq.gz
...
$ ref=ref/xxx.fa
$ threads=25
$ ls -d raw.cluster.clean.mapping/* \
| rush -v ref=$ref -v j=$threads \
'bwa mem -t {j} -M -a {ref} {}/{%}_1.fq.gz {}/{%}_2.fq.gz > {}/{%}.sam; \
samtools view -bS {}/{%}.sam > {}/{%}.bam; \
samtools sort -T {}/{%}.tmp -@ {j} {}/{%}.bam -o {}/{%}.sorted.bam; \
samtools index {}/{%}.sorted.bam; \
samtools flagstat {}/{%}.sorted.bam > {}/{%}.sorted.bam.flagstat; \
/bin/rm {}/{%}.bam {}/{%}.sam;' \
-j 2 --verbose -c -C mapping.rush
Since {}/{%} appears many times, we can use a preset variable (macro) to simplify it:
$ ls -d raw.cluster.clean.mapping/* \
| rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \
samtools view -bS {p}.sam > {p}.bam; \
samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
samtools index {p}.sorted.bam; \
samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
/bin/rm {p}.bam {p}.sam;' \
-j 2 --verbose -c -C mapping.rush
Shell grep returns exit code 1 when no matches are found, so rush thinks it failed to run. Please use grep foo bar || true instead of grep foo bar.
$ seq 1 | rush 'echo abc | grep 123'
[ERRO] wait cmd #1: echo abc | grep 123: exit status 1
$ seq 1 | rush 'echo abc | grep 123 || true'
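The exit codes behind this behaviour can be checked in a plain shell, without rush. The sketch below records the status of both variants; the variable names are hypothetical.

```shell
# Record the exit status of each variant without aborting the script.
plain=0
echo abc | grep -q 123 || plain=$?
guarded=0
{ echo abc | grep -q 123 || true; } || guarded=$?
echo "plain=$plain guarded=$guarded"
```

Keep in mind that || true also hides genuine grep errors such as an unreadable file, so it is best reserved for commands where "no match" is an expected outcome.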
Main contributors:
Special thanks to @brentp and his gargs, from which rush borrows some ideas.
Thanks to @bburgin for his contribution to improving child process management.
Create an issue to report bugs, propose new functions or ask for help.