eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Two changes:
tsv-pretty option --a|auto-preamble - Enables automatic detection of preambles: lines at the start of the file that should be printed as is, without reformatting into pretty-printed columns. For more information and examples see PR #218.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.3/tsv-utils-v1.4.3_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.3/tsv-utils-v1.4.3_osx-x86_64_ldc2.tar.gz | tar xz
One change:
dub.json file. Needed to support planned changes in dub. Also needed for dlang CI pipelines.
There are no changes to any of the tools.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.2/tsv-utils-v1.4.2_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.2/tsv-utils-v1.4.2_osx-x86_64_ldc2.tar.gz | tar xz
This release contains one new feature and several performance improvements:
tsv-uniq --number - Line numbering grouped by key (new feature). The key is either the whole line or a subset of fields. Each unique key gets its own set of line numbers. See the tsv-uniq reference for details.
std.stdio.File.byLine. Especially effective for narrow files. Tools using byLine (most of the tools) typically see a 10-40% performance gain, depending on tool and type of file (measured on OS X). Implementation documentation: tsv_utils.common.utils.bufferedByLine.
tsv-join that allocate large amounts of memory.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.1/tsv-utils-v1.4.1_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.4.1/tsv-utils-v1.4.1_osx-x86_64_ldc2.tar.gz | tar xz
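The line numbering grouped by key described for tsv-uniq --number above can be illustrated with a short Python sketch. This is not the tool's implementation (tsv-utils is written in D). The function name `number_by_key`, the 0-based field indices, and the choice to append the number as a trailing tab-separated field are all assumptions made for illustration; see the tsv-uniq reference for the actual output format.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, Optional, Sequence


def number_by_key(
    lines: Iterable[str],
    key_fields: Optional[Sequence[int]] = None,
    delimiter: str = "\t",
) -> Iterator[str]:
    """Yield each line with a per-key occurrence number appended.

    key_fields=None uses the whole line as the key; otherwise the key is
    the tuple of the selected fields (0-based indices, an assumption here).
    """
    counts: Dict[Hashable, int] = defaultdict(int)
    for line in lines:
        if key_fields is None:
            key: Hashable = line
        else:
            fields = line.split(delimiter)
            key = tuple(fields[i] for i in key_fields)
        counts[key] += 1  # each unique key keeps its own numbering
        yield f"{line}{delimiter}{counts[key]}"


rows = ["a\t1", "b\t2", "a\t3", "a\t1"]
whole_line = list(number_by_key(rows))
first_field = list(number_by_key(rows, key_fields=[0]))
```

With the whole line as key, only the repeated line `a<TAB>1` reaches number 2. With the first field as key, the three `a` lines are numbered 1, 2, 3 and the `b` line is numbered 1.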
This release modifies tsv-sample random value printing so most values are printed in decimal notation, without exponents. This is for subsequent processing by GNU sort. Sorting numbers with exponents requires "general numeric" order (option 'g'), which is much slower than "numeric" order (option 'n'). See Shuffling large files on the Tips and Tricks page for more info.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.2/tsv-utils-v1.3.2_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.2/tsv-utils-v1.3.2_osx-x86_64_ldc2.tar.gz | tar xz
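The motivation for printing random values in decimal notation can be seen in a small Python sketch. `plain_numeric_key` is a hypothetical stand-in that approximates a "numeric" ordering by reading only a leading decimal number and ignoring any exponent suffix. It is not GNU sort's code, and `to_decimal` with 10 fractional digits is an assumption, not the formatting tsv-sample actually uses.

```python
import re

# Optional sign, digits, optional decimal point, digits. No exponent part.
_LEADING_DECIMAL = re.compile(r"\s*(-?\d*\.?\d*)")


def plain_numeric_key(text: str) -> float:
    """Approximate a 'numeric' sort key that does not understand exponents."""
    match = _LEADING_DECIMAL.match(text)
    digits = match.group(1) if match else ""
    try:
        return float(digits)
    except ValueError:
        return 0.0  # nothing numeric at the start of the text


def to_decimal(value: float, digits: int = 10) -> str:
    """Format a float in fixed decimal notation, never with an exponent."""
    return f"{value:.{digits}f}"


values = [0.25, 1.5e-07, 0.03]

# Default float formatting switches to exponent notation for small values.
with_exponents = [repr(v) for v in values]
misordered = sorted(with_exponents, key=plain_numeric_key)

# Fixed decimal notation sorts correctly under the same key.
decimal_only = [to_decimal(v) for v in values]
ordered = sorted(decimal_only, key=plain_numeric_key)
```

Here `misordered` places `1.5e-07` last because only the leading `1.5` is read, while `ordered` places the smallest value first. This is why exponent-free output lets the faster "numeric" order be used.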
In this release:
tsv-sample: Adds full-line as key to distinct sampling. This completes the work that has been done on sampling over the last few point releases. tsv-sample now supports a fair set of sampling modes. Performance is also good, in keeping with the tradition of the other tsv-utils tools.
tsv-filter. Unfortunately csv2tsv is a little slower.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.1/tsv-utils-v1.3.1_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.3.1/tsv-utils-v1.3.1_osx-x86_64_ldc2.tar.gz | tar xz
This release adds several new sampling algorithms that improve runtime performance and memory utilization for a number of sampling use cases. There are no new forms of sampling, just additional algorithms. The new algorithms:
Formal performance benchmarks have not been run. However, tests run on Mac OS as part of development show favorable results relative to other available tools, including GNU shuf.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.3/tsv-utils-v1.2.3_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.3/tsv-utils-v1.2.3_osx-x86_64_ldc2.tar.gz | tar xz
This release adds new capabilities and performance improvements to tsv-sample. Documentation was also updated to improve clarity. Key changes:
-r|--replace and -n|--num NUM options.
--gen-random-inorder option. A related feature, --print-random, was updated so that it is now supported by all applicable sampling modes.
tsv-sample use cases is line order randomization. The case where all input lines are being permuted was rewritten and is now quite a bit faster and uses less memory. This applies to both weighted and unweighted sampling. (The case where subsampling is done via the -n|--num option uses reservoir sampling and was already fast.)
-r|--rate to -p|prob. This was done to create a more consistent set of option names for new features and features that may be added in the future.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.2/tsv-utils-v1.2.2_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.2/tsv-utils-v1.2.2_osx-x86_64_ldc2.tar.gz | tar xz
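The notes above mention that subsampling via -n|--num uses reservoir sampling. As background, here is a minimal Python sketch of the textbook unweighted variant (often called Algorithm R). It is an assumption that this variant is a useful illustration; the notes do not say which reservoir algorithm tsv-sample implements, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Pick up to k items uniformly at random in one pass over the input.

    Memory use is O(k) regardless of input length, which is why the
    technique suits sampling from large files or streams.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 5, random.Random(42))
```

The input length does not need to be known in advance, and an input shorter than `k` is returned in full.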
This release adds features for tsv-utils automated tests. There are no changes to any of the tools.
The new testing features add support for different correct output results for different compiler/library versions. The main case is changes to error message text, which in some cases includes text from the Phobos library.
Alternate test outputs were added for a planned change to Phobos in an upcoming release. This was bundled into a tagged release to support the D language project tester, where tsv-utils is used.
To download and unpack the prebuilt binaries:
$ # Linux
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.1/tsv-utils-v1.2.1_linux-x86_64_ldc2.tar.gz | tar xz
$ # MacOS
$ curl -L https://github.com/eBay/tsv-utils/releases/download/v1.2.1/tsv-utils-v1.2.1_osx-x86_64_ldc2.tar.gz | tar xz
This release changes the repository name from eBay/tsv-utils-dlang to eBay/tsv-utils. This better reflects the functionality provided by the TSV Utilities. There are no other changes. Please report any issues found with the name change on the Issues page.
Release v1.1.20 contains a few minor updates:
tsv-summarize: unique-count operator - Performance improvement by avoiding unnecessary string copies. 40% faster on one benchmark.