zsv+lib: world's fastest (simd) CSV parser, bare metal or wasm, with an extensible CLI for SQL querying, format conversion and more
zsv+lib is a fast CSV parser library and extensible command-line utility. It achieves high performance using SIMD operations, efficient memory use and other optimization techniques, and can also parse generic-delimited and fixed-width formats, as well as multi-row-span headers.
The ZSV CLI can be compiled to virtually any target, including WebAssembly, and offers features including select, count, direct CSV sql, flatten, serialize, 2json conversion, 2db sqlite3 conversion, stack, pretty, 2tsv, compare, paste and more.
Pre-built CLI packages are available via brew and nuget.
A pre-built library package is available for Node (npm install zsv-lib). Please note, this package is still in alpha and currently only exposes a small subset of the zsv library capabilities. More to come.
If you like zsv+lib, do not forget to give it a star! 🌟
Preliminary performance results compare favorably with other CSV utilities (xsv, tsv-utils, csvkit, mlr (miller), etc.). The results below were obtained on a pre-M1 macOS MBA; on most platforms zsvlib was 2x faster, though in some cases the advantage was smaller (e.g. 15-25%). mlr is not shown below, as it was about 25x slower:
** See 12/19 update re M1 processor at https://github.com/liquidaty/zsv/blob/main/app/benchmark/README.md
"CSV" is an ambiguous term. This library uses the same definition as Excel. In addition, it provides a row-level (as well as cell-level) API and provides "normalized" CSV output (e.g. input of this"iscell1,"thisis,"cell2
becomes "this""iscell1","thisis,cell2"). Each of these three objectives (Excel compatibility, row-level API and normalized output) has a measurable performance impact; conversely, much faster parsing speeds are possible, and a number of other CSV parsers achieve them, if any of these requirements (especially Excel compatibility) is dropped.
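The quoting rule behind the "normalized" output in the example above can be sketched in a few lines. This sketch uses Python's standard csv module rather than zsv itself, and the helper name normalize_row is made up for illustration. It starts from the already-parsed cell values (this"iscell1 and thisis,cell2), quotes any cell that contains a delimiter, quote or newline, and doubles embedded quotes.

```python
import csv
import io


def normalize_row(cells):
    """Return one CSV line in which every cell that needs quoting is quoted.

    Hypothetical helper for illustration only; it is not part of zsv.
    """
    out = io.StringIO()
    # QUOTE_MINIMAL quotes only cells containing a delimiter, quote or newline;
    # doublequote=True writes an embedded " as "".
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, doublequote=True,
                        lineterminator="\n")
    writer.writerow(cells)
    # Drop only the line terminator that the writer appended.
    return out.getvalue()[:-1]


# Cell values as parsed from the example input in the text above.
cells = ['this"iscell1', "thisis,cell2"]
print(normalize_row(cells))  # prints "this""iscell1","thisis,cell2"
```

Cells that need no quoting pass through unchanged, so normalize_row(["a", "b"]) gives a,b.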
zsv is an extensible CSV utility, which uses zsvlib, for tasks such as slicing and dicing, querying with SQL, combining, serializing, flattening, converting between CSV/JSON/sqlite3 and more.
zsv is streamlined for easy development of custom dynamic extensions.
zsvlib and zsv are written in C, but since zsvlib is a library, and zsv extensions are just shared libraries, you can extend zsv with your own code in any programming language, so long as it has been compiled into a shared library that implements the expected interface.
Available as BOTH a library and an application (coming soon: standalone zsvutil library for common helper functions such as csv writer)
Open-source, permissively licensed
Handles real-world CSV the same way that spreadsheet programs do (including edge cases). Gracefully handles (and can "clean") real-world data that may be "dirty"
Runs on macOS (tested on clang/gcc), Linux (gcc), Windows (mingw), BSD (gcc-only) and in-browser (emscripten/wasm)
Fastest (at least, vs all alternatives and on all platforms we've benchmarked where 256-bit SIMD operations are available). See app/benchmark/README.md
Low memory usage (regardless of how big your data is) and size footprint for both lib (~20k) and CLI executable (< 1MB)
Handles general delimited data (e.g. pipe-delimited) and fixed-width input (with specified widths or auto-detected widths)
Handles multi-row headers
Handles input from any stream, including caller-defined streams accessed via a single caller-defined fread-like function
Easy to use as a library in a few lines of code, via either pull or push parsing
Includes the zsv CLI with the following built-in commands: select, count, sql query, desc (describe), flatten, serialize, 2json, 2db, stack, pretty, 2tsv, paste, compare, jq, prop, rm
CLI is easy to extend/customize with a few lines of code via modular plug-in framework. Just write a few custom functions and compile into a distributable DLL that any existing zsv installation can use
zsvlib and zsv are permissively licensed
Download pre-built binaries and packages for macOS, Windows, Linux and BSD from the Releases page.
You can also download pre-built binaries and packages for the latest commits and PRs from Actions, but these are retained only for a limited number of days.
...via Homebrew:
brew tap liquidaty/zsv
brew install zsv
...via MacPorts:
sudo port install zsv
For Linux (Debian/Ubuntu - *.deb):
# Install
sudo apt install ./zsv-amd64-linux-gcc.deb
# Uninstall
sudo apt remove zsv
For Linux (RHEL/CentOS - *.rpm):
# Install
sudo yum install ./zsv-amd64-linux-gcc.rpm
# Uninstall
sudo yum remove zsv
For Windows (*.nupkg), install with nuget.exe:
# Install via nuget custom feed (requires absolute paths)
md nuget-feed
nuget.exe add zsv .\<path>\zsv-amd64-windows-mingw.nupkg -source <path>/nuget-feed
nuget.exe install zsv -version <version> -source <path>/nuget-feed
# Uninstall
nuget.exe delete zsv <version> -source <path>/nuget-feed
For Windows (*.nupkg), install with choco.exe:
# Install
choco.exe install zsv --pre -source .\zsv-amd64-windows-mingw.nupkg
# Uninstall
choco.exe uninstall zsv
The zsv parser library is available for node:
npm install zsv-lib
Please note:
See BUILD.md for more details.
Our objectives, which we were unable to find in a pre-existing project, are:
\n
or \r
), embedded newlines, abnormal quoting (e.g. aaa"aaa,bbb...)
There are several excellent tools that achieve high performance. Among those we considered were xsv and tsv-utils. While they met our performance objective, both were designed primarily as utilities rather than libraries, and neither was easy enough, for our needs, to customize or to extend with modular customizations that could be maintained (or licensed) independently of the related project. They are also written in Rust and D, respectively, languages with which we lacked deep experience, especially for WebAssembly targeting.
Others we considered were Miller (mlr), csvkit and Go (csv module), which did not meet our performance objective. We also considered various other libraries using SIMD for CSV parsing, but none that we tried met the "real-world CSV" objective.
Hence zsv was created as a library and a versatile application, both optimized for speed and for ease of extending and customizing to your needs.
zsv comes with several built-in commands:
echo: read CSV from stdin and write it back out to stdout. This is mostly useful for demonstrating how to use the API and also how to create a plug-in, and has some limited utility beyond that e.g. for adding/removing the UTF8 BOM, or cleaning up bad UTF8
select: re-shape CSV by skipping leading garbage, combining header rows into a single header, selecting or excluding specified columns, removing duplicate columns, sampling, searching and more
sql: run ad-hoc SQL query on a CSV file
desc: provide a quick description of your table data
pretty: format for console (fixed-width) display, or convert to markdown format
2json: convert CSV to JSON. Optionally, output in database schema
2tsv: convert CSV to TSV
compare: compare two or more tables of data and output the differences
paste (alpha): horizontally paste two tables together (given inputs X and Y, output 1...N rows where each row contains all columns of X in row N, followed by all columns of Y in row N)
serialize (inverse of flatten): convert an NxM table to a single 3x (Nx(M-1)) table with columns: Row, Column Name, Column Value
flatten (inverse of serialize): flatten a table by combining rows that share a common value in a specified identifier column
stack: merge CSV files vertically
jq: run a jq filter
2db: convert from JSON to sqlite3 db
prop: view or save parsing options associated with a file, such as initial rows to ignore, or header row span. Saved options are applied by default when processing that file
Each of these can also be built as an independent executable named zsv_xxx where xxx is the command name.
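To make the serialize and flatten descriptions above concrete, here is a small Python sketch of the two reshaping operations. It is an illustration of the idea only, not zsv's implementation or its exact output format. The function names, the sample data and the choice of the first column as the row identifier are assumptions made for this sketch.

```python
def serialize(table):
    """Turn a header + rows table into (Row, Column Name, Column Value) triples.

    Assumption for this sketch: the first column identifies the row, so an
    N-row, M-column table yields N x (M-1) triples.
    """
    header, rows = table[0], table[1:]
    out = [["Row", "Column Name", "Column Value"]]
    for row in rows:
        for name, value in zip(header[1:], row[1:]):
            out.append([row[0], name, value])
    return out


def flatten(triples, id_name="Row"):
    """Inverse sketch: combine triples that share a Row value into one row."""
    columns = []  # column names in first-seen order
    by_row = {}   # row id -> {column name: value}; dicts keep insertion order
    for row_id, name, value in triples[1:]:
        if name not in columns:
            columns.append(name)
        by_row.setdefault(row_id, {})[name] = value
    out = [[id_name] + columns]
    for row_id, values in by_row.items():
        out.append([row_id] + [values.get(name, "") for name in columns])
    return out


# Made-up sample table: 2 data rows, 3 columns.
table = [
    ["id", "city", "population"],
    ["1", "Springfield", "150000"],
    ["2", "Shelbyville", "90000"],
]
long_form = serialize(table)
for line in long_form:
    print(line)
```

Here the two data rows and three columns give 2 x (3-1) = 4 triples, and flatten(long_form, id_name="id") rebuilds the original table.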
After installing, run zsv help to see usage details. The typical syntax is:
zsv <command> <parameters>
e.g.
zsv sql my_population_data.csv "select * from data where population > 100000"
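For readers unfamiliar with querying CSV through SQL, the following Python sketch shows roughly what the query above asks for, using only the standard csv and sqlite3 modules. It says nothing about how zsv implements its sql command. The helper name query_csv, the sample data and the integer conversion are assumptions of this sketch; the table name data is taken from the example query.

```python
import csv
import io
import sqlite3


def query_csv(csv_text, sql):
    """Load CSV text into an in-memory sqlite3 table named data, then run sql."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]

    def to_value(cell):
        # Assumption of this sketch: all-digit cells are stored as integers
        # so that numeric comparisons such as "population > 100000" work.
        return int(cell) if cell.isascii() and cell.isdigit() else cell

    conn = sqlite3.connect(":memory:")
    # Quote column names so headers with spaces or quotes stay valid SQL.
    columns = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    conn.execute("create table data (" + columns + ")")
    marks = ", ".join("?" for _ in header)
    conn.executemany(
        "insert into data values (" + marks + ")",
        ([to_value(cell) for cell in row] for row in body),
    )
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# Made-up sample data standing in for my_population_data.csv.
sample = "city,population\nSpringfield,150000\nShelbyville,90000\n"
print(query_csv(sample, "select * from data where population > 100000"))
```

With this sample, only the Springfield row satisfies the population filter.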
Full application code examples can be found at examples/lib/README.md.
An example of using the API, compiled to wasm and called via JavaScript, is in examples/js/README.md.
For more sophisticated (but at this time, only sporadically commented/documented) use cases, see the various CLI C source files in the app/ directory, such as app/serialize.c.
You can extend zsv by providing a pre-compiled shared or static library that defines the functions specified in extension_template.h and which zsv loads in one of three ways:
... zsv executable and loaded at runtime if/as/when the custom mode is invoked
You can build and run a sample extension by running make test from app/ext_example.
The easiest way to implement your own extension is to copy and customize the template files in app/ext_template.
This release does not yet implement the full range of core features that are planned for implementation prior to beta release. If you are interested in helping, please post an issue.
... main branch. ... main.