qsv Release Notes

CSVs sliced, diced & analyzed.

0.127.0

2 weeks ago

📊 Enhanced Frequency Analysis 📊

This is a quick release adding several enhancements for more detailed frequency analysis. The frequency command now includes a percentage column, rolls up remaining values into an "Other" row, and supports limiting unique counts and negative limits. These options provide additional context for Datapusher+, qsv pro and describegpt so their metadata inferences are more accurate and comprehensive.

Previously, for a 775-row CSV file containing one column named state with entries for all 50 states, frequency only showed[^1]:

qsv frequency freq_state_example.csv | qsv table
field  value  count
state  NY     100
state  NJ     70
state  CA     60
state  MA     55
state  FL     45
state  TX     43
state  NM     40
state  AZ     39
state  NV     38
state  MI     35

Now, there's a new percentage column and an "Other" values rollup, both of which have configurable options:

qsv frequency freq_state_example.csv | qsv table
field  value       count  percentage
state  NY          100    12.90323
state  NJ          70     9.03226
state  CA          60     7.74194
state  MA          55     7.09677
state  FL          45     5.80645
state  TX          43     5.54839
state  NM          40     5.16129
state  AZ          39     5.03226
state  NV          38     4.90323
state  MI          35     4.51613
state  Other (40)  250    32.25806
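
The new limit options compose with this output. Here's a hedged sketch of how they might be combined - the values are purely illustrative, and the exact semantics are per qsv frequency --help:

# negative limit: only show values occurring at least 40 times
qsv frequency --limit -40 freq_state_example.csv

# compile a comprehensive tally when a column has fewer than 100 unique values
qsv frequency --limit 10 --lmt-threshold 100 freq_state_example.csv

# cap output for all-unique (ID-style) columns at 10 entries
qsv frequency --unq-limit 10 freq_state_example.csv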

This release is also out of cycle to address a big performance regression in the excel command caused by unnecessary formula info retrieval for the --error-format option introduced in 0.126.0. This has been fixed, and the excel command is now back to its speedy self.



Full Changelog: https://github.com/jqnatividad/qsv/compare/0.126.0...0.127.0

[^1]: With its default --limit setting of 10, frequency only shows the top 10 unique values in the column, sorted by occurrence.

0.126.0

3 weeks ago

🤖 Expanded Metadata Inferencing 🤖

describegpt headlines this release with its new ability to support local Large Language Models (LLMs) using popular tools that serve them through APIs, such as Ollama and Jan. This broadens the tool's utility in diverse AI environments. Beyond OpenAI, qsv can now use other popular LLMs like Llama 3, Mistral, and Gemma. It also unlocks expanded metadata inferencing capabilities in qsv pro.

Several commands got additional options: cat with --no-headers support in the rowskey subcommand; excel with new options like --error-format and a short --metadata mode; and foreach with a --dry-run option. frequency also got new options, including --unq-limit for limiting unique counts, support for negative limits, and a --lmt-threshold option for compiling comprehensive frequencies below a threshold. slice now supports negative indices and new JSON output options, providing more flexibility in data slicing.

This is all rounded out with sqlp improvements, including support for single-line comments in SQL scripts and a special SKIP_INPUT value to skip input preprocessing when using table functions directly in Polars SQL (e.g. read_csv() and read_parquet()) - all while increasing performance thanks to the Polars engine being upgraded to 0.39.2.


New Features

  • cat: Added --no-headers support to the rowskey subcommand.
  • describegpt: Added support for local Large Language Models (LLMs) served through tools such as Ollama and Jan, broadening the tool's utility in diverse AI environments.
  • excel: Introduced new options: --error-format for better error handling and a short --metadata JSON mode.
  • foreach: Added a --dry-run option, allowing users to preview the results of scripts without executing them.
  • frequency: New options: --unq-limit for limiting unique counts; support for negative limits to only show frequencies >= abs(limit); and --lmt-threshold to compile comprehensive frequencies when the number of unique values is below the threshold - all providing more detailed control over frequency analysis.
  • slice: Support for negative indices to slice from the end and new JSON output options (see the sketches after this list).
  • sqlp: Now supports single-line comments in SQL scripts and a special SKIP_INPUT value for more efficient data loading. The Polars engine has also been upgraded to 0.39.2, providing enhanced performance and stability.
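
A couple of hedged sketches of the new slice and sqlp options (file names are hypothetical, and the exact flag spellings and SKIP_INPUT placement are assumptions - check each command's --help):

# slice the last 5 rows (negative index) and emit JSON
qsv slice --start -5 --json data.csv

# skip input preprocessing and let Polars read a Parquet file directly
qsv sqlp SKIP_INPUT "select * from read_parquet('data.parquet') limit 10"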

Changes and Optimizations

  • Performance Enhancements: Micro-optimizations in the datefmt and validate commands, and an increased default --infer-len in sqlp for improved performance.
  • Dependency Updates: Numerous updates including bumping Luau, jql-runner, pyo3, and other dependencies to enhance stability and security.
  • Benchmarks Added: New sqlp vs duckdb performance benchmarks to ensure there are no performance regressions between releases. Right now, sqlp is faster than duckdb in most cases (thanks to Polars - see the latest TPC-H benchmarks), and we want to keep it that way.

Security and Robustness

  • Security Fixes: Updated rustls to fix a specific CVE, and other minor fixes to enhance the security and robustness of network and data processing features.
  • Bug Fixes: Various bug fixes including improvements in error formatting in excel and robustness in fetch and fetchpost commands.


Full Changelog: https://github.com/jqnatividad/qsv/compare/0.125.0...0.126.0

0.125.0

1 month ago

In this release, we focused on the 🏎️ need for even more speed 🏎️.

This was done primarily by tweaking several supporting qsv crates. qsv-docopt now parses command-line arguments slightly faster. qsv-stats, the crate behind commands like stats, schema, tojsonl, and frequency, has been further optimized for speed. qsv-dateparser has been updated to support new timezone handling options in datefmt. qsv-sniffer also got a speed boost.

Per the benchmark suite, stats is 25% faster (1.563 secs vs 2.067 secs) when computing the 13 "streaming" stats and 14% faster when computing --everything (17 columns of additional stats - 3.149 secs vs 3.656 secs) for the 1M row, 41 column, 520mb sample of NYC's 311 data.

The count command has been refactored to utilize Polars' SQLContext, which leverages LazyFrames evaluation to automagically count even very large files in just a few seconds. Previously, count was already using Polars, but it mistakenly fell back to a slower counting mode. Now, it consistently delivers fast performance, even without an index. On the same benchmark suite, it takes 0.052 secs vs 0.503 seconds - almost 10x faster!

As count is not just a top-level command but also a helper used by several other qsv commands, this gives the entire suite a nice performance boost.
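
For example, counting a large file now takes the fast Polars path with or without an index (file name hypothetical):

# near-instant row count via Polars' LazyFrames evaluation - no index needed
qsv count nyc_311_sample.csv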

Continuing on the performance front, the excel command now has a new short --metadata mode, allowing users to get a "shorter" version of the metadata report that only lists the workbook's top-level metadata (sheet index, sheet name, sheet type, visibility) instead of the full metadata report (which also has info like row counts, column metadata, etc.). On the benchmark suite, the short metadata report takes all of 0.005 secs vs 11.237 secs for the 1M row xlsx version of the same NYC 311 data - more than 3 orders of magnitude faster! (It may actually be even faster, as 0.005 secs is at the limit of what hyperfine can measure.)

The datefmt command also got some major enhancements with new timezone handling and timestamp parsing options, though at the cost of a small 15% performance penalty.

Lastly, we are excited to announce that qsv will be featured at the CSV,Conf,V8 conference in Puebla, Mexico on May 28-29. I'll be presenting a talk titled "qsv: A Blazing Fast CSV Data-Wrangling Toolkit". Hope to see you there!



Full Changelog: https://github.com/jqnatividad/qsv/compare/0.124.1...0.125.0

0.124.1

2 months ago

Datapusher+ "Speed of Insight" Release! 🚀🚀🚀

This release is all about speed, speed, speed! We've made qsv even faster by leveraging Polars' multithreaded, mem-mapped CSV reader to get near-instant row counts of large CSV files, and near-instant SQL queries and aggregations with Datapusher+ - automagically inferring metadata and giving you quick insights into your data in seconds!

We're demoing our qsv-powered Datapusher+ at the March 2024 installment of CKAN Monthly Live on March 20, 2024, 13:00-14:00 UTC. Join us!

Beyond pushing data reliably at speed into your CKAN Datastore (it pushes real good! 😉), DP+ does some extended analysis, processing and enrichment of the data so it can be readily Used.

Both fetch and fetchpost commands now also have a --disk-cache option and are fully synched - forming the foundation for high-speed data enrichment from Web Services - including datHere's forthcoming, fully-integrated Data Enrichment Service.
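
As a hedged sketch of the new caching option (column and file names are hypothetical, and --new-column is an assumption):

# fetch each URL in the 'url' column, caching web service responses to disk
qsv fetch url --new-column response --disk-cache data.csv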

🏇🏽 Hi-ho Quicksilver, away! 🏇🏽



Full Changelog: https://github.com/jqnatividad/qsv/compare/0.123.0...0.124.1

0.123.0

2 months ago

OPEN DATA DAY 2024 Release! 🎉🎉🎉

In celebration of Open Data Day, we're releasing qsv 0.123.0 - the biggest release ever with 330+ commits! qsv 0.123.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.

We've been baking qsv pro for a while now, and it's almost ready for release. qsv pro is a cross-platform Desktop Data Wrangling tool marrying an Excel-like UI with the power of qsv, backed by a cloud-based data cleaning, enrichment and enhancement service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.

Stay tuned!

Highlights:

  • sqlp now has automatic read_csv() fast path optimization, often making optimized queries run dramatically faster - e.g. what took 6.09 seconds for a non-trivial SQL aggregation on an 18 column, 657mb CSV with 7.43 million rows now takes just 0.14 seconds with the optimization - 🚀 43.5x FASTER 🚀![^1]
# with fast path optimization turned off
/usr/bin/time qsv sqlp taxi.csv --no-optimizations "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
        6.09 real         6.82 user         0.16 sys

# with fast path optimization, fully exploiting Polars' multithreaded, mem-mapped CSV reader!
 /usr/bin/time qsv sqlp taxi.csv "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
VendorID,total_amount
1,52377417.52985942
2,89959869.13054822
4,600584.610000027
(3, 2)
        0.14 real         1.09 user         0.09 sys

# in contrast, csvq takes 72.46 seconds - 517.57x slower
/usr/bin/time csvq "select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID"
+----------+---------------------+
| VendorID |  SUM(total_amount)  |
+----------+---------------------+
| 1        |  52377417.529256366 |
| 2        |    89959869.1264675 |
| 4        |   600584.6099999828 |
+----------+---------------------+
       72.46 real        65.15 user        75.17 sys

"Traditional" SQL engines

qsv and csvq both operate on "bare" CSVs. For comparison, let's contrast qsv's performance against "traditional" SQL engines that require setup and import (aka ETL). Not counting setup and import time (which alone takes several minutes), we get:

SQLite 3.43.2 takes 2.910 seconds - 20.79x slower

sqlite> .timer on
sqlite> select VendorID,sum(total_amount) from taxi group by VendorID order by VendorID;
1,52377417.53
2,89959869.13
4,600584.61
Run Time: real 2.910 user 2.569494 sys 0.272972

PostgreSQL 15.6 using PgAdmin 4 v6.12 takes 18.527 seconds - 132.34x slower


Even with an index, qsv sqlp is still 5.96x faster.

  • sqlp now supports JSONL output format and adds compression support for Avro and Arrow output formats.
  • fetch now has a --disk-cache option, so you can cache web service responses to disk, complete with cache control and expiry handling!
  • jsonl is now multithreaded with additional --batch and --job options.
  • split now has three modes: split by record count, split by number of chunks, and split by file size (see the sketch after this list).
  • datefmt is a new top-level command for date formatting. We extracted it from apply to make it easier to use, and to set the stage for expanded date and timezone handling.
  • enum now has a --start option.
  • excel now has a --keep-zero-time option and improved datetime/duration parsing/handling with the upgrade of calamine from 0.23 to 0.24.
  • tojsonl now has --trim and --no-boolean options, and false positive boolean inferences have been eliminated.
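
Here's a hedged sketch of the three split modes (the --size, --chunks and --kb-size flag names are assumptions based on the mode descriptions - see qsv split --help):

# split into files of 10,000 records each
qsv split outdir --size 10000 data.csv

# split into 8 roughly equal chunks
qsv split outdir --chunks 8 data.csv

# split into files of roughly 500 KB each
qsv split outdir --kb-size 500 data.csv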


Full Changelog: https://github.com/jqnatividad/qsv/compare/0.122.0...0.123.0

[^1]: Measurements taken on an Apple Mac Mini 2023 model with an M2 Pro chip with 12 CPU cores & 32GB of RAM, running macOS Sonoma 14.4.

0.122.0

3 months ago

👉 REQUEST FOR USE CASES: 👈

Please help define the future of qsv. Add what you're currently using qsv for here - https://github.com/jqnatividad/qsv/discussions/1529

Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.

Highlights:

  • qsvpy is now available in the prebuilt binaries for select platforms! It's a new qsv binary variant with the python feature, enabling the py command. Three subvariants are available - qsvpy310, qsvpy311 and qsvpy312, corresponding to Python 3.10, 3.11 and 3.12 respectively.
  • Removed the generate command, as its main dependency is unmaintained and relies on old dependencies. generate also saw little use - the test data it generated was not well suited for training models, and it was too slow - so we decided to remove it even before the synthesize command (#235) is ready.
  • reverse now has index support and can work in "streaming" mode and handle larger than memory CSV files.
  • sort and sample: users can now choose from three Random Number Generator (RNG) algorithms with the --rng option - standard, faster & cryptosecure.
  • pseudo now has --start, --increment & --formatstr options.
  • fmt now has a --no-final-newline option to suppress the final newline for better interoperability with other tools, specifically Excel. It also treats "T" as a special value for the tab character in the --out-delimiter option (see the sketches after this list).
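
Hedged sketches of two of the new options (file names are hypothetical; the RNG value and the "T" delimiter are per the descriptions above):

# sample 1,000 rows using the cryptosecure RNG
qsv sample --rng cryptosecure 1000 data.csv

# write tab-delimited output without a final newline
qsv fmt --out-delimiter T --no-final-newline data.csv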


Full Changelog: https://github.com/jqnatividad/qsv/compare/0.121.0...0.122.0

0.121.0

4 months ago

Two days ago, qsv 0.120.0 was released. Hours later, significant updates occurred in our ecosystem: Polars upgraded to version 0.36, Homebrew rolled out support for Rust 1.75.0, and our pull request for 'cached' was merged.

In light of these developments, we're releasing 0.121.0 out of cycle to leverage the new features, fixes and performance enhancements in these key components integral to qsv.


👉 REQUEST FOR USE CASES: 👈 Please help define the future of qsv. Add what you're currently using qsv for here - https://github.com/jqnatividad/qsv/discussions/1529

Not only does it help us catalog what use cases we should optimize for, posters will get higher priority access to the qsv pro preview.



Full Changelog: https://github.com/jqnatividad/qsv/compare/0.120.0...0.121.0

0.120.0

4 months ago

Happy New Year! 🎉🎉🎉 Here's the first release of 2024, the biggest ever with 280+ commits! qsv 0.120.0 continues to focus on performance, stability and reliability as we continue setting the stage for qsv's big brother - qsv pro.

Apart from wrapping qsv with a User Interface, qsv pro also comes with a retinue of related cloud-based data cleaning, enrichment and enhancement services along with expanded metadata inferencing to make your Data Useful, Usable and Used!

qsv pro draws inspiration from OpenRefine, but is reimagined without its file size and speed limitations - qsv pro can process multi-gigabyte files in seconds.

It incorporates hard lessons we learned in the past 12 years deploying Data Portals and Data Pipelines to create a new Data/Metadata Wrangling and AI-assisted Data Publishing service that's easy to use for casual Excel users and Data Publishers, yet powerful enough for data scientists and data engineers.

But it's not quite ready for release yet, so stay tuned!

However, we're now taking signups for a preview release, so if you're interested, please sign up!

Excitingly, qsv was also mentioned in a Hacker News thread on Dec 23, 2023! As a result, we're now approaching 2,000 stars on GitHub, up from 900 stars on Dec 22! 🎉🎉🎉

Stay tuned for more advancements in 2024 – it's set to be a landmark year for qsv! 🦄🦄🦄



Full Changelog: https://github.com/jqnatividad/qsv/compare/0.119.0...0.120.0

0.119.0

5 months ago

Highlights:

As we prepare for version 1.0, we're focusing on performance, stability and reliability as we set the stage for qsv pro - a cloud-backed UI version of qsv powered by Tauri, set to be released in 2024. Stay tuned!

  • diff is now out of beta and blazingly fast! Give "the fastest CSV-diff in the world" a try 😉!
  • joinp now supports snappy automatic compression/decompression!
  • sqlp & joinp now recognize the QSV_COMMENT_CHAR environment variable, allowing you to skip comment lines in your input CSV files. They're also faster with the upgrade to Polars 0.35.4.
  • sqlp now supports subqueries, table aliases, and more!
  • luau: upgraded embedded Luau from 0.599 to 0.604; refactored code to reduce unneeded allocations and increase performance (more than doubling it!) as we prepare for extended recipe support.
  • cat is now even faster with the --flexible option. If you know your CSV files are valid, you can use this option to skip CSV validation and make cat run twice as fast!
  • qsv can now add a Byte Order Mark (BOM) header sequence to produce Excel-friendly CSVs with the QSV_OUTPUT_BOM environment variable (see the sketches after this list).
  • stats, sort, schema & validate are now faster with the use of atoi_simd to directly convert &[u8] to integer, skipping unnecessary utf8 validation, while also using SIMD CPU instructions for noticeably faster performance.
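
Hedged sketches of the environment variables and the cat option (the comment character, the QSV_OUTPUT_BOM value and the file names are hypothetical):

# skip '#' comment lines on input and emit a BOM for Excel-friendly output
QSV_COMMENT_CHAR='#' QSV_OUTPUT_BOM=1 qsv sqlp data.csv "select * from data" > out.csv

# skip CSV validation when concatenating known-valid files
qsv cat rows --flexible a.csv b.csv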


Full Changelog: https://github.com/jqnatividad/qsv/compare/0.118.0...0.119.0



0.118.0

6 months ago

Highlights:

  • With the Polars upgrade to 0.34.2, the sqlp and joinp commands enjoy expanded capabilities and a noticeable performance boost. 🦄🏇
  • We now publish the 500, 1000, 5000 and 15000 Geonames cities indices for the geocode command, with users able to easily switch indices with the index-load subcommand. As the name implies, the 500 index contains cities with populations of 500 or more, the 1000 index contains cities with populations of 1000 or more, and so on.
    The 15000 index (default) is the smallest (13mb) and fastest with ~26k cities. The 500 index is the largest (56mb) and slowest, with ~200k cities. The 5000 index is 21mb with ~53k cities. The 1000 index is 44mb with ~140k cities. 🎠
  • The geocode command now returns US Census FIPS codes for US places with the %json and %pretty-json formats, returning both US State and US County FIPS codes, with upcoming support for Cities and other US Census geographies (School Districts, Voting Districts, Congressional Districts, etc.) 🎠
  • Improved performance for stats, schema and tojsonl commands with the stats cache bincode refactor. This is especially noticeable for large CSV files as stats previously created large bincode cache files by default.
    The bincode cache allows other commands (currently, only schema and tojsonl) to skip recomputing statistics and deserialize the saved stats data structures directly into memory. Now, it will only create a bincode file if the --stats-binout option is specified (typically before using the schema and tojsonl commands; see the sketches after this list). stats will continue to create a stats CSV cache file by default, which is much smaller than the bincode file and universally applicable, unlike the bincode cache. 🏇
  • self-update will now verify updates. This is done by verifying the zipsign signature of the release zip archive before applying it. This should make it harder for malicious actors to compromise the self-update process. Version 0.118.0 has the verification code, and future releases will use this new verification process. Regardless, we will zipsign all zip archives starting with this release. Users can manually verify the signatures by downloading the zipsign public key and running the zipsign command line tool. See Verifying the Integrity of the Prebuilt Binaries Zip Archive for more info. 🦄
  • The frequency command now supports the --ignore-case option for case-insensitive frequency counts (see the sketches after this list). 🦄🎠
  • The schema command can now compile case-insensitive enum constraints. 🦄
  • Improved performance for apply and applydp commands with faster compile-time perfect hash functions for operations lookups. 🏇
  • Several minor performance improvements and bug fixes with snappy, sniff & cat commands. 🏇
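
Hedged sketches of two of the new options (file names are hypothetical):

# case-insensitive frequency counts
qsv frequency --ignore-case data.csv

# write the bincode stats cache so schema and tojsonl can reuse it
qsv stats --stats-binout data.csv
qsv schema data.csv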


Full Changelog: https://github.com/jqnatividad/qsv/compare/0.117.0...0.118.0