GoAWK Versions

A POSIX-compliant AWK interpreter written in Go, with CSV support

v1.20.0

1 year ago

New features and bug fixes (thanks @paulapatience for the reports):

Minor changes:

v1.19.0

1 year ago

Notable changes in this release:

Resolve types using topological sort to avoid O(N^2) worst case during parsing
Ignore UTF-8 BOM at start of CSV files (this is often added by the likes of Excel)
Interpret escapes in -v and var=value arguments: https://github.com/benhoyt/goawk/pull/132 and https://github.com/benhoyt/goawk/pull/134
Make regexes match multi-line strings like other AWKs
Add support for hex and hex floating point number conversions
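
To make the BOM item above concrete, here is a minimal sketch of skipping a leading UTF-8 byte order mark before handing input to a CSV parser. It uses only the Go standard library. The names skipBOM and firstHeader are made up for this illustration, and this is not necessarily how GoAWK itself implements the check.

```go
package main

import (
	"bufio"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
const utf8BOM = "\xEF\xBB\xBF"

// skipBOM (hypothetical helper) returns a reader positioned after a
// leading UTF-8 BOM, if one is present.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek does not consume input, so data without a BOM is left untouched.
	if b, err := br.Peek(len(utf8BOM)); err == nil && string(b) == utf8BOM {
		br.Discard(len(utf8BOM))
	}
	return br
}

// firstHeader (hypothetical helper) returns the first field of the first
// CSV record in input, after skipping any BOM.
func firstHeader(input string) (string, error) {
	rec, err := csv.NewReader(skipBOM(strings.NewReader(input))).Read()
	if err != nil {
		return "", err
	}
	return rec[0], nil
}

func main() {
	h, err := firstHeader(utf8BOM + "name,age\nBob,42\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", h) // prints "name"
}
```

Without the skip, the first header field would carry the invisible BOM character, so a lookup by the name "name" would fail.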

In other news, check out awk-demo, an amazing "old skool demo" written in AWK by @patsie75. It now works under GoAWK, at least on Linux. Clone that repo and run it with awk=goawk ./demo.sh!

Thanks to @ko1nksm for several bug reports.

See full list of commits since v1.18.0.

v1.18.0

2 years ago

Relatively minor release with the following changes:

Test on Go 1.18 and update the minimum Go and CI-tested version from 1.13 to 1.14.
Wire up stdin in system(), and don't buffer stdout in the goawk command if stdout is a terminal, so that you can now play Tetris with GoAWK. :-)
Wire up ctx through system() for ExecuteContext.
Various test and documentation improvements, including a comparison to csvkit in the CSV docs, updating CSV benchmarks to avoid huge.csv, and adding explicit tests using ASCII and Unicode unit and record separators.
Repo cleanup: moved the awkgo directory and code to a branch and removed it in the master branch; removed the unnecessary examples directory.

See the list of commits.

v1.17.1

2 years ago

Minor test fixes, no change in functionality:

v1.17.0

2 years ago

Now with proper CSV input and output support! Here is a simple example showing CSV input parsing and the new @"named-field" syntax:

$ goawk -i csv -H '{ print @"Abbreviation" }' testdata/csv/states.csv
AL
AK
AZ
...

Read the full documentation.
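
For readers who want to see what a named-field lookup amounts to, here is a rough Go sketch using the standard encoding/csv package: read the header row, find the index of the requested column, then print that field for each record. The function name column and the sample data are assumptions for illustration. This is not GoAWK's implementation, and it returns an error for an unknown column name, which is a choice made for this sketch.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// column (hypothetical helper) returns the values of the named column from
// CSV input whose first row is a header, similar in spirit to @"name"
// combined with -i csv -H.
func column(input, name string) ([]string, error) {
	cr := csv.NewReader(strings.NewReader(input))
	header, err := cr.Read()
	if err != nil {
		return nil, err
	}
	idx := -1
	for i, h := range header {
		if h == name {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("no column named %q", name)
	}
	var values []string
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return values, nil
		}
		if err != nil {
			return nil, err
		}
		values = append(values, rec[idx])
	}
}

func main() {
	// Made-up sample data, not the contents of testdata/csv/states.csv.
	const data = "State,Abbreviation\nAlabama,AL\nAlaska,AK\nArizona,AZ\n"
	vals, err := column(data, "Abbreviation")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, v := range vals {
		fmt.Println(v)
	}
}
```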

This feature was sponsored by the library of the University of Antwerp -- many thanks!

v1.16.0

2 years ago

Add interp.New ... Execute API to speed up and reduce allocations when executing the same program multiple times. https://github.com/benhoyt/goawk/pull/100
Add ExecuteContext API to support timeout and cancellation. https://github.com/benhoyt/goawk/pull/103
Optimized string concatenation when concatenating more than two strings, for example x = "a" "," "b". https://github.com/benhoyt/goawk/pull/99
Reduce allocations in a few other places, such as print, printf, sprintf(), and field parsing. https://github.com/benhoyt/goawk/pull/102
Add proper Go 1.18 fuzzing support for fuzzing the AWK source and input. https://github.com/benhoyt/goawk/pull/103
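
As a rough illustration of why concatenating several strings in one step can help: evaluating x = "a" "," "b" pairwise builds an intermediate string for each step, whereas a multi-operand concatenation can size the result once. The sketch below shows that idea with strings.Builder. The name concatAll is made up, and this is only an illustration of the general technique, not the code from the pull request.

```go
package main

import (
	"fmt"
	"strings"
)

// concatAll (hypothetical helper) joins all parts into one string,
// reserving the full result size up front so the buffer never regrows.
func concatAll(parts ...string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(concatAll("a", ",", "b")) // prints a,b
}
```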

v1.15.0

2 years ago

This release adds no new features. It's a significant performance improvement due to switching the internals of the interpreter from a tree-walking interpreter to a bytecode compiler with a virtual machine interpreter.

Results show that it's 18% faster overall on microbenchmarks and 13% faster on more real-world benchmarks. It should be fully backwards compatible -- please file an issue if you find a regression!

Read the details here.
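
To illustrate the difference between the two interpreter styles, here is a toy Go sketch that evaluates the same arithmetic expression both ways: once by recursively walking a syntax tree, and once by compiling the tree to a flat instruction list that a small stack-based virtual machine executes in a loop. All types and names here are invented for the example. GoAWK's real compiler and VM handle the full AWK language and are far more involved.

```go
package main

import "fmt"

// Tree-walking: every node is evaluated through a recursive method call.
type Expr interface{ Eval() float64 }

type Num float64
type Add struct{ L, R Expr }
type Mul struct{ L, R Expr }

func (n Num) Eval() float64 { return float64(n) }
func (a Add) Eval() float64 { return a.L.Eval() + a.R.Eval() }
func (m Mul) Eval() float64 { return m.L.Eval() * m.R.Eval() }

// Bytecode: the tree is compiled once into a flat slice of instructions.
type Op int

const (
	OpPush Op = iota // push Arg onto the stack
	OpAdd            // pop two values, push their sum
	OpMul            // pop two values, push their product
)

type Instr struct {
	Op  Op
	Arg float64 // used only by OpPush
}

// compile flattens an expression tree into postfix-order instructions.
func compile(e Expr, code []Instr) []Instr {
	switch e := e.(type) {
	case Num:
		return append(code, Instr{Op: OpPush, Arg: float64(e)})
	case Add:
		code = compile(e.L, code)
		code = compile(e.R, code)
		return append(code, Instr{Op: OpAdd})
	case Mul:
		code = compile(e.L, code)
		code = compile(e.R, code)
		return append(code, Instr{Op: OpMul})
	}
	panic("unknown expression type")
}

// run executes the instructions on a value stack in a single loop.
func run(code []Instr) float64 {
	var stack []float64
	for _, in := range code {
		switch in.Op {
		case OpPush:
			stack = append(stack, in.Arg)
		case OpAdd, OpMul:
			r, l := stack[len(stack)-1], stack[len(stack)-2]
			stack = stack[:len(stack)-2]
			if in.Op == OpAdd {
				stack = append(stack, l+r)
			} else {
				stack = append(stack, l*r)
			}
		}
	}
	return stack[0]
}

func main() {
	// (1 + 2) * 3
	tree := Mul{Add{Num(1), Num(2)}, Num(3)}
	fmt.Println(tree.Eval())             // prints 9
	fmt.Println(run(compile(tree, nil))) // prints 9
}
```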

v1.14.0

2 years ago

This reverts the feature from v1.11.0 which changed the builtin functions length, substr, index, and match to use character indexes instead of byte indexes (as per the POSIX spec). The reason is that it changed those functions from O(1) to O(N), which created "accidentally quadratic" behavior in scripts that expected these functions to be O(1).

For example, @xonixx's gron.awk script on a relatively large JSON input file took about 1s in bytes mode (goawk -b), but 8 minutes (!) in the new Unicode character default mode. That's extremely problematic.
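
The cost difference can be sketched in Go, where strings are byte sequences: slicing at a byte offset is constant time, while finding the Nth character of a UTF-8 string means decoding from the start. A script that calls substr(s, i, 1) for every i therefore does O(N) total work with byte indexes but O(N^2) with character indexes. The helpers below are hypothetical and simplified. They clamp out-of-range arguments rather than reproduce AWK's exact edge-case rules, and they are not GoAWK's code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// clamp limits v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// substrBytes returns up to n bytes of s starting at 1-based byte position
// pos. It slices directly, so the cost does not depend on pos. Note that it
// can cut a multi-byte character in half.
func substrBytes(s string, pos, n int) string {
	start := clamp(pos-1, 0, len(s))
	end := clamp(start+n, start, len(s))
	return s[start:end]
}

// byteOffset returns the byte offset reached after skipping up to chars
// characters of s. It has to decode each character, so it is O(chars).
func byteOffset(s string, chars int) int {
	off := 0
	for chars > 0 && off < len(s) {
		_, size := utf8.DecodeRuneInString(s[off:])
		off += size
		chars--
	}
	return off
}

// substrChars returns up to n characters of s starting at 1-based character
// position pos. It must scan from the start of s to locate pos.
func substrChars(s string, pos, n int) string {
	if pos < 1 {
		pos = 1
	}
	if n < 0 {
		n = 0
	}
	start := byteOffset(s, pos-1)
	end := start + byteOffset(s[start:], n)
	return s[start:end]
}

func main() {
	s := "h\u00e9llo" // "héllo": 5 characters, 6 bytes
	fmt.Println(len(s), utf8.RuneCountInString(s)) // prints 6 5
	fmt.Println(substrBytes(s, 2, 2))              // prints é
	fmt.Println(substrChars(s, 2, 2))              // prints él
}
```

The two functions agree on pure ASCII input, which is why the revert only changes results for scripts that index into non-ASCII strings.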

Like v1.11.0, this release is a small breaking change, but once again it shouldn't affect many scripts (only those that use constant indexes for substr on non-ASCII strings). I hope not many people are using interp.Config.Bytes or the goawk -b option yet, as those are gone again. Seeing as v1.11.0 was introduced only a few weeks ago, I think it's worth the breakage for a performance problem of this magnitude.

Fixes https://github.com/benhoyt/goawk/issues/93: "Major speed regression for gron.awk in goawk 1.11.0+".

v1.13.0

2 years ago

Support multi-character and regular expression values of RS (#86), allowing significantly more powerful text processing. This is a Gawk extension to POSIX, which says, "If RS contains more than one character, the results are unspecified."
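
As a sketch of what a regular expression record separator means in practice, the Go snippet below splits input into records wherever a pattern matches, for example treating one or more blank lines as the separator. The function splitRecords is hypothetical. It loads the whole input into memory and uses Go's RE2 regex syntax, whereas a real AWK interpreter reads records incrementally and uses POSIX-style regexes. Dropping the empty piece after a trailing separator is an assumption made here to mirror typical AWK behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// splitRecords (hypothetical helper) splits input into records using a
// regular expression separator, roughly like setting RS to a regex.
func splitRecords(input, rs string) ([]string, error) {
	re, err := regexp.Compile(rs)
	if err != nil {
		return nil, err
	}
	records := re.Split(input, -1)
	// A separator at the very end leaves an empty final piece; drop it so
	// it is not reported as an extra record.
	if n := len(records); n > 0 && records[n-1] == "" {
		records = records[:n-1]
	}
	return records, nil
}

func main() {
	input := "a\nb\n\n\nc\nd\n\n"
	records, err := splitRecords(input, `\n\n+`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, r := range records {
		fmt.Printf("record %d: %q\n", i+1, r)
	}
	// prints:
	// record 1: "a\nb"
	// record 2: "c\nd"
}
```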

v1.12.0

2 years ago

This release adds support for "getline lvalue" forms. See #85.