A POSIX-compliant AWK interpreter written in Go, with CSV support
New features and bug fixes (thanks @paulapatience for the reports):
Minor changes:
Notable changes in this release:
In other news, check out awk-demo, an amazing "old skool demo" written in AWK by @patsie75. It now works under GoAWK, at least on Linux. Clone that repo and run it with awk=goawk ./demo.sh!
Thanks to @ko1nksm for several bug reports.
Relatively minor release with the following changes:
ExecuteContext
.huge.csv
, and adding explicit tests using ASCII and Unicode unit and record separators.
awkgo directory and code to a branch and removed it in the master branch; removed the unnecessary examples directory.
Minor test fixes, no change in functionality:
Now with proper CSV input and output support! Here is a simple example showing CSV input parsing and the new @"named-field" syntax:
$ goawk -i csv -H '{ print @"Abbreviation" }' testdata/csv/states.csv
AL
AK
AZ
...
This feature was sponsored by the library of the University of Antwerp -- many thanks!
interp.New ... Execute API to speed up and reduce allocations when executing the same program multiple times. https://github.com/benhoyt/goawk/pull/100
ExecuteContext API to support timeout and cancellation. https://github.com/benhoyt/goawk/pull/103
x = "a" "," "b". https://github.com/benhoyt/goawk/pull/99
print, printf, sprintf(), and field parsing. https://github.com/benhoyt/goawk/pull/102
This release adds no new features. It's a significant performance improvement due to switching the internals of the interpreter from a tree-walking interpreter to a bytecode compiler with a virtual machine interpreter.
Results show that it's 18% faster overall on microbenchmarks, 13% on more real-world benchmarks. It should be fully backwards compatible -- please file an issue if you find a regression!
This reverts the feature from v1.11.0 which changed the builtin functions length, substr, index, and match to use character indexes instead of byte indexes (as per the POSIX spec). The reason is that the change turned those functions from O(1) into O(N), which created "accidentally quadratic" behavior in scripts that expected them to be O(1).
For example, @xonixx's gron.awk script on a relatively large JSON input file took about 1s in bytes mode (goawk -b), but 8 minutes (!) in the new Unicode char default mode. That's extremely problematic.
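The cost difference comes from UTF-8 being a variable-width encoding: a byte offset can be used to slice a string directly, while a character offset requires decoding the string from the start. The Go sketch below shows both approaches in simplified form. It is not GoAWK's code; `substrBytes` and `substrChars` are hypothetical names, and edge-case handling is reduced to simple clamping.

```go
package main

import "fmt"

// substrBytes returns up to n bytes of s starting at 1-based byte
// position pos. Slicing by byte offset takes constant time.
func substrBytes(s string, pos, n int) string {
	start := pos - 1
	if start < 0 {
		start = 0
	}
	if start > len(s) {
		start = len(s)
	}
	end := start + n
	if end > len(s) {
		end = len(s)
	}
	if end < start {
		end = start
	}
	return s[start:end]
}

// substrChars returns up to n characters of s starting at 1-based
// character position pos. It has to decode s from the beginning to
// find the byte offsets, so each call costs time proportional to pos+n.
func substrChars(s string, pos, n int) string {
	if pos < 1 {
		pos = 1
	}
	if n < 0 {
		n = 0
	}
	start, end := len(s), len(s)
	charIndex := 1
	for offset := range s { // range over a string yields rune start offsets
		if charIndex == pos {
			start = offset
		}
		if charIndex == pos+n {
			end = offset
			break
		}
		charIndex++
	}
	return s[start:end]
}

func main() {
	s := "héllo"
	fmt.Printf("%q\n", substrBytes(s, 2, 3)) // byte positions 2..4
	fmt.Printf("%q\n", substrChars(s, 2, 3)) // character positions 2..4
}
```

A script that walks a string one position at a time, calling substr at each index, makes N calls that each scan up to N characters in the character-indexed version, which is where the quadratic behavior comes from.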
Like v1.11.0, this release is again a small breaking change, but once again it shouldn't affect many scripts (it will only affect scripts that use constant indexes for substr on non-ASCII strings). I hope not many people are using interp.Config.Bytes or the goawk -b option yet, as those are gone again. Seeing as v1.11.0 was introduced only a few weeks ago, I think it's worth the breakage for a performance problem of this magnitude.
Fixes https://github.com/benhoyt/goawk/issues/93: "Major speed regression for gron.awk in goawk 1.11.0+".
Support multi-character RS and regular expression RS (#86), allowing significantly more powerful text processing. This is a Gawk extension to POSIX, which says, "If RS contains more than one character, the results are unspecified."
This release adds support for "getline lvalue" forms. See #85.