jtc Versions

JSON processing utility

LatestBuild

3 years ago

Holding latest builds (the latest build: October 5, 2020)

Changes up till now:

issues #16, #17, #18: no functional impact, code safety improvements
compiling issues #19, #20
issue #21: fixed an uncaught exception that might occasionally be thrown in peculiar walks (UT'd)
issue #22: fixed a nasty performance regression noticeable on big JSONs for lexemes supporting interpolation: <..>R, <..>L, <..>D, <..>j (UT'd)
fix for the generated auto-tokens issue introduced in the prior build (UT'd)
fixed issues #27, #28 (affecting Linux only)
fixed issues #29, #32, improved per-walk template behavior #31
fixed/improved template-argument behavior in options -u/-i/-c: the behavior should match the behavior of -T option (string-interpolation of iterables might have produced different results)
fixed a crash that might occur when a JSON root undergoes interpolation
fixed an issue (#33), where a non-initial Regex lexeme might not get engaged (a regression from v1.76)
added template auto-token $wuid, which refers to a deterministic unique id of each walk given by the user (handy for making collections of JSON elements per walk)
introduced flow-control for walk loops using <>f .. ><f pairs: this is a use-case to resolve recursive lookup chains
improvements:
- improved namespace passing between option chain-sets and for -p/-s options
- improved trailing backslash parsing in all lexemes
- enabled walks over a templated argument in -i / -u / -c options (also enabled namespace passing to the template)
- reinstated namespace passing between interleaved walks (it's a regression - the functionality was lost after re-designing namespaces in v1.76)
- improved behavior for lexeme <..>I when initializing namespace
- enhanced performance for tokens {{}}, {{..}} (when the tokens are used standalone, no interpolation is needed and the JSON can be retrieved directly from walks)
- added a use-case for label interpolations when string-interpolating iterables (handy for generating headers from labels/indices for CSV output), e.g.: <<<'{"tbl":["a","b","c"]}' jtc -w'[tbl]<:>k' -qqT'"{}"' will generate 0, 1, 2 output (instead of a, b, c) - that is predicated on the last walked <:>k directive (lexeme spelling in this case is limited to :).
- improved label ordering in JSON objects: now numerical labels (those made of digits only) are ordered numerically, while all other labels are ordered literally
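The label ordering rule in the last item can be pictured with a small Python sort key. This is only a sketch of the rule as worded above, not jtc's code: `label_sort_key` is a hypothetical name, and placing digit-only labels before all other labels is an assumption (the notes do not say how the two groups interleave).

```python
def label_sort_key(label: str):
    # Digit-only labels compare by numeric value, all others as plain strings.
    # Assumption: the numeric group sorts ahead of the non-numeric group.
    if label.isascii() and label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)


labels = ["10", "b", "2", "a", "1", "a2"]
ordered = sorted(labels, key=label_sort_key)
print(ordered)  # ['1', '2', '10', 'a', 'a2', 'b']
```

A plain string sort would have produced '1', '10', '2' for the numeric labels, which is the behavior the change moves away from.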

items to accomplish before the next release:

introduced $@ auto-REGEX namespace - it holds all the RE matches (entire matches, or group matches) in a JSON array. That way it's easy to split strings, e.g.: <<<'"abc, def, ghi"' jtc -w'<[^, ]+>R' -rT'{{$@}}' produces output: [ "abc", "def", "ghi" ]
redesign and enhance internal template-interpolate logic: currently all interpolations are done via JSON serialization / deserialization, which is a slow way - rework Json class to allow parsing templates and rewrite interpolation so that it's done via binary construction (serdes way will remain only for string interpolations).
introduce a couple of variants of the <..>v directive:
- <var:<JSN/TMP>>v1 - allows saving a JSON spelled literally, or built from a template, right into a namespace (currently any lexeme value in <..>v directive is either a JSON or promoted to a JSON string)
- <var:[{{$PATH}}, <JSON/TMP>]>v2 - the JSON in this form allows reconstructing a JSON in a namespace (i.e., incrementally build up a JSON in the namespace)
implement streamed parsing of JSON (i.e., in a format similar to the one produced by this walk: jtc -rw'<>e:' -T'[{{$PATH}}, {{}}]') - this would allow processing virtually endless JSONs w/o any memory pressure (parsing of such streamed JSON will be done in a concurrent thread)
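The [path, value] form mentioned in the last two items can be illustrated in Python. This is a minimal sketch under stated assumptions, not the planned jtc implementation: `to_stream` and `from_stream` are hypothetical names, the stream is assumed to carry one pair per end node (atomic value or empty iterable), and array elements are assumed to arrive in index order.

```python
def to_stream(node, path=()):
    """Yield [path, value] pairs for every end node of a parsed JSON value."""
    if isinstance(node, dict) and node:
        for label, child in node.items():
            yield from to_stream(child, path + (label,))
    elif isinstance(node, list) and node:
        for index, child in enumerate(node):
            yield from to_stream(child, path + (index,))
    else:
        yield [list(path), node]


def from_stream(pairs):
    """Rebuild a JSON value incrementally from [path, value] pairs."""
    root = None
    for path, value in pairs:
        if not path:  # the whole JSON is a single end node
            root = value
            continue
        if root is None:
            root = [] if isinstance(path[0], int) else {}
        node = root
        # Descend, creating intermediate containers on first visit.
        for step, nxt in zip(path, path[1:]):
            container = [] if isinstance(nxt, int) else {}
            if isinstance(node, list):
                if step == len(node):
                    node.append(container)
                node = node[step]
            else:
                node = node.setdefault(step, container)
        if isinstance(node, list):
            node.append(value)  # relies on in-order indices
        else:
            node[path[-1]] = value
    return root


doc = {"a": [1, {"b": None}], "c": "x", "e": []}
pairs = list(to_stream(doc))
rebuilt = from_stream(pairs)
```

Because each pair is self-describing, a consumer only needs to hold the pair being processed plus whatever part of the result it chooses to keep, which is the property the streamed-parsing item is after.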

1.76

4 years ago

Release Notes for jtc v.1.76

New features and enhancements:

when multiple files are given, jtc now will read/parse all files concurrently (on a multi-core CPU); to disable multithreading (and process files sequentially) give option -a (normally, the option is implied and redundant when multiple files are given)
a new lexeme directive <..>S - complements directive <..>W: walks JSON tree as per the preserved path
when file argument for options -i/-u/-c contains a stream of JSONs, it's automatically converted into an array of JSONs
template operations enhancements:
- an argument for options -i/-u/-c now additionally can hold a template (e.g.: -u0 -T<template> now could be collapsed into -u<template>)
- regex search lexemes (<..>R, <..>L, <..>D) now are subjected to template interpolation as well, though namespace usage in such lexemes is limited to alphabetical names only ('cause numeric names would clash with regex quantifiers) - template interpolation obviously occurs before regex applied
- auto-generated label tokens for template interpolation ($A, $B, etc) now also hold indices if the respective values are in an array (it used to work only for objects)
- walked atomic values now also can be represented in templates using auto-generated tokens ($A and $a for a label/index and a value respectively) for easier template-interpolation operations
- setting namespace $? to any value (even empty one) in a walk triggers resetting of the respective auto-token $? (which holds historical values) to the default value "" (it's a user-controlled way to reset the token, in addition to the existing trigger - template interpolation failure)
- when string-interpolating an iterable (array or object) via "{}" token, all atomic values within the iterable get interpolated into the string recursively
- improved template stringification (>{{}}<) - operation now is consistent across all JSON types (null / bool / numeric used to behave differently)
- limited usage of auto-generated tokens (e.g.: $abc) to 3 letters only (to avoid clashing with tokens like $file and all future tokens) - the use case for auto-generated tokens is template-interpolation for relatively short arrays / objects, thus 3 letters are sufficient to address iterables up to 18278 values in size
- extended range of auto-tokens representation in iterables ($a, $b, etc): initially each token represents a respective top level JSON element of the iterable, beyond that range each next auto-token will represent an atomic value of the JSON tree as if it were walked recursively
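The figure of 18278 quoted for 3-letter auto-tokens follows from counting all names of one, two and three letters: 26 + 26^2 + 26^3. The short Python sketch below checks the arithmetic and enumerates such names; `auto_tokens` is a hypothetical helper, and the enumeration order after $z ($aa, $ab, ...) is an assumption.

```python
from itertools import product
from string import ascii_lowercase


def auto_tokens():
    """Yield token names of 1 to 3 lowercase letters: $a ... $z, $aa ... $zzz."""
    for length in (1, 2, 3):
        for letters in product(ascii_lowercase, repeat=length):
            yield "$" + "".join(letters)


capacity = sum(26 ** n for n in (1, 2, 3))  # 26 + 676 + 17576
tokens = list(auto_tokens())
print(capacity, len(tokens))  # 18278 18278
```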

Improvements, changes, fixes:

behavior improvements:
- redesigned and improved processing of options chain-sets logic: lifted a caveat of using -J/-j/-a in intermediate chain-sets (now it works in line with the expected option behavior in any of the option sets)
- when unquoting strings with -qq, translation of Unicode escape sequences (e.g.: \uD123) into UTF-8, as well as correct processing of surrogate pairs, is added
- improved label update operations: now also any atomic value (null / boolean / numeric) can update a label (before labels could be updated only with string types)
- improved namespace behavior for -p/-s operations (now namespaces from the respective walks are not lost in such operations and could be reused later)
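For the surrogate-pair handling mentioned for -qq above, the standard arithmetic that joins a high and a low surrogate into one code point is sketched below in Python. It illustrates the encoding rule only and is not jtc's code; `combine_surrogates` is a hypothetical name.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE00)
text = chr(code_point)
utf8 = text.encode("utf-8")

# Python's own JSON parser applies the same rule to escaped pairs.
parsed = json.loads(r'"\ud83d\ude00"')
print(hex(code_point), utf8, parsed == text)
```

An escape outside the surrogate range, such as \uD123, needs no pairing and maps to a single code point directly.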
performance improvements:
- redesigned and improved namespaces storage policy so that it does not slow down walks (used to be the case, noticeable when storing big JSONs)
- optimized performance for -e with -i/-u shell executions, where all such walks are attempted to be executed in a single run (popen session), otherwise defaulted to a legacy (slower) way (to enforce the legacy way give -ee)
code design improvements:
- added compile options:
  - -DBG_dTS (effective only in conjunction with -DBG_mTS or -DBG_uTS) - debug timestamps display deltas instead of absolute stamps (handy for cpu profiling)
  - -DNDBG_PARSER: disables parsing debugs - handy when deep debugging huge JSONs (to skip the parsing part)
- sped up template interpolations (by breaking away from catching JSON parsing exceptions towards processing parsing by return value)
- improved performance when outputting walked elements (-w)
- improved debug outputs when displaying JSONs longer than the term width (the same update ensures correct displaying of UTF-8 strings)
various fixes:
- fixed locality of <>q, <>Q searches: it accidentally became global after last redesign of lexeme implementation, now it's local to the search tree (UT'ed)
- fixed accidentally broken options translation in the built-in mini-guide (-g)
- fixed a rogue debug level when debugging -e option
- fixed a rare corner-case crash in -u/-i based source walks predicated on -pp option usage, occurring only when the resulting walks get invalidated by any of the prior walks (UT'ed of course)
- fixed an issue where last walk control (-x0 or -x/-1) worked in the first JSON but did not work in any subsequent one - if there were multiple (UT'ed)

1.75d

4 years ago

Release Notes for jtc v.1.75d

New features:

performance improvements and some more fixes:

Improvements, changes, fixes:

completely reworked the logic of <>g, <>G, <>q, <>Q lexemes by externalizing their storages into standalone caches; that made them run as fast as a bare-metal sort without slowing down walking
removed some superfluous optimization in the interpolation logic (it was limiting some corner use-cases)
parsed quoted solidus (\/) now is always translated into an unquoted one (/), unless -q is given, which restricts behavior to quoted-only
option -nn does not engulf -n now (i.e. if both behaviors are required then both must be spelled: -nnn)
added a token $file holding the name of a currently processed input file - so that it could be interpolated if required
improved $PATH token interpolation so that the namespace $# also could be utilized with it (upon interpolation into a string template)
reinstated -mm behavior (advertised in the last version but missed)
fixed engagement of lexeme <..>u in interim options sets
fixed quite a rare misbehavior of branching lexemes <>f ... <>F

1.75c

4 years ago

Release Notes for jtc v.1.75c

New features:

No new features, some more minor improvements and fixes:

Improvements, changes, fixes:

for all iterables undergoing template interpolation generate auto-tokens $a, $b, etc (and $A, $B, etc) for all values (and for objects' respective labels)
for lexemes setting JSON in the namespace, e.g.: <ns:..>v if parsing JSON value fails - try promoting it to JSON string first, and only if it fails too then throw an exception
made options -z, -zz non-transient (i.e. to be used only in the final options set)
some code fixes for MacPorts compatibility
fixed issue: interpolation of $? token should work even w/o -x0 (-x/-1) option

1.75b

4 years ago

Release Notes for jtc v.1.75b

New features:

Quick fixes for oversights in the design of the new features, which sneaked past UT:

Improvements, changes, fixes:

fixed issue: when shell evaluation fails, it might break options -ei / -eu logic
fixed/improved handling of ; char in shell eval operations -ei, -eu: treat only a standalone occurrence of \; as terminating symbol (and not when it's a trailing character - to allow cli chaining in argument)
fixed issue: all non-transient output view options -qq, -r, -t and -f should be ignored in all the interim chain sets but the last one; the bare qualifier - is an exception: it has a global scope, i.e. when cited in any of the chained option sets it will force initial reading from stdin
fixed issue: accidentally broken bare qualifier - (input redirect)
fixed issue: -f option for chained sets, also extended -f: now it forces any output to file, allowing redirecting even walks
option -z now outputs size in a JSON compatible format, e.g.: { "size": 100 }

1.75a

4 years ago

Release Notes for jtc v.1.75

New features:

introduced a new semi-compact printing view. The view is engaged when the suffix c is appended to -t option, e.g.: -t2c, -tc. The semi-compact view is a middle ground between compact (-r) and pretty-printed (-t, default) views: when a JSON iterable is made of atomic values only (and/or empty iterables {}, []), it will be printed in a compact (one-line) format, the rest is pretty-printed
introduced operations chaining via delimiter /:
- chaining delimiter(s) pretty much replaces jtc ... | jtc ... | jtc ... notation with jtc ... / ... / ... - the advantage is huge: jtc now is capable of processing multiple chained operations w/o printing-parsing interim JSONs (which is quite an expensive operation) - that speeds up operations and simplifies notation. Another benefit is that it becomes possible to pass namespace(s) from one chain set into another (which is impossible with piping notation)
- chain-delimiter / only splits options notations, not working when cited among file arguments
introduced an optional step notation in range subscripts and search lexemes qualifiers: [N:M:S], <..>N:M:S: S must be a strictly positive value. In search quantifiers <..>::{S} if after interpolation the value happens to be negative (or zero) then the default step 1 is applied
new search lexemes <..>g and <..>G allow going over JSON elements in a sorted order (ascending and descending respectively). When applied w/o quantifiers they allow finding min and max values respectively
a new directive <..>Z - preserves into a namespace a selected (walked) JSON entry size (a recursive and non-recursive behaviors applied respectively). <..>Z1 lexeme (i.e., with quantifier 1) - saves into a namespace a currently walked JSON string size (if the walked JSON is not a string, the value -1 is saved)
a new lexeme <..>W - preserves a current walk-path (as a JSON array) into a namespace variable
introduced a new parsing behavior (-mm) allowing accepting ill-formed JSONs with clashing labels by collecting them into arrays (e.g.: { "a": 1, "a": 2 } will be parsed into { "a": [ 1, 2] })
rebranded jtc into JSON transformational chains to better reflect the tool's purpose and capability
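The -mm behavior described above (clashing labels collected into arrays) can be mimicked with Python's standard JSON parser for illustration. This is a sketch of the idea rather than jtc's parser: `collect_clashes` is a hypothetical hook, and appending a third or later clashing value to the same array is an assumption, since the notes only show the two-value case.

```python
import json


def collect_clashes(pairs):
    """Build an object from (label, value) pairs, gathering clashing labels into arrays."""
    result = {}
    clashed = set()  # labels already converted into a collecting array
    for label, value in pairs:
        if label not in result:
            result[label] = value
        elif label in clashed:
            result[label].append(value)
        else:
            result[label] = [result[label], value]
            clashed.add(label)
    return result


parsed = json.loads('{ "a": 1, "a": 2, "b": 3 }', object_pairs_hook=collect_clashes)
print(parsed)  # {'a': [1, 2], 'b': 3}
```

Tracking the clashed labels separately keeps a label whose single value is already an array from being mistaken for a collection of clashes.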

Improvements, changes, fixes:

enhanced template interpolation (-T...):
- removed prior limitations: now, application of templates is universal to all operations - executed as a last step for the respective walk(s)
- extended template-interpolations of JSON iterables into strings: the former could be interpolated into the string values as enumerations: the enumeration separator value (default ", ") will be taken from newly introduced namespace $#
new namespaces added:
- $#: holds the separator used when a JSON iterable is interpolated into a JSON string (default value ", ")
- $_: holds the separator used when $path is interpolated to join path tokens (default value "_")
- $$?: holds the separator used upon template expansion when interpolation token {$?} is used (default value ",")
introduced quantifiers for F directive (both recursive and non-recursive):
- a new semantic for <>Fn quantifier: if n > 0 (i.e. non-default), it will let continue walking past <>Fn directive skipping to nth lexeme from F: e.g.: <>F1 - will continue walking right from the immediately following lexeme, <>F2 will continue walking from 2nd lexeme past <>Fn (i.e., skipping the first one), etc.
- a new semantic for ><Fn quantifier: if n > 0 (i.e. non-default), it allows additional replications of the entire walk (before the lexeme ><F) n times
enhanced <..>I directive behavior:
- initialization of the namespace value could be done now within the lexeme itself, e.g.: <c:100>I1 - will initialize counter c with the value 100 before the directive executes (unlike typical behavior where namespace initialization/preservation is applied as the last step of lexeme walking)
- a new additional semantic for <..>In:m quantifier, where n is an increment step (as before, no changes here), m - is a new multiplier (integer only), e.g.: <a:10>I5:2, after the first walking the namespace a will hold (10 + 5) * 2 = 30 - in such notation, first the increment is applied and then the multiplier
- the directive also understands now an empty token {} for the increment and/or the multiplier: <..>I{}:{} - the empty token will refer to the currently walked (numeric) value - this is the only lexeme where such an empty token notion makes sense and is supported
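The order of operations in the <..>In:m form (increment first, then multiply) is easy to check with a one-line Python function. `apply_increment` is a hypothetical name used for illustration only; the sketch reproduces the single pass worked out in the notes and makes no claim about how jtc treats later passes.

```python
def apply_increment(value, increment, multiplier=1):
    """One pass of the In:m rule: add the increment, then apply the multiplier."""
    return (value + increment) * multiplier


# <a:10>I5:2 -> namespace a holds (10 + 5) * 2 after the first walk
a = apply_increment(10, 5, 2)
print(a)  # 30
```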
improved -jj option behavior: now the clashing labels will override each other (thus, only the last value will be retained), to collect even clashing labels (into an array) use -m modifier
improved behavior of -ll toggle - now it gleans all the labels, not just the first one (as before) - typically used together with -j
performance improvements:
- in the JSON library, for ARY/OBJ declarations stepped away from std::initializer_list to variadic templated arguments (that permits use of move semantic now in the initialization notations, which simplified the usage and improved performance)
- improved performance of buffered read from <stdin> (now, it's almost as fast as the read from files)
- same way improved performance of file read in options (-i, -u, -c)
- drastically improved performance of <>q, <>Q searches by making them cacheable: they are still quite memory hungry, still are the slowest among all searches, but now they are not prone to exponential decay and can be used on big JSONs with a predictable processing time
added a few compilation options:
- -DBG_FLOW: a new debug of the execution flows (tracing an entry and exit point of every DEBUGGABLE function/method). Add -DBG_FLOW flag when compiling to effectuate such debugging - complements nicely -DBG_CC flag when debugging copy-constructors for optimization
- -DBG_mTS: lets debugging output to have time-stamp with milliseconds accuracy
- -DBG_uTS: lets debugging output to have time-stamp with microseconds accuracy
debugability improvements:
- added printing backtrace in the unlikely event of a crash (only when debug is enabled). On MacOs/BSD it will print demangled back-tracing
- improved parsing output when debugged - now it'll be auto-adjusted to terminal's width
program design improvements:
- simplified program design for all cases of source/destination walks - that also fixed the prior caveat with labels updates through the shell evaluation (now even nested labels could be updated, the caveat is removed)
- enhanced the handling logic for all directives where applicable - now, a directive is activated only once per walk pass (applies to directives z, Z, W, v, k, I)
- improved/fixed behavior for shell evaluation (-e with -i/-u) argument parsing behavior for Linux/GNU only (Macos/BSD were fine - getopt() GNU implementation works differently than MacOS/BSD's)
more fixes and enhancements:
- fixed issue: directives <..>I and <..>u also must support interpolated name-spaced quantifiers: <..>I{ns}, <...>u{ns}
- fixed issue: fail-safe <>f directive should not fire after there have been successful matches in iterables
- fixed/improved parsing of <..>j search lexeme when the content is a template
- fixed Linux options parsing (to behave the same way as on Macos/BSD)
- fixed a corner crash when move semantic applied on multiple walks and the prior walk deletes the object pointed by the subsequent, interleaved walk
- fixed a corner crash when a search lexeme (i,o,c, etc) was matching a root iterable (array/object) and at the same time attempted saving it into a namespace
- fixed a crash when blank (or white space only) input was combined with the streamed read (-a)

standard.json

4 years ago

This is a sample JSON used in performance testing (the JSON was generated from the XML file)

1.74

4 years ago

Release Notes for jtc v.1.74

New features:

No new features, some enhancements and stability improvements

Improvements, changes, fixes:

improved handling of <>q and <>Q lexemes drastically (performance and memory utilization-wise), also now those lexemes may be empty (before it was mandatory to give a namespace in the lexemes)
option -t now can be used to control spacing for the compact (one-row) view, e.g.: -r -t0 will print a very compact one-liner JSON, w/o spaces; when used together (-r and -t), it will also control spacing in stringification of JSON in template operations
introduced a support for flags in Regular Expressions (namely: INOCESXAGP); flags can be given only as trailing part of the RE (they will be removed from the RE itself after parsing), e.g.: <...\I\O>R:; also, flags ESXAGP facilitate various REGEX grammars, those flags will be processed only once (i.e., only the first setup grammar flag will have an effect, all subsequent will be ignored)
enhanced behavior of empty <>k lexeme - now it also has an effect when placed in front of the ><F lexeme (i.e. logical end of walking), not only at the syntactical end of the walk-path
enhanced interpolation behavior of {} token: when interpolation of a JSON object fails, it will be re-attempted with the JSON object stripped as an array - effectively allowing conversion of JSON objects into JSON arrays in templates.
fixed an issue when a "move" semantic (-p) is applied to update (-u)/insert (-i) operations: if the walks of the latter fail entirely then a purge should not be applied on destination walks (UT'ed)

1.73

4 years ago

Release Notes for jtc v.1.73

New features:

No new features, some enhancements and stability improvements

Improvements, changes, fixes:

lifted label update operation when -u is used to update a label (when a walk-path is ending with an empty ...<>k lexeme): now it's possible to update/rewrite recursively even nested labels w/o failures
converted walking (walk iteration) to a non-recursive loop, now walks are virtually endless (i.e. able to walk JSONs of virtually ANY size and depth) and not restricted by a depth of a stack
-T processing for -i<walk> and -u<walk> operations is enhanced to match the same behavior as for -w<walk>: templates are interpolated per walks now (if a count of templates and walks matches), or round-robin fashion otherwise (before, for some weird reasons all templates were applied for each such walk)
fixed insertion (-i): when the last lexeme of a walk is a non-empty <..>k, no label reinterpretation occurs (so it's consistent now with the same behavior of -u)
removed support for the empty <>z notation form of the lexeme: erasing entire namespace is idiomatically inconsistent with the walk design (and might lead to confusion or misunderstanding of the expected behavior), so only non-empty lexemes <..>z are supported now (and restricted to)
fixed a crash when debugging is on (quite a corner case though)
fixed a programmatic error (rarely occurs only in API calls) where Json class would falsely expect <stdin> in the event when parsing constructor throws
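The conversion of walking to a non-recursive loop noted in this release can be pictured with an explicit-stack traversal. The Python sketch below shows the general technique only, not jtc's C++ implementation; `walk` is a hypothetical name. The nesting depth of 5000 is chosen to exceed Python's default recursion limit, which a naive recursive walker would hit.

```python
def walk(root):
    """Yield (path, value) for every node, depth-first, using an explicit stack."""
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        yield path, node
        if isinstance(node, dict):
            children = [(path + (label,), child) for label, child in node.items()]
        elif isinstance(node, list):
            children = [(path + (index,), child) for index, child in enumerate(node)]
        else:
            continue
        # Push in reverse so children are visited in document order.
        stack.extend(reversed(children))


# Build a JSON array nested 5000 levels deep: [[[ ... [] ... ]]]
deep = []
for _ in range(5000):
    deep = [deep]

depth = max(len(path) for path, _ in walk(deep))
print(depth)  # 5000
```

With the pending work kept in a heap-allocated list, the depth that can be walked is bounded by available memory rather than by the call stack.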

1.72a

4 years ago

Release Notes for jtc v.1.72a (NOTE: The Release is republished, as prior binaries were incorrect ones)

New features:

introduced a new directive I which allows incrementing/decrementing numerical JSONs preserved in the namespace (and ignores other JSON types), e.g.: <var>I3, <var>I-1. If var wasn't defined before, the iteration begins with 0; however, it's possible to initialize it with a value other than 0 - see User Guide
introduced an auto-namespace variable $? to reference the last processed walk, this facilitates use-cases when converting input JSON to .csv format; see User Guide for more
introduced new lexemes <..>P, <..>N to match any JSON strings and JSON numerical types respectively. Before, to facilitate the same, REGEX lexemes were used: <.*>R and <.*>D respectively, but new lexemes work faster and allow storing matched values in the namespace
Template-interpolation was enhanced with new capability to jsonize JSON strings (containing embedded JSONs) and stringify JSONs - similar to respective options -qq and -rr but now programmatically. See User Guide for the syntax and examples
added a new semantic to -x option: -xN[/M] notation lets specifying a frequency of walks to be displayed (every Nth walk) starting from the optional offset M (zero based); e.g.: -x4 - display every 4th walk, while -x4/1 will do the same starting from the 2nd (index is zero based) walk. Also, note a special notation case: -x0/N - will display Nth (zero based) walk only once, this could be abbreviated to -x/N; N is positive, but the value -1 is also supported - to display the last walk
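The -xN/M selection rule reads naturally as a slice. The Python sketch below models it under stated assumptions: `select_walks` is a hypothetical name, the offset is always passed explicitly (the notes do not spell out the default offset for a bare -xN), and -1 is accepted only in the -x0/N form where the notes describe it.

```python
def select_walks(walks, n, m):
    """Model of -xN/M: every n-th walk starting at zero-based offset m.

    n == 0 selects only the walk at offset m; m == -1 then means the last walk.
    """
    if n < 0 or m < -1:
        raise ValueError("n must be >= 0 and m must be >= -1")
    if n == 0:
        return walks[-1:] if m == -1 else walks[m:m + 1]
    if m == -1:
        raise ValueError("-1 is only modelled for the -x0/N form")
    return walks[m::n]


walks = list("abcdefghij")           # stand-ins for ten walked elements
print(select_walks(walks, 4, 1))     # -x4/1  -> ['b', 'f', 'j']
print(select_walks(walks, 0, 2))     # -x0/2  -> ['c']
print(select_walks(walks, 0, -1))    # -x/-1  -> ['j']
```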

Improvements, changes, fixes:

improved -jl options combination behavior: in some cases it wasn't robust and failed providing the expected result. Plus, introduced a new merge format: -jlnn - all clashing values will be aggregated (disrespecting JSON structured grouping vs. as in the case of -jl)
lifted handling of atomic JSONs - simplified the code to allow applying walk-paths now even onto the atomic JSON values
extended null-interpolation for JSON strings (before, it was applied to JSON arrays and JSON objects only). Now, the empty variable interpolation in the string, following either of , or ; will be taken into account, e.g.: -T'"{}, "' - if {} is empty, then result of interpolation will be empty too: ""
improved buffered file read speed (3 times faster) and stdin buffered speed (1.5-2 times faster), improved handling of non-existent/bad file-arguments (when multiple given)
enhanced move semantic of -u, -i operations, so that when used together with -pp it also works as expected with those options (before it was only working for -p and -pp notation was ignored)
a last in the walk k lexeme (e.g.: -w'... <>k') is only subjected for re-interpretation of the label if it's empty (<>k) now; the non-empty <..>k lexeme then doesn't (the value is preserved in the namespace, so no need to re-interpret it then)
the per-walk template feature used to work only for interleaved walks, and -n was cancelling it (templates then were applied round-robin). Now, even with -n (i.e., for sequenced walk processing) templates also pertain per walk; in the unlikely event that round-robin behavior is required, -nn notation will support it. Also, the per-walk template feature is enhanced to be engaged only when the number of templates (-T) matches the number of provided walks (-w), otherwise round-robin template application is engaged.
put a hard cutoff on too deep recursion should any unforeseen case (while walking) occur in the future; the same enhancement has fixed a case of too deep recursion (with subsequent stack overflow) for a corner case of lexeme <>F usage occurring in processing really huge JSONs only
the message "notice: option -J cancels streaming input" is printed now only when -a + <stdin> were selected explicitly together with -J (and not when -a is implicitly imposed upon -J)
fixed parsing debug (offset for a streamed read now shows a correct value - from the beginning of a stream, instead of the beginning of an internal circular buffer, other read debugs (buffered, cin) are unaffected)
fixed empty <>b lexeme - it was not working (as documented), now it matches any boolean JSON value (UT'ed)