Jtc Versions

JSON processing utility

LatestBuild

3 years ago

Holding latest builds (the latest build: October 5, 2020)

Changes up till now:

  • issues #16, #17, #18: no functional impact, code safety improvements
  • compiling issues #19, #20
  • issue #21: fixed an occasional uncaught exception that might be thrown in peculiar walks (UT'd)
  • issue #22: fixed a nasty performance regression noticeable on big JSONs for lexemes supporting interpolation: <..>R, <..>L, <..>D, <..>j (UT'd)
  • fix for the generated auto-tokens issue introduced in the prior build (UT'd)
  • fixed issues #27, #28 (affecting Linux only)
  • fixed issues #29, #32, improved per-walk template behavior #31
  • fixed/improved template-argument behavior in options -u/-i/-c: it now matches the behavior of the -T option (before, string-interpolation of iterables might have produced different results)
  • fixed a potential crash when a JSON root undergoes interpolation
  • fixed an issue (#33) where a non-initial Regex lexeme might not get engaged (a regression from v1.76)
  • added template auto-token $wuid, which refers to a deterministic unique id of each walk given by the user (handy for making collections of JSON elements per walk)
  • introduced flow-control for walk loops using <>f .. ><f pairs: the use case is resolving recursive lookup chains
  • improvements:
    • improved namespace passing between option chain-sets and for -p/-s options
    • improved trailing backslash parsing in all lexemes
    • enabled walks over a templated argument in -i / -u / -c options (as well as namespace passing to the template)
    • reinstated namespace passing between interleaved walks (it's a regression - the functionality was lost after re-designing namespaces in v1.76)
    • improved behavior for lexeme <..>I when initializing namespace
    • enhanced performance for tokens {{}}, {{..}} (when the tokens are used standalone, no interpolation is needed and the JSON can be retrieved directly from walks)
    • added a use-case for label interpolations when string-interpolating iterables (handy for generating headers from labels/indices for CSV output), e.g.: <<<'{"tbl":["a","b","c"]}' jtc -w'[tbl]<:>k' -qqT'"{}"' will generate 0, 1, 2 output (instead of a b c) - that is predicated on the last walked <:>k directive (lexeme spelling in this case is limited to :).
    • improved label ordering in JSON objects: now numerical labels (those made of digits only) are ordered numerically, while all other labels are ordered literally
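
The label ordering rule above can be illustrated with a short Python sketch. This is not jtc's implementation: the function names are hypothetical, and placing digit-only labels ahead of all other labels is an assumption, since the note does not say how the two groups are ordered relative to each other.

```python
def label_sort_key(label: str):
    """Sort key: digit-only labels compare as numbers, the rest as plain text."""
    # ASSUMPTION: digit-only labels are placed before all other labels.
    if label.isascii() and label.isdigit():
        return (0, int(label), "")
    return (1, 0, label)


def order_labels(labels):
    """Return labels with numeric ones in numeric order, others in literal order."""
    return sorted(labels, key=label_sort_key)


labels = ["10", "b", "2", "a", "1"]
print(sorted(labels))        # plain literal sort: ['1', '10', '2', 'a', 'b']
print(order_labels(labels))  # numeric-aware sort: ['1', '2', '10', 'a', 'b']
```

The plain sort puts "10" before "2", which is the behavior the change moves away from for digit-only labels.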

items to accomplish before the next release:

  • introduced $@ auto-REGEX namespace - it holds all the RE matches (entire matches, or group matches) in a JSON array. That way it's easy to split strings, e.g.: <<<'"abc, def, ghi"' jtc -w'<[^, ]+>R' -rT'{{$@}}' produces output: [ "abc", "def", "ghi" ]
  • redesign and enhance internal template-interpolate logic: currently all interpolations are done via JSON serialization / deserialization, which is a slow way - rework Json class to allow parsing templates and rewrite interpolation so that it's done via binary construction (serdes way will remain only for string interpolations).
  • introduce a couple of variants of the <..>v directive:
    • <var:<JSON/TMP>>v1 - allows saving a JSON spelled literally, or built from a template, right into a namespace (currently any lexeme value in the <..>v directive is either a JSON or is promoted to a JSON string)
    • <var:[{{$PATH}}, <JSON/TMP>]>v2 - the JSON in this form allows reconstructing a JSON in a namespace (i.e., incrementally building up a JSON in the namespace)
  • implement streamed parsing of JSON (i.e., in a format similar to the one produced by this walk: jtc -rw'<>e:' -T'[{{$PATH}}, {{}}]') - this would allow processing virtually endless JSONs w/o any memory pressure (parsing of such streamed JSON will be done in a concurrent thread)
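
The string-splitting use case given above for the $@ namespace can be mirrored outside of jtc. The sketch below is only an analogy in Python: it applies the same regular expression to the same input and collects every match into an array. The helper name is hypothetical, and Python's re module is not the regex engine jtc uses.

```python
import json
import re


def collect_matches(text: str, pattern: str) -> list:
    """Collect all non-overlapping matches of pattern in text into a list."""
    # re.findall returns whole matches when the pattern has no groups,
    # and group matches when it has them.
    return re.findall(pattern, text)


matches = collect_matches("abc, def, ghi", r"[^, ]+")
print(json.dumps(matches))  # ["abc", "def", "ghi"]
```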

1.76

4 years ago

Release Notes for jtc v.1.76

New features and enhancements:

  • when multiple files are given, jtc now reads/parses all files concurrently (on a multi-core CPU); to disable multithreading (and process files sequentially) give option -a (normally, the option is implied and redundant when multiple files are given)
  • a new lexeme directive <..>S - complements directive <..>W: walks JSON tree as per the preserved path
  • when file argument for options -i/-u/-c contains a stream of JSONs, it's automatically converted into an array of JSONs
  • template operations enhancements:
    • an argument for options -i/-u/-c now additionally can hold a template (e.g.: -u0 -T<template> now could be collapsed into -u<template>)
    • regex search lexemes (<..>R, <..>L, <..>D) now are subjected to template interpolation as well, though namespace usage in such lexemes is limited to alphabetical names only ('cause numeric names would clash with regex quantifiers) - template interpolation obviously occurs before regex applied
    • auto-generated label tokens for template interpolation ($A, $B, etc) now also hold indices if the respective values are in an array (it used to work only for objects)
    • walked atomic values now also can be represented in templates using auto-generated tokens ($A and $a for a label/index and a value respectively) for easier template-interpolation operations
    • setting namespace $? to any value (even empty one) in a walk triggers resetting of the respective auto-token $? (which holds historical values) to the default value "" (it's a user-controlled way to reset the token, in addition to the existing trigger - template interpolation failure)
    • when string-interpolating an iterable (array or object) via "{}" token, all atomic values within the iterable get interpolated into the string recursively
    • improved template stringification (>{{}}<) - operation now is consistent across all JSON types (null / bool / numeric used to behave differently)
    • limited usage of auto-generated tokens (e.g.: $abc) to 3 letters only (to avoid clashing with tokens like $file and all future tokens) - the use case for auto-generated tokens is template-interpolation of relatively short arrays / objects, thus 3 letters is sufficient to address iterables up to 18278 values in size
    • extended range of auto-tokens representation in iterables ($a, $b, etc): initially each token represents a respective top-level JSON element of the iterable, beyond that range each next auto-token will represent an atomic value of the JSON tree as if it were walked recursively
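
The figure of 18278 quoted above for 3-letter auto-tokens is the number of names of one, two or three letters over a 26-letter alphabet: 26 + 26^2 + 26^3. The Python sketch below checks that count. The token_name helper is hypothetical, and the naming order after $z ($aa, $ab, ...) is an assumption rather than something stated in these notes.

```python
from string import ascii_lowercase

MAX_TOKEN_LETTERS = 3  # the limit stated in the release note


def token_capacity(max_letters: int = MAX_TOKEN_LETTERS) -> int:
    """Count the distinct names made of 1..max_letters lowercase letters."""
    return sum(len(ascii_lowercase) ** n for n in range(1, max_letters + 1))


def token_name(index: int) -> str:
    """Map a zero-based index to a token name: 0 -> $a, 25 -> $z, 26 -> $aa.

    ASSUMPTION: names continue spreadsheet-column style after $z.
    """
    n = index + 1
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(ascii_lowercase[rem])
    return "$" + "".join(reversed(letters))


print(token_capacity())   # 18278
print(token_name(0))      # $a
print(token_name(18277))  # $zzz
```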

Improvements, changes, fixes:

  • behavior improvements:

    • redesigned and improved processing of options chain-sets logic: lifted a caveat of using -J/-j/-a in intermediate chain-sets (now it works in line with the expected option behavior in any of the option sets)
    • when unquoting strings with -qq, translation of Unicode escape sequences (e.g.: \uD123) into UTF-8, as well as correct processing of UTF-16 surrogate pairs, was added
    • improved label update operations: now also any atomic value (null / boolean / numeric) can update a label (before labels could be updated only with string types)
    • improved namespace behavior for -p/-s operations (now namespaces from the respective walks are not lost in such operations and could be reused later)
  • performance improvements:

    • redesigned and improved namespaces storage policy so that it does not slow down walks (used to be the case, noticeable when storing big JSONs)
    • optimized performance for -e with -i/-u shell executions, where all such walks are attempted to be executed in a single run (popen session), otherwise defaulted to a legacy (slower) way (to enforce the legacy way give -ee)
  • code design improvements:

    • added compile options:
      • -DBG_dTS (effective only in conjunction with -DBG_mTS or -DBG_uTS) - debug timestamps display a delta instead of absolute stamps (handy for cpu profiling)
      • -DNDBG_PARSER: disables parsing debugs - handy when deep debugging huge JSONs (to skip the parsing part)
    • sped up template interpolations (by moving away from catching JSON parsing exceptions towards handling parsing results by return value)
    • improved performance when outputting walked elements (-w)
    • improved debug outputs when displaying JSONs longer than the term width (the same update ensures correct displaying of UTF-8 strings)
  • various fixes:

    • fixed locality of <>q, <>Q searches: it accidentally became global after last redesign of lexeme implementation, now it's local to the search tree (UT'ed)
    • fixed accidentally broken options translation in the built-in mini-guide (-g)
    • fixed a rogue debug level when debugging -e option
    • fixed a rare corner-case crash occurring in -u/-i based source walks, predicated on -pp option usage, and only when the resulting walks get invalidated by any of the prior walks (UT'ed of course)
    • fixed an issue when last walk control (-x0 or -x/-1) worked in the first JSON but did not work in any subsequent one - if there were multiple (UT'ed)
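
The surrogate pair handling mentioned above for -qq unquoting follows the standard rule for JSON \uXXXX escapes: a high surrogate followed by a low surrogate encodes a single code point above U+FFFF. The sketch below shows that standard arithmetic in Python. It is a minimal illustration with a hypothetical function name, not jtc's C++ code.

```python
import json


def combine_surrogates(high: int, low: int) -> str:
    """Combine a UTF-16 surrogate pair into the single character it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(code_point)


# The escaped pair \uD83D\uDE00 encodes U+1F600.
char = combine_surrogates(0xD83D, 0xDE00)
print(hex(ord(char)))        # 0x1f600
print(char.encode("utf-8"))  # b'\xf0\x9f\x98\x80'

# Python's own JSON parser applies the same rule to escaped strings.
print(json.loads('"\\ud83d\\ude00"') == char)  # True
```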

1.75d

4 years ago

Release Notes for jtc v.1.75d

New features:

  • performance improvements and some more fixes:

Improvements, changes, fixes:

  • completely reworked the logic of <>g, <>G, <>q, <>Q lexemes by externalizing their storages into standalone caches, which makes them run as fast as a bare-metal sort without slowing down walking
  • removed some superfluous optimization in the interpolation logic (it was limiting some corner use-cases)
  • a parsed quoted solidus (\/) is now always translated into an unquoted one (/), unless -q is given, which restricts the behavior to quoted-only
  • option -nn does not engulf -n now (i.e. if both behaviors are required then both must be spelled: -nnn)
  • added a token $file holding the name of a currently processed input file - so that it could be interpolated if required
  • improved $PATH token interpolation so that the namespace $# also could be utilized with it (upon interpolation into a string template)
  • reinstated -mm behavior (advertised in the last version but missed)
  • fixed engagement of lexeme <..>u in interim options sets
  • fixed quite a rare misbehavior of branching lexemes <>f ... <>F

1.75c

4 years ago

Release Notes for jtc v.1.75c

New features:

  • No new features, some more minor improvements and fixes:

Improvements, changes, fixes:

  • for all iterables undergoing template interpolation generate auto-tokens $a, $b, etc (and $A, $B, etc) for all values (and for objects' respective labels)
  • for lexemes setting JSON in the namespace, e.g.: <ns:..>v: if parsing the JSON value fails, try promoting it to a JSON string first, and throw an exception only if that fails too
  • made options -z, -zz non-transient (i.e. to be used only in the final options set)
  • some code fixes for MacPorts compatibility
  • fixed issue: interpolation of $? token should work even w/o -x0 (-x/-1) option

1.75b

4 years ago

Release Notes for jtc v.1.75b

New features:

  • Quick fixes for oversights in the design of the new features, which sneaked past UT:

Improvements, changes, fixes:

  • fixed issue: when shell evaluation fails, it might break options -ei / -eu logic
  • fixed/improved handling of ; char in shell eval operations -ei, -eu: treat only a standalone occurrence of \; as terminating symbol (and not when it's a trailing character - to allow cli chaining in argument)
  • fixed issue: all non-transient output view options -qq, -r, -t and -f, plus a bare qualifier -, should be ignored in all the interim chain sets but the last one (except that the bare qualifier - has a global scope, i.e., when cited in any of the chained option sets it will force initial reading from stdin)
  • fixed issue: accidentally broken bare qualifier - (input redirect)
  • fixed issue: -f option for chained sets, also extended -f: now it forces any output to file, allowing redirecting even walks
  • option -z now outputs size in a JSON compatible format, e.g.: { "size": 100 }

1.75a

4 years ago

Release Notes for jtc v.1.75

New features:

  • introduced a new semi-compact printing view. The view is engaged when the suffix c is appended to -t option, e.g.: -t2c, -tc. The semi-compact view is a middle ground between compact (-r) and pretty-printed (-t, default) views: when a JSON iterable is made of atomic values only (and/or empty iterables {}, []), it will be printed in a compact (one-line) format, the rest is pretty-printed

  • introduced operations chaining via delimiter /:

    • chaining delimiter(s) pretty much replaces jtc ... | jtc ... | jtc ... notation with jtc ... / ... / ... - the advantage is huge: jtc now is capable of processing multiple chained operations w/o printing-parsing interim JSONs (which is quite an expensive operation) - that speeds up operations and simplifies notation. Another benefit is that it becomes possible to pass namespace(s) from one chain set into another (which is impossible with piping notation)
    • chain-delimiter / only splits options notations, not working when cited among file arguments
  • introduced an optional step notation in range subscripts and search lexemes qualifiers: [N:M:S], <..>N:M:S: S must be a strictly positive value. In search quantifiers <..>::{S}, if after interpolation the value happens to be negative (or zero) then the default step 1 is applied

  • new search lexemes <..>g and <..>G allow going over JSON elements in a sorted order (ascending and descending respectively). When applied w/o quantifiers they allow finding min and max values respectively

  • a new directive <..>Z - preserves into a namespace a selected (walked) JSON entry size (a recursive and non-recursive behaviors applied respectively). <..>Z1 lexeme (i.e., with quantifier 1) - saves into a namespace a currently walked JSON string size (if the walked JSON is not a string, the value -1 is saved)

  • a new lexeme <..>W - preserves a current walk-path (as a JSON array) into a namespace variable

  • introduced a new parsing behavior (-mm) allowing accepting ill-formed JSONs with clashing labels by collecting them into arrays (e.g.: { "a": 1, "a": 2 } will be parsed into { "a": [ 1, 2] })

  • rebranded jtc into JSON transformational chains to better reflect the tool's purpose and capability
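
The -mm parsing behavior described above (collecting the values of clashing labels into an array) can be approximated in Python with the object_pairs_hook parameter of json.loads. This is a sketch of the idea only: the helper names are hypothetical, and how jtc treats a clashing label whose first value is already an array is not stated here, so the wrapping rule below is an assumption.

```python
import json


def collect_clashing(pairs):
    """Build an object from (label, value) pairs, gathering clashes into arrays."""
    merged = {}
    collected = set()  # labels whose values were already gathered into an array
    for label, value in pairs:
        if label not in merged:
            merged[label] = value
        elif label in collected:
            merged[label].append(value)
        else:
            # ASSUMPTION: the first clash wraps the earlier value in a new array.
            merged[label] = [merged[label], value]
            collected.add(label)
    return merged


def parse_collecting(text: str):
    """Parse JSON text, keeping every value of a repeated label."""
    return json.loads(text, object_pairs_hook=collect_clashing)


print(parse_collecting('{ "a": 1, "a": 2 }'))  # {'a': [1, 2]}
```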

Improvements, changes, fixes:

  • enhanced template interpolation (-T...):

    • removed prior limitations: now, application of templates is universal to all operations - executed as a last step for the respective walk(s)
    • extended template-interpolations of JSON iterables into strings: the former could be interpolated into the string values as enumerations: the enumeration separator value (default ", ") will be taken from newly introduced namespace $#
  • new namespaces added:

    • $#: holds the separator used when a JSON iterable is interpolated into a JSON string (default value ", ")
    • $_: holds the separator used when $PATH is interpolated to join path tokens (default value "_")
    • $$?: holds the separator used upon template expansion when interpolation token {$?} is used (default value ",")
  • introduced quantifiers for F directive (both recursive and non-recursive):

    • a new semantic for <>Fn quantifier: if n > 0 (i.e. non-default), it will let continue walking past <>Fn directive skipping to nth lexeme from F: e.g.: <>F1 - will continue walking right from the immediately following lexeme, <>F2 will continue walking from 2nd lexeme past <>Fn (i.e., skipping the first one), etc.
    • a new semantic for ><Fn quantifier: if n > 0 (i.e. non-default), it allows additional replications of the entire walk (before the lexeme ><F) n times
  • enhanced <..>I directive behavior:

    • initialization of the namespace value could be done now within the lexeme itself, e.g.: <c:100>I1 - will initialize counter c with the value 100 before the directive executes (unlike typical behavior where namespace initialization/preservation is applied as the last step of lexeme walking)
    • a new additional semantic for <..>In:m quantifier, where n is an increment step (as before, no changes here) and m is a new multiplier (integer only), e.g.: <a:10>I5:2 - after the first walk the namespace a will hold (10 + 5) * 2 = 30; in such notation, the increment is applied first and then the multiplier
    • the directive now also understands an empty token {} for the increment and/or the multiplier: <..>I{}:{} - the empty token will refer to the currently walked (numeric) value - this is the only lexeme where such an empty token notion makes sense and is supported
  • improved -jj option behavior: now the clashing labels will override each other (thus, only the last value will be retained), to collect even clashing labels (into an array) use -m modifier

  • improved behavior of -ll toggle - now it gleans all the labels, not just the first one (as before) - typically used together with -j

  • performance improvements:

    • in the JSON library, for ARY/OBJ declarations stepped away from std::initializer_list to variadic templated arguments (that permits use of move semantic now in the initialization notations, which simplified the usage and improved performance)
    • improved performance of buffered read from <stdin> (now, it's almost as fast as the read from files)
    • same way improved performance of file read in options (-i, -u, -c)
    • drastically improved performance of <>q, <>Q searches by making them cacheable: they are still quite memory hungry, still are the slowest among all searches, but now they are not prone to exponential decay and can be used on big JSONs with a predictable processing time
  • added a few compilation options:

    • -DBG_FLOW: a new debug of the execution flows (tracing an entry and exit point of every DEBUGGABLE function/method). Add -DBG_FLOW flag when compiling to effectuate such debugging - complements nicely -DBG_CC flag when debugging copy-constructors for optimization
    • -DBG_mTS: lets debugging output to have time-stamp with milliseconds accuracy
    • -DBG_uTS: lets debugging output to have time-stamp with microseconds accuracy
  • debugability improvements:

    • added printing of a backtrace in the unlikely event of a crash (only when debug is enabled). On macOS/BSD it will print a demangled backtrace
    • improved parsing output when debugged - now it'll be auto-adjusted to terminal's width
  • program design improvements:

    • simplified program design for all cases of source/destination walks - that also fixed the prior caveat with labels updates through the shell evaluation (now even nested labels could be updated, the caveat is removed)
    • made the handling of directives more consistent where applicable - now, a directive is activated only once per walk pass (applies to directives z, Z, W, v, k, I)
    • improved/fixed argument parsing for shell evaluation (-e with -i/-u) on Linux/GNU only (macOS/BSD were fine - the GNU implementation of getopt() works differently from the macOS/BSD one)
  • more fixes and enhancements:

    • fixed issue: directives <..>I and <..>u also must support interpolated name-spaced quantifiers: <..>I{ns}, <...>u{ns}
    • fixed issue: fail-safe <>f directive should not fire after there have been successful matches in iterables
    • fixed/improved parsing of <..>j search lexeme when the content is a template
    • fixed Linux options parsing (to behave the same way as on macOS/BSD)
    • fixed a corner crash when move semantic is applied on multiple walks and the prior walk deletes the object pointed to by the subsequent, interleaved walk
    • fixed a corner crash when a search lexeme (i,o,c, etc) was matching a root iterable (array/object) and at the same time attempted saving it into a namespace
    • fixed a crash when blank (or white space only) input was combined with the streamed read (-a)
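
The order of operations described above for the <..>In:m quantifier (increment first, then multiply) can be checked with a few lines of Python. The function name is hypothetical and the sketch only mirrors the arithmetic of the <a:10>I5:2 example, not jtc's code.

```python
def apply_increment(value: int, step: int, multiplier: int = 1) -> int:
    """Apply the increment first and the multiplier second."""
    return (value + step) * multiplier


# <a:10>I5:2 - initial value 10, increment 5, multiplier 2
print(apply_increment(10, 5, 2))  # 30

# Multiplying first would give a different value, so the order matters.
print(10 * 2 + 5)  # 25
```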

standard.json

4 years ago

This is a sample JSON used in performance testing (the JSON was generated from the XML file)

1.74

4 years ago

Release Notes for jtc v.1.74

New features:

  • No new features, some enhancements and stability improvements

Improvements, changes, fixes:

  • improved handling of <>q and <>Q lexemes drastically (performance and memory utilization-wise), also now those lexemes may be empty (before it was mandatory to give a namespace in the lexemes)
  • option -t now can be used to control spacing for the compact (one-row) view, e.g.: -r -t0 will print a very compact one-liner JSON, w/o spaces; when used together (-r and -t), it will also control spacing in stringification of JSON in template operations
  • introduced a support for flags in Regular Expressions (namely: INOCESXAGP); flags can be given only as trailing part of the RE (they will be removed from the RE itself after parsing), e.g.: <...\I\O>R:; also, flags ESXAGP facilitate various REGEX grammars, those flags will be processed only once (i.e., only the first setup grammar flag will have an effect, all subsequent will be ignored)
  • enhanced behavior of empty <>k lexeme - now it also has an effect when placed in front of the ><F lexeme (i.e. logical end of walking), not only at the syntactical end of the walk-path
  • enhanced interpolation behavior of {} token: when interpolation of a JSON object fails, it will be re-attempted with the JSON object stripped as an array - effectively allowing conversion of JSON objects into JSON arrays in templates.
  • fixed an issue when a "move" semantic (-p) is applied to update (-u)/insert (-i) operations: if the walks of the latter fail entirely then a purge should not be applied to destination walks (UT'ed)

1.73

4 years ago

Release Notes for jtc v.1.73

New features:

  • No new features, some enhancements and stability improvements

Improvements, changes, fixes:

  • lifted label update operation when -u is used to update a label (when a walk-path is ending with an empty ...<>k lexeme): now it's possible to update/rewrite recursively even nested labels w/o failures
  • converted walking (walk iteration) to a non-recursive loop, now walks are virtually endless (i.e. able to walk JSONs of virtually ANY size and depth) and not restricted by a depth of a stack
  • -T processing for -i<walk> and -u<walk> operations is enhanced to match the same behavior as for -w<walk>: templates are interpolated per walks now (if a count of templates and walks matches), or round-robin fashion otherwise (before, for some weird reasons all templates were applied for each such walk)
  • fixed insertion (-i): when the last lexeme of a walk is a non-empty <..>k, no label reinterpretation occurs (so it's consistent now with the same behavior of -u)
  • removed support for the empty <>z notation form of the lexeme: erasing entire namespace is idiomatically inconsistent with the walk design (and might lead to confusion or misunderstanding of the expected behavior), so only non-empty lexemes <..>z are supported now (and restricted to)
  • fixed a crash when debugging is on (quite a corner case though)
  • fixed a programmatic error (rarely occurs only in API calls) where Json class would falsely expect <stdin> in the event when parsing constructor throws
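
The move from recursive walking to a non-recursive loop noted above is a general technique: the call stack is replaced with an explicit stack, so nesting depth is bounded by available memory rather than by stack size. The Python sketch below illustrates that technique on a parsed JSON value. It is not jtc's walk implementation, and the function name is hypothetical.

```python
def walk_iteratively(root):
    """Yield (path, value) for every node, depth-first, without recursion."""
    stack = [((), root)]  # explicit stack of (path, node) pairs
    while stack:
        path, node = stack.pop()
        yield path, node
        if isinstance(node, dict):
            children = list(node.items())
        elif isinstance(node, list):
            children = list(enumerate(node))
        else:
            continue  # atomic value: nothing to descend into
        # Push in reverse so that children are visited in document order.
        for key, child in reversed(children):
            stack.append((path + (key,), child))


for path, value in walk_iteratively({"a": [1, 2], "b": 3}):
    print(path)

# A nesting depth well beyond Python's default recursion limit is still walkable.
deep = []
for _ in range(5000):
    deep = [deep]
print(sum(1 for _ in walk_iteratively(deep)))  # 5001
```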

1.72a

4 years ago

Release Notes for jtc v.1.72a (NOTE: The Release is republished, as prior binaries were incorrect ones)

New features:

  • introduced a new directive I which lets you increment/decrement numerical JSONs preserved in the namespace (and ignores other JSON types), e.g.: <var>I3, <var>I-1. If var wasn't defined before, the iteration begins with 0; however, it's possible to initialize it with a value other than 0 - see User Guide
  • introduced an auto-namespace variable $? to reference the last processed walk, this facilitates use-cases when converting input JSON to .csv format; see User Guide for more
  • introduced new lexemes <..>P, <..>N to match any JSON strings and JSON numerical types respectively. Before, to facilitate the same, REGEX lexemes were used: <.*>R and <.*>D respectively, but new lexemes work faster and allow storing matched values in the namespace
  • Template-interpolation was enhanced with a new capability to jsonize JSON strings (containing embedded JSONs) and stringify JSONs - similar to respective options -qq and -rr but now programmatically. See User Guide for the syntax and examples
  • added a new semantic to -x option: -xN[/M] notation lets you specify a frequency of walks to be displayed (every Nth walk) starting from the optional offset M (zero based); e.g.: -x4 - display every 4th walk, while -x4/1 will do the same starting from the 2nd (index is zero based) walk. Also, note a special notation case: -x0/N - will display Nth (zero based) walk only once, this could be abbreviated to -x/N; N is positive, but the value -1 is also supported - to display the last walk

Improvements, changes, fixes:

  • improved -jl options combination behavior: in some cases it wasn't robust and failed to provide the expected result. Plus, introduced a new merge format: -jlnn - all clashing values will be aggregated (disregarding JSON structure grouping, unlike the case of -jl)
  • lifted handling of atomic JSONs - simplified the code to allow applying walk-paths now even to atomic JSON values
  • extended null-interpolation for JSON strings (before, it was applied to JSON arrays and JSON objects only). Now, an empty variable interpolation in the string that is followed by either , or ; will be taken into account, e.g.: -T'"{}, "' - if {} is empty, then the result of interpolation will be empty too: ""
  • improved buffered file read speed (3 times faster) and stdin buffered speed (1.5-2 times faster), improved handling of non-existent/bad file-arguments (when multiple given)
  • enhanced move semantic of -u, -i operations, so that when used together with -pp it also works as expected with those options (before it was only working for -p and -pp notation was ignored)
  • a k lexeme that is last in the walk (e.g.: -w'... <>k') is now subject to label re-interpretation only if it's empty (<>k); a non-empty <..>k lexeme is not (the value is preserved in the namespace, so there is no need to re-interpret it)
  • the per-walk template feature used to work only for interleaved walks, and -n was cancelling it (templates were then applied round-robin). Now, even with -n (i.e., for sequenced walk processing) templates also pertain to their respective walks; in the unlikely event that round-robin behavior is required, the -nn notation will support it. Also, the per-walk template feature is now engaged only when the number of templates (-T) matches the number of provided walks (-w); otherwise round-robin template application is engaged.
  • put a hard cutoff on too deep a recursion should any unforeseen case (while walking) occur in the future; the same enhancement has fixed a case of too deep recursion (with subsequent stack overflow) for a corner case of lexeme <>F usage occurring only when processing really huge JSONs
  • the message "notice: option -J cancels streaming input" is printed now only when -a + <stdin> were selected explicitly together with -J (and not when -a is implicitly imposed upon -J)
  • fixed parsing debug (the offset for a streamed read now shows a correct value - from the beginning of a stream, instead of the beginning of an internal circular buffer; other read debugs (buffered, cin) are unaffected)
  • fixed empty <>b lexeme - it was not working (as documented), now it matches any boolean JSON value (UT'ed)