JSON processing utility
JSONs for lexemes supporting interpolation: <..>R, <..>L, <..>D, <..>j (UT'd)
-u/-i/-c: the behavior should match the behavior of the -T option (string interpolation of iterables might have produced different results)
$wuid which refers to the deterministic walk's unique id for each walk given by the user (handy for making collections of JSON elements per walk)
<>f .. ><f pairs: this is a use case to resolve recursive lookup chains
-p / -s options
-i / -u / -c options (as well as enabled passing of namespaces to the template)
v1.76)
<..>I when initializing namespace
{{}}, {{..}} (when the tokens are used standalone, no interpolation is needed and the JSON can be retrieved directly from walks)
<<<'{"tbl":["a","b","c"]}' jtc -w'[tbl]<:>k' -qqT'"{}"' will generate 0, 1, 2 output (instead of a b c) - that is predicated by the last walked <:>k directive (lexeme spelling in this case is limited to :).
$@ auto-REGEX namespace - it holds all the RE matches (entire matches, or group matches) in a JSON array. That way it's easy to split strings, e.g.: <<<'"abc, def, ghi"' jtc -w'<[^, ]+>R' -rT'{{$@}}' produces output: [ "abc", "def", "ghi" ]
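The string-splitting idea above can be tried outside of jtc. The following is a minimal Python sketch (not jtc code) that applies the same regular expression to the same input; it only assumes that the `$@` namespace collects every match into an array, as the note describes.

```python
import re

# The same input string and pattern as in the jtc example above.
text = "abc, def, ghi"

# Collect every run of characters that is neither a comma nor a space,
# mirroring how all RE matches are gathered into one JSON array.
matches = re.findall(r"[^, ]+", text)

print(matches)
```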
Json class to allow parsing templates and rewrite interpolation so that it's done via binary construction (the serdes way will remain only for string interpolations).
<..>v directive:
<var:<JSN/TMP>>v1 allow saving a JSON spelled literally, or out of a template, right into a namespace (currently any lexeme value in the <..>v directive is either a JSON or promoted to a JSON string)
<var:[{{$PATH}}, <JSON/TMP>]>v2 - the JSON in this form allows reconstructing a JSON in a namespace (i.e., incrementally building up a JSON in the namespace)
jtc -rw'<>e:' -T'[{{$PATH}}, {{}}]' - this would allow processing virtually endless JSONs w/o any memory pressure (parsing of such a streamed JSON will be done in a concurrent thread)
Release Notes for jtc v.1.76
jtc now will read/parse all files concurrently (on a multi-core CPU); to disable multithreading (and process files sequentially) give option -a (normally, the option is implied and redundant when multiple files are given)
<..>S - complements directive <..>W: walks the JSON tree as per the preserved path
-i/-u/-c contains a stream of JSONs, it's automatically converted into an array of JSONs
-i/-u/-c now additionally can hold a template (e.g.: -u0 -T<template> now could be collapsed into -u<template>)
<..>R, <..>L, <..>D) now are subjected to template interpolation as well, though namespace usage in such lexemes is limited to alphabetical names only (because numeric names would clash with regex quantifiers) - template interpolation obviously occurs before the regex is applied
$A, $B, etc) now also hold indices if the respective values are in an array (it used to work only for objects)
$A and $a for a label/index and a value respectively) for easier template-interpolation operations
$? to any value (even an empty one) in a walk triggers resetting of the respective auto-token $? (which holds historical values) to the default value "" (it's a user-controlled way to reset the token, in addition to the existing trigger - template interpolation failure)
"{}" token, all atomic values within the iterable get interpolated into the string recursively
>{{}}<) - operation now is consistent across all JSON types (null / bool / numeric used to behave differently)
$abc) to 3 letters only (to avoid clashing with tokens like $file and all future tokens) - the use case for auto-generated tokens is template interpolation for relatively short arrays / objects, thus 3 letters are sufficient to address iterables of up to 18278 values in size (26 + 26^2 + 26^3 = 18278)
$a, $b, etc): initially each token represents a respective top-level JSON element of the iterable; beyond that range each next auto-token will represent an atomic value of the JSON tree as if it were walked recursively
behavior improvements:
J/-j/-a in intermediate chain-sets (now it works in line with the expected option behavior in any of the option sets)
-qq a translation of UTF-8 code points (e.g.: \uD123), as well as correct processing of UTF-8 surrogate pairs added
-p/-s operations (now namespaces from the respective walks are not lost in such operations and could be reused later)
performance improvements:
-e with -i/-u shell executions, where all such walks are attempted to be executed in a single run (popen session), otherwise defaulted to a legacy (slower) way (to enforce the legacy way give -ee)
code design improvements:
-DBG_dTS (effective only in conjunction with -DBG_mTS or -DBG_uTS) - debug timestamps display deltas instead of absolute stamps (handy for CPU profiling)
-DNDBG_PARSER: disables parsing debugs - handy when deep debugging huge JSONs (to skip the parsing part)
-w)
various fixes:
<>q, <>Q searches: it accidentally became global after the last redesign of the lexeme implementation, now it's local to the search tree (UT'ed)
-g)
-e option
-u/-i based source walks predicated -pp option usage and only when resulted walks get invalidated by any of the prior walks (UT'ed of course)
-x0 or -x/-1) worked in the first JSON but did not work in any subsequent ones, if there were multiple (UT'ed)
Release Notes for jtc v.1.75d
<>g, <>G, <>q, <>Q lexemes by externalizing their storages into standalone caches; that made them run as fast as a bare-metal sort without slowing down walking
\/) now always translated into an unquoted (/), unless -q is given, which restricts behavior to quoted-only
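For readers unfamiliar with the escaped solidus, the following Python sketch (not jtc code) shows the general JSON rule behind the note above: `\/` and `/` denote the same character, so a parser may emit the unescaped form. The URL is a made-up sample value.

```python
import json

# A JSON string literal that uses the optional escaped solidus "\/".
escaped = r'"http:\/\/example.com\/a"'

# Parsing yields plain slashes: "\/" and "/" are the same character in JSON.
decoded = json.loads(escaped)

# Python's serializer writes the solidus back unescaped.
reencoded = json.dumps(decoded)

print(decoded)
print(reencoded)
```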
-nn does not engulf -n now (i.e. if both behaviors are required then both are to be spelled: -nnn)
$file holding the name of the currently processed input file - so that it could be interpolated if required
$PATH token interpolation so that the namespace $# also could be utilized with it (upon interpolation into a string template)
-mm behavior (advertised in the last version but missed)
<..>u in interim option sets
<>f ... <>F
Release Notes for jtc v.1.75c
$a, $b, etc (and $A, $B, etc) for all values (and for objects' respective labels)
<ns:..>v if parsing the JSON value fails - try promoting it to a JSON string first, and only if that fails too then throw an exception
-z, -zz non-transient (i.e. to be used only in the final option set)
$? token should work even w/o the -x0 (-x/-1) option
Release Notes for jtc v.1.75b
-ei / -eu logic
; char in shell eval operations -ei, -eu: treat only a standalone occurrence of \; as the terminating symbol (and not when it's a trailing character - to allow cli chaining in arguments)
-qq, -r, -t and -f, plus a bare qualifier - - should be ignored in all the interim chain sets but the last one (except the bare qualifier - - it has a global scope, i.e. cited in any of the chained option sets it will force initial reading from stdin)
- (input redirect)
-f option for chained sets, also extended -f: now it forces any output to file, allowing redirecting even walks
-z now outputs size in a JSON compatible format, e.g.: { "size": 100 }
Release Notes for jtc v.1.75
introduced a new semi-compact printing view. The view is engaged when the suffix c is appended to the -t option, e.g.: -t2c, -tc. The semi-compact view is a middle ground between the compact (-r) and pretty-printed (-t, default) views: when a JSON iterable is made of atomic values only (and/or empty iterables {}, []), it will be printed in a compact (one-line) format, the rest is pretty-printed
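The semi-compact rule described above can be modelled in a few lines. This is a hedged Python sketch, not jtc's implementation: the function names `semi_compact` and `_is_flat` are hypothetical, and the exact spacing of the one-line form is whatever Python's `json.dumps` produces, which may differ from jtc's own output.

```python
import json

def _is_flat(value):
    """Atomic values and empty containers count as 'flat'."""
    return not isinstance(value, (dict, list)) or len(value) == 0

def semi_compact(value, indent=2, level=0):
    """One line for containers holding only flat values, pretty-print the rest."""
    if not isinstance(value, (dict, list)):
        return json.dumps(value)
    children = list(value.values()) if isinstance(value, dict) else value
    if all(_is_flat(child) for child in children):
        return json.dumps(value)  # compact, one-line form
    pad = " " * (indent * (level + 1))
    if isinstance(value, dict):
        rows = [f"{pad}{json.dumps(k)}: {semi_compact(v, indent, level + 1)}"
                for k, v in value.items()]
        opener, closer = "{", "}"
    else:
        rows = [pad + semi_compact(v, indent, level + 1) for v in value]
        opener, closer = "[", "]"
    return opener + "\n" + ",\n".join(rows) + "\n" + " " * (indent * level) + closer

# Sample document (made up for illustration).
doc = {"a": [1, 2, 3], "b": {"c": [], "d": {"e": 1}}}
rendered = semi_compact(doc)
print(rendered)
```

In this sample, the array under "a" and the object under "d" contain only atomic values, so they stay on one line, while the root and "b" are expanded.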
introduced operations chaining via delimiter /:
jtc ... | jtc ... | jtc ... notation with jtc ... / ... / ... - the advantage is huge: jtc now is capable of processing multiple chained operations w/o printing and parsing interim JSONs (which is quite an expensive operation) - that speeds up operations and simplifies notation
Another benefit is that it becomes possible to pass namespace(s) from one chain set into another (which is impossible with piping notation)
/ only splits option notations; it does not work when cited among file arguments
introduced an optional step notation in range subscripts and search lexeme quantifiers: [N:M:S], <..>N:M:S: S must be a strictly positive value. In search quantifiers <..>::{S}, if after interpolation the value happens to be negative (or zero) then the default step 1 is applied
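As an analogy only, the stepped range can be pictured with Python slicing. The sketch below is not jtc code: the helper name `select` is hypothetical, and it assumes that N and M behave like Python slice bounds; the only rule taken from the note above is that a non-positive step falls back to the default step of 1.

```python
def select(items, start=None, stop=None, step=1):
    """Pick items[start:stop] with a stride; a non-positive step falls back to 1."""
    if step is None or step <= 0:
        step = 1  # default step, as the note describes for interpolated quantifiers
    return items[start:stop:step]

values = list(range(10))

every_third = select(values, 1, 8, 3)     # indices 1, 4, 7
fallback_zero = select(values, step=0)    # zero step is replaced by 1
fallback_neg = select(values, step=-2)    # negative step is replaced by 1

print(every_third, fallback_zero, fallback_neg)
```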
new search lexemes <..>g and <..>G allow going over JSON elements in a sorted order (ascending and descending respectively). When applied w/o quantifiers, they allow finding min and max values respectively
a new directive <..>Z - preserves into a namespace a selected (walked) JSON entry size (recursive and non-recursive behaviors applied respectively). <..>Z1 lexeme (i.e., with quantifier 1) - saves into a namespace the currently walked JSON string size (if the walked JSON is not a string, the value -1 is saved)
a new lexeme <..>W - preserves the current walk-path (as a JSON array) into a namespace variable
introduced a new parsing behavior (-mm) allowing accepting ill-formed JSONs with clashing labels by collecting them into arrays (e.g.: { "a": 1, "a": 2 } will be parsed into { "a": [ 1, 2 ] })
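The effect of collecting clashing labels can be reproduced with a standard JSON parser. The following Python sketch is not jtc code; the hook name `collect_clashes` is hypothetical, and it models only the single example given above (duplicate labels gathered into an array in order of appearance).

```python
import json

def collect_clashes(pairs):
    """Build an object from (label, value) pairs, gathering duplicates into arrays."""
    merged = {}
    clashed = set()  # labels already converted into a collecting array
    for label, value in pairs:
        if label not in merged:
            merged[label] = value
        elif label in clashed:
            merged[label].append(value)
        else:
            merged[label] = [merged[label], value]
            clashed.add(label)
    return merged

source = '{ "a": 1, "a": 2 }'

# With the hook, both values survive; by default Python keeps only the last one.
collected = json.loads(source, object_pairs_hook=collect_clashes)
last_wins = json.loads(source)

print(collected, last_wins)
```

The default result (`last_wins`) corresponds to the override behavior, where only the last clashing value is retained.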
rebranded jtc into JSON transformational chains to better reflect the tool's purpose and capability
enhanced template interpolation (-T...):
", ") will be taken from the newly introduced namespace $#
new namespaces added:
$#: holds the separator used when a JSON iterable is interpolated into a JSON string (default value ", ")
$_: holds the separator used when $path is interpolated to join path tokens (default value "_")
$$?: holds the separator used upon template expansion when interpolation token {$?} is used (default value ",")
introduced quantifiers for F directive (both recursive and non-recursive):
<>Fn quantifier: if n > 0 (i.e. non-default), it will let continue walking past the <>Fn directive, skipping to the nth lexeme from F: e.g.: <>F1 - will continue walking right from the immediately following lexeme, <>F2 will continue walking from the 2nd lexeme past <>Fn (i.e., skipping the first one), etc.
><Fn quantifier: if n > 0 (i.e. non-default), it allows additional replications of the entire walk (before the lexeme ><F) n times
enhanced <..>I directive behavior:
<c:100>I1 - will initialize counter c with the value 100 before the directive executes (unlike typical behavior where namespace initialization/preservation is applied as the last step at the end of lexeme walking)
<..>In:m quantifier, where n is an increment step (as before, no changes here), m - is a new multiplier (integer only), e.g.: <a:10>I5:2, after the first walking the namespace a will hold (10 + 5) * 2 = 30 - in such notation, first the increment is applied and then the multiplier
{} for the increment and/or the multiplier: <..>I{}:{} - the empty token will refer to the currently walked (numeric) value - this is the only lexeme where such an empty token notion makes sense and is supported
improved -jj option behavior: now the clashing labels will override each other (thus, only the last value will be retained), to collect even clashing labels (into an array) use the -m modifier
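The increment-then-multiply arithmetic of the `<..>In:m` quantifier described earlier in these notes can be checked with a tiny model. This Python sketch is not jtc code; the function name `apply_increment` is hypothetical, and the second pass assumes that the same rule is simply reapplied to the stored value on the next walk, which the notes do not state explicitly.

```python
def apply_increment(value, increment, multiplier=1):
    """Model of the quantifier: add the increment first, then apply the multiplier."""
    return (value + increment) * multiplier

# <a:10>I5:2 after the first walk, as stated in the notes: (10 + 5) * 2
first = apply_increment(10, 5, 2)

# Assumption: a further walk reapplies the rule to the stored value: (30 + 5) * 2
second = apply_increment(first, 5, 2)

print(first, second)
```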
improved behavior of the -ll toggle - now it gleans all the labels, not just the first one (as before) - typically used together with -j
performance improvements:
ARY/OBJ declarations stepped away from std::initializer_list to variadic templated arguments (that permits use of move semantics now in the initialization notations, which simplified the usage and improved performance)
<stdin> (now, it's almost as fast as the read from files)
-i, -u, -c)
<>q, <>Q searches by making them cacheable: they are still quite memory hungry, still are the slowest among all searches, but now they are not prone to exponential decay and can be used on big JSONs with a predictable processing time
added a few compilation options:
-DBG_FLOW: a new debug of the execution flows (tracing the entry and exit points of every DEBUGGABLE function/method). Add the -DBG_FLOW flag when compiling to effectuate such debugging - it nicely complements the -DBG_CC flag when debugging copy-constructors for optimization
-DBG_mTS: lets debugging output have a timestamp with millisecond accuracy
-DBG_uTS: lets debugging output have a timestamp with microsecond accuracy
debugability improvements:
program design improvements:
z, Z, W, v, k, I)
-e with -i/-u) argument parsing behavior for Linux/GNU only (macOS/BSD were fine - the getopt() GNU implementation works differently from macOS/BSD's)
more fixes and enhancements:
<..>I and <..>u also must support interpolated name-spaced quantifiers: <..>I{ns}, <...>u{ns}
<>f directive should not fire after there have been successful matches in iterables
<..>j search lexeme when the content is a template
i, o, c, etc) was matching a root iterable (array/object) and at the same time attempted saving it into a namespace
-a)
This is a sample JSON used in performance testing (the JSON was generated from the XML file)
Release Notes for jtc v.1.74
<>q and <>Q lexemes drastically (performance and memory utilization-wise), also now those lexemes may be empty (before it was mandatory to give a namespace in the lexemes)
-t now can be used to control spacing for the compact (one-row) view, e.g.: -r -t0 will print a very compact one-liner JSON, w/o spaces; when used together (-r and -t), it will also control spacing in stringification of JSON in template operations
INOCESXAGP); flags can be given only as a trailing part of the RE (they will be removed from the RE itself after parsing), e.g.: <...\I\O>R:; also, flags ESXAGP facilitate various REGEX grammars, those flags will be processed only once (i.e., only the first set grammar flag will have an effect, all subsequent ones will be ignored)
<>k lexeme - now it also has an effect when placed in front of the ><F lexeme (i.e. logical end of walking), not only at the syntactical end of the walk-path
{} token: when interpolation of a JSON object fails, it will be re-attempted to strip the JSON object as an array - effectively allowing conversion of JSON objects into JSON arrays in templates.
-p) applied to update (-u)/insert (-i) operations: if the walks of the latter fail entirely then a purge should not be applied on destination walks (UT'ed)
Release Notes for jtc v.1.73
-u is used to update a label (when a walk-path is ending with an empty ...<>k lexeme): now it's possible to update/rewrite recursively even nested labels w/o failures
-T processing for -i<walk> and -u<walk> operations is enhanced to match the same behavior as for -w<walk>: templates are interpolated per walk now (if the count of templates and walks matches), or in a round-robin fashion otherwise (before, for some weird reason all templates were applied for each such walk)
-i) when the last lexeme of a walk is a non-empty <..>k then no label reinterpretation occurs (so it's consistent now with the same behavior of -u)
<>z notation form of the lexeme: erasing the entire namespace is idiomatically inconsistent with the walk design (and might lead to confusion or misunderstanding of the expected behavior), so only non-empty lexemes <..>z are supported now (and restricted to)
<stdin> in the event when the parsing constructor throws
Release Notes for jtc v.1.72a (NOTE: The Release is republished, as prior binaries were incorrect ones)
I which lets incrementing/decrementing numerical JSONs preserved in the namespace (and ignores other JSON types), e.g.: <var>I3, <var>I-1. If var wasn't defined before, the iteration begins with 0; however, it's possible to initialize it with values other than 0 - see User Guide
$? to reference the last processed walk, this facilitates use cases when converting input JSON to .csv format; see User Guide for more
<..>P, <..>N to match any JSON strings and JSON numerical types respectively. Before, to facilitate the same, REGEX lexemes were used: <.*>R and <.*>D respectively, but the new lexemes work faster and allow storing matched values in the namespace)
-qq and -rr but now programmatically. See User Guide for the syntax and examples
-x option: -xN[/M] notation lets specifying a frequency of walks to be displayed (every Nth walk) starting from the optional offset M (zero based); e.g.: -x4 - display every 4th walk, while -x4/1 will do the same starting from the 2nd (index is zero based) walk.
Also, note a special notation case: -x0/N - will display the Nth (zero based) walk only once, this could be abbreviated to -x/N; N is positive, but the -1 value is also supported - to display the last walk
-jl options combination behavior: in some cases it wasn't robust and failed to provide the expected result. Plus, introduced a new merge format: -jlnn - all clashing values will be aggregated (disrespecting JSON structured grouping vs. as in the case of -jl)
, , ; will be taken into account, e.g.: -T'"{}, "' - if {} is empty, then the result of interpolation will be empty too: ""
-u, -i operations, so that when used together with -pp it also works as expected with those options (before it was only working for -p and the -pp notation was ignored)
k lexeme (e.g.: -w'... <>k') is only subjected to re-interpretation of the label if it's empty (<>k) now; the non-empty <..>k lexeme then isn't (the value is preserved in the namespace, so no need to re-interpret it then)
-n was cancelling it (templates then were applied round-robin). Now, even with -n (i.e., for sequenced walk processing) templates also pertain per walks; in the unlikely event when round-robin behavior is required, the -nn notation will support it. Also, the template-pertain-per-walk feature is enhanced to be engaged only when the number of templates (-T) matches the number of provided walks (-w), otherwise round-robin template application behavior is engaged.
<>F usage occurring in processing really huge JSONs only
-a + <stdin> were selected explicitly together with -J (and not when -a is implicitly imposed upon -J)
<>b lexeme - it was not working (as documented), now it matches any boolean JSON value (UT'ed)