Jtc – a powerful CLI tool to manipulate JSON
source link: https://www.tuicool.com/articles/a2quIbI
jtc - cli tool to extract, manipulate and transform source JSON
jtc stands for: JSON test console, but it's a legacy name, don't get misled.
jtc offers a powerful way to select one or multiple elements from a source JSON and apply various actions on the selected elements at once (wrap selected elements into a new JSON, filter in/out, update elements, insert new elements, remove, copy, move, compare, transform and swap around).
Short description - jtc is a simple but efficient cli utility tool to manipulate JSON data
jtc offers the following features (a short list of main features):
- simple user interface allowing a bulk of changes to be applied in one command
- featured walk-path interface lets you extract any combination of data from the source JSON
- extracted data is representable either as found, or encapsulated in a JSON array/object
- supports Regular Expressions when searching the source JSON
- fast and efficient processing of very large JSON files (built-in search cache)
- insert/update operations may optionally undergo shell cli evaluation
- features namespaces, interpolation from namespaces and templates
- supports buffered and streamed modes of input reads
- written entirely in C++14, no dependencies (STL only, idiomatic C++, no memory leaks)
- extensively debuggable
- conforms to the JSON specification (json.org)
The walk-path feature is easy to understand - it's made of only 2 types of lexemes:
- subscripts - enclosed in [, ]: subscripts let you traverse the JSON tree downwards and upwards
- search lexemes - encased in <, >: search lexemes facilitate either full match or Regex search.
Both types of lexemes are iterable - subscripts let you iterate over the children of the currently addressed iterable node (array/object), while iterable search lexemes let you iterate over all matches for a given search criteria. A walk path is made of an arbitrary number of lexemes, while the tool accepts a virtually unlimited number of walk paths. See below for a more detailed explanation with examples.
Linux and MacOS precompiled binaries are available for download
For compiling, c++14 (or later) is required:
- to compile under MacOS, use cli: c++ -o jtc -Wall -std=c++14 -Ofast jtc.cpp
- to compile under Linux, use cli: c++ -o jtc -Wall -std=gnu++14 -static -Ofast jtc.cpp
pass the -DNDEBUG flag if you'd like to compile w/o debugs; however, it's inadvisable - there's no performance gain from doing so
or download the latest precompiled binary:
- latest macOS
- latest linux 64 bit
- latest linux 32 bit
Compile and install instructions:
download jtc-master.zip, unzip it, descend into the unzipped folder, compile using an appropriate command, and move the compiled file into an install location.
here are the example steps for MacOS (with jtc-master.zip already downloaded into the current folder):
unzip jtc-master.zip
cd jtc-master
c++ -o jtc -Wall -std=c++14 -Ofast jtc.cpp
sudo mv ./jtc /usr/local/bin/
For Linux you'd have to compile using this line:
c++ -o jtc -Wall -std=gnu++14 -static -Ofast jtc.cpp
Release Notes
See the latest Release Notes
Quick-start guide:
run jtc -g for walk path explanations, usage notes and additional usage examples
Consider the following JSON (a mockup of a bookmark container), stored in a file Bookmarks:
{ "Bookmarks": [ { "children": [ { "children": [ { "name": "The New York Times", "stamp": "2017-10-03, 12:05:19", "url": "https://www.nytimes.com/" }, { "name": "HuffPost UK", "stamp": "2017-11-23, 12:05:19", "url": "https://www.huffingtonpost.co.uk/" } ], "name": "News", "stamp": "2017-10-02, 12:05:19" }, { "children": [ { "name": "Digital Photography Review", "stamp": "2017-02-27, 12:05:19", "url": "https://www.dpreview.com/" } ], "name": "Photography", "stamp": "2017-02-27, 12:05:19" } ], "name": "Personal", "stamp": "2017-01-22, 12:05:19" }, { "children": [ { "name": "Stack Overflow", "stamp": "2018-05-01, 12:05:19", "url": "https://stackoverflow.com/" }, { "name": "C++ reference", "stamp": "2018-06-21, 12:05:19", "url": "https://en.cppreference.com/" } ], "name": "Work", "stamp": "2018-03-06, 12:07:29" } ] }
1. let's start with a simple thing - list all URLs:
bash $ jtc -w'<url>l:' Bookmarks
"https://www.nytimes.com/"
"https://www.huffingtonpost.co.uk/"
"https://www.dpreview.com/"
"https://stackoverflow.com/"
"https://en.cppreference.com/"
The walk-path (an argument of -w) is a combination of lexemes. There are only 2 types of lexemes:
- subscript lexemes - enclosed in [, ]
- search lexemes - enclosed in <, >
- the walk-paths may contain any number of lexemes
let's take a look at the walk-path <url>l: in detail:
- search lexemes are enclosed in angular brackets <, > - that style provides a recursive search
- suffix l instructs to search among labels only
- quantifier : instructs to find all occurrences; such quantifiers make a path iterable
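To make the idea of a recursive, label-only, find-all search more concrete, here is a minimal Python sketch of what <url>l: asks for. It is only a rough model under stated assumptions, not jtc's implementation: the function name find_labels and the trimmed-down sample data are hypothetical.

```python
# Rough model of the walk-path <url>l: (recursive search, labels only, all matches).
# find_labels and the sample data are illustrative assumptions, not part of jtc.

def find_labels(node, label):
    """Yield every value stored under `label`, at any depth, in document order."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == label:
                yield value
            # keep descending: matches may be nested deeper
            yield from find_labels(value, label)
    elif isinstance(node, list):
        for item in node:
            yield from find_labels(item, label)

# Trimmed-down stand-in for the Bookmarks file.
bookmarks = {
    "Bookmarks": [
        {"name": "Personal", "children": [
            {"name": "The New York Times", "url": "https://www.nytimes.com/"},
        ]},
        {"name": "Work", "children": [
            {"name": "Stack Overflow", "url": "https://stackoverflow.com/"},
            {"name": "C++ reference", "url": "https://en.cppreference.com/"},
        ]},
    ]
}

urls = list(find_labels(bookmarks, "url"))
for url in urls:
    print(url)
```

The generator plays the role of the iterable quantifier: each match is produced one at a time, and further lexemes could be applied to every match in turn.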
2. dump all bookmark names from the Work folder:
bash $ jtc -w'<Work>[-1][children][:][name]' Bookmarks
"Stack Overflow"
"C++ reference"
here the walk path <Work>[-1][children][:][name] is made of the following lexemes (spaces separating lexemes are optional):
a. <Work>: find within the JSON tree the first occurrence where the JSON string value matches "Work" exactly
b. [-1]: step up one tier in the JSON tree structure (i.e. address an immediate parent of the found JSON element)
c. [children]: select/address a node whose label is "children" (it'll be a JSON array, at the same tier as Work)
d. [:]: select each node in the array
e. [name]: select/address a node whose label is "name"
- subscript offsets are enclosed in square brackets [, ] and may have different meanings:
  - simple numerical offsets (e.g.: [0], [5], etc) select/address a respective immediate JSON child in the addressed node - a.k.a. numerical subscripts
  - slice/range offsets, expressed as [N:N], let you select any slice/range of elements in the array/object (any of N could be omitted in that notation)
  - numerical offsets preceded by + make a path iterable - all children starting with the given index will be selected (e.g.: [+2] will select/address all immediate children starting from the 3rd one) - such notation is equivalent to [N:]
  - numerical negative offsets (e.g. [-1], [-2], etc) will select/address a parent of the currently selected/found node, a parent of a parent, etc
  - textual offsets (e.g. [name], [children], etc) select/address nodes with corresponding labels among immediate children (i.e. textual subscripts)
*** there's more on offsets and search quantifiers below
in order to better understand how a walk path works, let's run a series of cli in slow motion, gradually adding lexemes to the path, with the option -l to also see the labels (if any) of the selected elements:
bash $ jtc -w'<Work>' -l Bookmarks
"name": "Work"
bash $ jtc -w'<Work>[-1]' -l Bookmarks
{ "children": [ { "name": "Stack Overflow", "stamp": "2018-05-01, 12:05:19", "url": "https://stackoverflow.com/" }, { "name": "C++ reference", "stamp": "2018-06-21, 12:05:19", "url": "https://en.cppreference.com/" } ], "name": "Work", "stamp": "2018-03-06, 12:07:29" }
bash $ jtc -w'<Work>[-1][children]' -l Bookmarks
"children": [ { "name": "Stack Overflow", "stamp": "2018-05-01, 12:05:19", "url": "https://stackoverflow.com/" }, { "name": "C++ reference", "stamp": "2018-06-21, 12:05:19", "url": "https://en.cppreference.com/" } ]
bash $ jtc -w'<Work>[-1][children][:]' -l Bookmarks
{ "name": "Stack Overflow", "stamp": "2018-05-01, 12:05:19", "url": "https://stackoverflow.com/" }
{ "name": "C++ reference", "stamp": "2018-06-21, 12:05:19", "url": "https://en.cppreference.com/" }
bash $ jtc -w'<Work>[-1][children][:][name]' -l Bookmarks
"name": "Stack Overflow"
"name": "C++ reference"
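The same step-by-step reading of <Work>[-1][children][:][name] can be mimicked in plain Python. This is a sketch under assumptions, not how jtc works internally: the helper find_parent_of_string is hypothetical, and the sample data is a shortened stand-in for the Bookmarks file.

```python
# Step-by-step model of the walk-path <Work>[-1][children][:][name].
# find_parent_of_string is an illustrative helper, not a jtc API.

def find_parent_of_string(node, text, parent=None):
    """Return the container holding the first string equal to `text` (i.e. <text>[-1])."""
    if isinstance(node, str):
        return parent if node == text else None
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = ()
    for child in children:
        found = find_parent_of_string(child, text, node)
        if found is not None:
            return found
    return None

bookmarks = {
    "Bookmarks": [
        {"name": "Personal", "children": [
            {"name": "The New York Times", "url": "https://www.nytimes.com/"},
        ]},
        {"name": "Work", "children": [
            {"name": "Stack Overflow", "url": "https://stackoverflow.com/"},
            {"name": "C++ reference", "url": "https://en.cppreference.com/"},
        ]},
    ]
}

folder = find_parent_of_string(bookmarks, "Work")   # <Work>[-1]
children = folder["children"]                       # [children]
names = [child["name"] for child in children]       # [:][name]
print(names)
```

Each assignment corresponds to one of the slow-motion steps above, which is why building a walk path lexeme by lexeme gives immediate feedback on what is currently addressed.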
3. dump all URL's names:
bash $ jtc -w'<url>l:[-1][name]' Bookmarks
"The New York Times"
"HuffPost UK"
"Digital Photography Review"
"Stack Overflow"
"C++ reference"
this walk path <url>l:[-1][name] does the following:
- finds recursively (encasement <, >) all (:) JSON elements with a label (l) matching url
- then, for each found JSON element, selects its parent ([-1])
- then selects a JSON element with the label "name" (encasement [, ])
4. dump all the URLs and their corresponding names, preferably wrap found pairs in JSON:
bash $ jtc -w'<url>l:' -w'<url>l:[-1][name]' -jl Bookmarks
[
 { "name": "The New York Times", "url": "https://www.nytimes.com/" },
 { "name": "HuffPost UK", "url": "https://www.huffingtonpost.co.uk/" },
 { "name": "Digital Photography Review", "url": "https://www.dpreview.com/" },
 { "name": "Stack Overflow", "url": "https://stackoverflow.com/" },
 { "name": "C++ reference", "url": "https://en.cppreference.com/" }
]
- yes, multiple walks (-w) are allowed
- option -j will wrap the walked outputs into a JSON array, and that's not all:
- option -l used together with -j will ensure relevant walks are grouped together (try without -l)
- if multiple walks (-w) are present, by default, walked results will be printed interleaved
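The difference between interleaved results and grouped results can be pictured with two plain Python lists standing in for the outputs of the two walks. This is only a conceptual sketch: the variable names are hypothetical and the code says nothing about how jtc actually orders or groups its output internally.

```python
from itertools import chain

# Stand-ins for the results of two walks over the same JSON (shortened sample data).
urls = ["https://stackoverflow.com/", "https://en.cppreference.com/"]   # first -w
names = ["Stack Overflow", "C++ reference"]                             # second -w

# Interleaved: one result from each walk in turn.
interleaved = list(chain.from_iterable(zip(urls, names)))

# Grouped: related results from both walks wrapped together under their labels.
grouped = [{"name": name, "url": url} for url, name in zip(urls, names)]

print(interleaved)
print(grouped)
```

Interleaving keeps a url next to the name it belongs to, which is what makes wrapping each pair into its own labeled object possible.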
5. Subscripts (offsets) and Searches explained
In short:
- Subscript lexemes ([..]) facilitate:
  - addressing children (by index/label) in JSON iterables (arrays and objects) - i.e. traversing the JSON structure downward from the root (toward leaves), e.g.: [2], [id]
  - addressing parents (immediate and distant) - i.e. traversing the JSON structure upwards, toward the root (from leaves), e.g.: [-1] (tier offset from the walked element), [^2] (tier offset from the root)
  - selecting ranges and slices of JSON elements in JSON iterables, e.g.: [+2], [:], [:3], [-2:], [1:-1]
- Search lexemes (<..>, >..<) facilitate:
  - recursive (<..>) and non-recursive (>..<) matches
  - there are optional one-letter suffixes that may follow the lexemes (e.g.: <..>Q) which define the type of search: (REGEX) string search, (REGEX) label search, (REGEX) numerical, boolean, null, atomic, objects, arrays (or either), arbitrary JSONs, unique, duplicates, etc.
  - there are also optional quantifiers to lexemes (they must take the last position, after the suffix if one is present), which let you select a match instance or a range of matches (e.g.: <id>l3 will match the 4th (zero based) label "id"; if no quantifier is present, 0 is assumed - the first match)
- subscript lexemes could be joined with search lexemes over ':' to facilitate a scoped search, e.g.: [id]:<value> is a single lexeme which will match recursively the first occurrence of the string "value" with the label "id" - i.e. "id": "value"
- Directives: there are a few suffixes which turn a search lexeme into a directive:
  - directives do not do any matching, instead they facilitate a certain action/operation with the currently walked JSON element, like: memorize it in the namespace, or erase from it, or memorize its label, or perform a shell cli evaluation
  - a couple of directives (<>f and <>F) also facilitate walk branching
Refer to the jtc User Guide for the detailed explanation of the subscripts, search lexemes and directives.
6. Debuggability / JSON validation
jtc is extensively debuggable: the more times option -d is given, the more debugs will be produced (currently the debug depth may go as deep as 7: -ddddddd).
Enabling too many debugs might be overwhelming, though there's one specific case many would find extremely useful - validating a failing JSON:
bash $ <addressbook-sampe.json jtc
jtc json exception: expected_json_value
If the JSON is big, it's desirable to locate the parsing failure point. Specifying just one -d lets you easily spot the parsing failure point and its locus:
bash $ <addressbook-sampe.json jtc -d
.read_inputs(), reading json from <stdin>
.parsejson(), exception locus: ... ],| "children": [,],| "spouse": null| }...
.location_(), exception spot: --------------------------------->| (offset: 967)
jtc json exception: expected_json_value
bash $
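The idea of reporting a failure offset together with its surrounding locus is not specific to jtc. As a hedged illustration, the sketch below does something similar with Python's standard json module; the function name locate_json_error and the context width are assumptions made for this example, and the output format only loosely imitates the debug lines above.

```python
import json

def locate_json_error(text, context=15):
    """Return None for valid JSON, else the failure offset and a one-line locus around it."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - context)
        # newlines are shown as '|' so that the locus stays on a single line
        locus = text[start:err.pos + context].replace("\n", "|")
        spot = "-" * (err.pos - start) + "^"
        return {"offset": err.pos, "reason": err.msg, "locus": locus, "spot": spot}
    return None

# A deliberately broken document: an array containing a stray comma.
broken = '{"children": [,], "spouse": null}'
report = locate_json_error(broken)
print(report["locus"])
print(report["spot"], "(offset: %d)" % report["offset"])
print(report["reason"])
```

For a large input, the offset alone is enough to jump to the failing character in an editor, while the locus gives a quick visual confirmation.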
Complete User Guide
Refer to a complete User Guide for further examples and guidelines.
A tiny example of class usage and its interface (c++14):
Say, we want to accomplish the following task: read the Address Book JSON from <stdin>, sort all records by Name, and output the resulting Address Book JSON.
Below is a code sample showing how that could be achieved using the Json.hpp class and the source JSON - Address Book:
    #include <algorithm>   // std::sort
    #include <iostream>
    #include <fstream>
    #include <iterator>    // std::istream_iterator
    #include <string>
    #include <vector>
    #include "lib/Json.hpp"

    // compile with: c++ -o sort_ab -Wall -std=c++14 sorting_ab.cpp

    using namespace std;

    int main(int argc, char *argv[]) {
     // read and parse json from cin
     Json jin( {istream_iterator<char>(cin>>noskipws), istream_iterator<char>{}} );
     // get all the names
     vector<string> names(jin.walk("[AddressBook][+0][Name]"), jin.walk().end());
     // sort the names
     sort(names.begin(), names.end());
     // rebuild AB with sorted records
     Json srt = ARY{};
     for(const auto &name: names)
      srt.push_back( move( *jin.walk("[AddressBook][Name]:<" + name + ">[-1]") ) );
     // put back into the original container and print
     cout << jin["AddressBook"].clear().push_back( move(srt) ) << endl;
    }
Address Book JSON:
bash $ cat addressbook-sample.json
{
 "AddressBook": [
  { "Name": "John", "age": 25, "address": { "city": "New York", "street address": "599 Lafayette St", "state": "NY", "postal code": "10012" }, "phoneNumbers": [ { "type": "mobile", "number": "212 555-1234" } ], "children": [], "spouse": null },
  { "Name": "Ivan", "age": 31, "address": { "city": "Seattle", "street address": "5423 Madison St", "state": "WA", "postal code": "98104" }, "phoneNumbers": [ { "type": "home", "number": "3 23 12334" }, { "type": "mobile", "number": "6 54 12345" } ], "children": [], "spouse": null },
  { "Name": "Jane", "age": 25, "address": { "city": "Denver", "street address": "6213 E Colfax Ave", "state": "CO", "postal code": "80206" }, "phoneNumbers": [ { "type": "office", "number": "+1 543 422-1231" } ], "children": [], "spouse": null }
 ]
}
bash $
Output result:
bash $ cat addressbook-sample.json | sort_ab
[
 [
  { "Name": "Ivan", "address": { "city": "Seattle", "postal code": "98104", "state": "WA", "street address": "5423 Madison St" }, "age": 31, "children": [], "phoneNumbers": [ { "number": "3 23 12334", "type": "home" }, { "number": "6 54 12345", "type": "mobile" } ], "spouse": null },
  { "Name": "Jane", "address": { "city": "Denver", "postal code": "80206", "state": "CO", "street address": "6213 E Colfax Ave" }, "age": 25, "children": [], "phoneNumbers": [ { "number": "+1 543 422-1231", "type": "office" } ], "spouse": null },
  { "Name": "John", "address": { "city": "New York", "postal code": "10012", "state": "NY", "street address": "599 Lafayette St" }, "age": 25, "children": [], "phoneNumbers": [ { "number": "212 555-1234", "type": "mobile" } ], "spouse": null }
 ]
]
bash $
for the complete description of Json class interface, refer to Json.hpp
jtc vs jq:
jtc was inspired by the complexity of the jq interface (and its DSL), aiming to provide a user tool which would let one attain the desired result in a more feasible way
utility ideology:
- jq is a stateful processor with its own DSL, variables, operations, control flow logic, IO system, etc, etc
- jtc is a unix utility confining its functionality to operation types with its data model only (as per unix ideology). jtc performs one operation at a time, and if successive operations are required, then the cli is to be daisy-chained over the pipe symbol |
jq is non-idiomatic in a unix way, e.g., one can write a program in the jq language that has nothing to do with JSON at all. Most of the requests (if not all) to manipulate JSONs are ad hoc tasks, and learning jq's DSL for ad hoc tasks is overkill (that purpose is best facilitated with a GPL).
The number of asks on stackoverflow to facilitate even simple queries for jq is huge - that's proof in itself that for many people the feasibility of attaining their asks with jq is way too low, hence they default to posting their questions on the forum.
jtc on the other hand is a utility (not a language), which employs a novel but powerful concept, which "embeds" the ask right into the walk-path. That facilitates a much higher feasibility of attaining a desired result: building the walk-path lexeme by lexeme, one at a time, provides immediate visual feedback and lets you come up with the desired result quite quickly.
learning curve:
- jq: before you can come up with a query to handle even a relatively simple ask, you need to become an expert in jq's language, which takes some time. Coming up with complex queries seems to require a "PhD" in jq, or spending lots of time on stackoverflow and similar forums
- jtc employs only a single (but powerful) concept of the walk-path (which is made of only 2 types of lexemes, though each type has several variants), which is easy to grasp.
handling irregular JSONs:
- jq: handling irregular JSONs is not a challenge for jq, building a query is! The more irregularities you need to handle, the more challenging the query (jq program) becomes
- jtc was conceived with the idea of being capable of handling complex irregular JSONs with a simplified interface - all of that is fitted into the concept of the walk-path, while by daisy-chaining multiple jtc operations it's possible to satisfy almost every query.
programming model
- jq is written in C, which drags along all the intrinsic problems the language has had since its creation
- jtc is written in idiomatic C++ (the most powerful programming language to date) using STL only. The main JSON engine/library does not have a single new operator, nor does it have a single naked pointer acting as a resource holder/owner, thus jtc is guaranteed to be free of memory leaks (at least one class of problems is off the table) - STL guarantee.
Also, jtc is written in a very portable way; it should not cause any problems compiling under any unix-like system.
JSON numerical fidelity:
- jq is not compliant with the JSON numerical definition. What jq does is simply convert a symbolic numerical representation to an internal binary one and keep it that way. That approach:
  - is not compliant with the JSON definition of numerical values
  - has problems retaining the required precision
  - might change the original representation of numericals
- jtc validates all JSON numericals per the JSON standard and keeps numbers internally in their original symbolic format, so it's free of all the above caveats:
Invalid Json: [ 00 ]
  jtc: <<<'[00]' jtc
       Parsing result: jtc json exception: missed_prior_enumeration
  jq:  <<<'[00]' jq -c .
       Parsing result: [0]
Precision test:
  jtc: <<<'[0.99999999999999999]' jtc -r
       Parsing result: [ 0.99999999999999999 ]
  jq:  <<<'[0.99999999999999999]' jq -c .
       Parsing result: [1]
Retaining original format:
  jtc: <<<'[0.00001]' jtc -r
       Parsing result: [ 0.00001 ]
  jq:  <<<'[0.00001]' jq -c .
       Parsing result: [1e-05]
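The trade-off between binary and symbolic number storage can be reproduced with Python's standard json module, which converts numbers to binary ints and floats by default but can be told to keep the source text via parse_int and parse_float. This is an illustration of the general effect only, not of how jtc or jq are implemented; the helper names are hypothetical.

```python
import json

def parse_binary(text):
    """Default behaviour: JSON numbers become binary ints/floats."""
    return json.loads(text)

def parse_symbolic(text):
    """Keep every number as the exact text found in the source."""
    return json.loads(text, parse_int=str, parse_float=str)

def is_valid_json(text):
    """True if the standard parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parse_binary('[0.99999999999999999]'))    # [1.0] - precision is lost
print(parse_symbolic('[0.99999999999999999]'))  # ['0.99999999999999999']
print(parse_binary('[0.00001]'))                # [1e-05] - representation changes
print(parse_symbolic('[0.00001]'))              # ['0.00001']
print(is_valid_json('[00]'))                    # False - leading zeros are not valid JSON
```

Keeping the original text costs nothing at parse time and guarantees that a value which was never modified is written back exactly as it was read.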
performance:
here's a 4+ million node JSON test file:
bash $ jtc -zz standard.json
4329975
The table below compares jtc and jq performance for similar operations (using TIMEFORMAT="user %U sec", a median value is selected from 5 attempts):
parsing JSON:
  jtc: bash $ time jtc standard.json | wc -l
       7091578
       user 8.686 sec
  jq:  bash $ time jq . standard.json | wc -l
       7091578
       user 18.848 sec
removing by key from JSON:
  jtc: bash $ time jtc -pw'<attributes>l:' standard.json | wc -l
       5573690
       user 9.765 sec
  jq:  bash $ time jq 'del(..|.attributes?)' standard.json | wc -l
       5573690
       user 27.399 sec
The computer's spec used for tests:
Model Name: MacBook Pro (15-inch, 2019)
Model Identifier: MacBookPro15,1
Processor Name: Intel Core i7
Processor Speed: 2,6 GHz
Number of Processors: 1
Total Number of Cores: 6
L2 Cache (per Core): 256 KB
L3 Cache: 12 MB
Hyper-Threading Technology: Enabled
Memory: 16 GB 2400 MHz DDR4
Refer to a complete User Guide for further examples and guidelines.
Enhancement requests are more than welcome: [email protected]