
Bringing the Unix Philosophy to the 21st Century | Brazil's Blog

source link: https://blog.kellybrazil.com/2019/11/26/bringing-the-unix-philosophy-to-the-21st-century/

Try the jc web demo!

Do One Thing Well

The Unix philosophy of using compact expert tools that do one thing well and pipelining them together to manipulate data is a great idea and has worked well for the past few decades. This philosophy was outlined in the 1978 Foreword to the Bell System Technical Journal describing the UNIX Time-Sharing System:

[Image: excerpt from the Foreword to the Bell System Technical Journal]

Items i and ii are oft repeated, and for good reason. But it is time to take this philosophy to the 21st century by further defining a standard output format for non-interactive use.

Unfortunately, this is the state of things today if you want to grab the IP address of one of the Ethernet interfaces on your Linux system:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1

This is not beautiful.

Up until about 2013, it made as much sense as anything to assume unstructured text was a good way to output data at the command line. Unix/Linux has many text parsing tools like sed, awk, grep, tr, cut, rev, etc. that can be pipelined together to reformat the desired data before sending it to the next program. Of course, this has always been a pain and is the source of many questions all over the web about how to parse the output of one program or another. The requirement to manually parse unstructured (in some cases only human-readable) data has made life much more difficult than it needs to be for the average Linux administrator.

But in 2013 a certain data format called JSON was standardized as ECMA-404 and later in 2017 as RFC 8259 and ISO/IEC 21778:2017. JSON is ubiquitous these days in REST APIs and is used to serialize everything from data between web applications, to Indicators of Compromise in the STIX2 specification, to configuration files. There are JSON parsing libraries in all modern programming languages and even JSON parsing tools for the command line, like jq. JSON is everywhere, it’s easy to use, and it’s a standard.

Had JSON been around when I was born in the 1970s, Ken Thompson and Dennis Ritchie may very well have embraced it as a recommended output format to help programs “do one thing well” in a pipeline.

To that end, I argue that Linux and all of its supporting GNU and non-GNU utilities should offer JSON output options. We already see some limited support for this in systemctl and in iproute2 utilities like ip, where you can output in JSON format with the -j option. The problem is that many Linux distros do not include a version that offers JSON output (e.g. CentOS, currently). And even then, not all functions support JSON output, as shown below:

Here is ip addr with JSON output:

$ ip -j addr show dev ens33
 [{
         "addr_info": [{},{}]
     },{
         "ifindex": 2,
         "ifname": "ens33",
         "flags": ["BROADCAST","MULTICAST","UP","LOWER_UP"],
         "mtu": 1500,
         "qdisc": "fq_codel",
         "operstate": "UP",
         "group": "default",
         "txqlen": 1000,
         "link_type": "ether",
         "address": "00:0c:29:99:45:17",
         "broadcast": "ff:ff:ff:ff:ff:ff",
         "addr_info": [{
                 "family": "inet",
                 "local": "192.168.71.131",
                 "prefixlen": 24,
                 "broadcast": "192.168.71.255",
                 "scope": "global",
                 "dynamic": true,
                 "label": "ens33",
                 "valid_life_time": 1732,
                 "preferred_life_time": 1732
             },{
                 "family": "inet6",
                 "local": "fe80::20c:29ff:fe99:4517",
                 "prefixlen": 64,
                 "scope": "link",
                 "valid_life_time": 4294967295,
                 "preferred_life_time": 4294967295
             }]
     }
 ]
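
Once the data is JSON, pulling out the IPv4 address no longer needs grep, awk, or cut. The following is a minimal Python sketch, assuming output shaped like the example above. The function name `ipv4_addresses` is hypothetical, and the embedded sample is a trimmed copy of that output rather than a live capture.

```python
from __future__ import annotations

import json

# Trimmed copy of the `ip -j addr show dev ens33` output shown above.
SAMPLE = """
[{
    "addr_info": [{},{}]
},{
    "ifindex": 2,
    "ifname": "ens33",
    "addr_info": [{
        "family": "inet",
        "local": "192.168.71.131",
        "prefixlen": 24
    },{
        "family": "inet6",
        "local": "fe80::20c:29ff:fe99:4517",
        "prefixlen": 64
    }]
}]
"""


def ipv4_addresses(ip_json: str, ifname: str) -> list[str]:
    """Return the IPv4 addresses of one interface from `ip -j addr` output."""
    addresses = []
    for iface in json.loads(ip_json):
        # The sample contains a stub entry with no "ifname"; skip anything
        # that is not the interface we were asked about.
        if iface.get("ifname") != ifname:
            continue
        for addr in iface.get("addr_info", []):
            if addr.get("family") == "inet":
                addresses.append(addr["local"])
    return addresses


if __name__ == "__main__":
    print(ipv4_addresses(SAMPLE, "ens33"))
```

Selecting by key name (`family`, `local`) rather than by column position means the lookup keeps working if the tool adds or reorders fields.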

And here is ip route not outputting JSON, even with the -j flag:

$ ip -j route
default via 192.168.71.2 dev ens33 proto dhcp src 192.168.71.131 metric 100
192.168.71.0/24 dev ens33 proto kernel scope link src 192.168.71.131
192.168.71.2 dev ens33 proto dhcp scope link src 192.168.71.131 metric 100

Some other, more modern tools like kubectl and the aws-cli tool offer more consistent JSON output options, which allow much easier parsing and pipelining of the output. But there are many older tools that still output nearly unparsable text (e.g. netstat, lsblk, ifconfig, iptables, etc.). Interestingly, Windows PowerShell has embraced structured data, and that’s a good thing that the Linux community can learn from.

How do we move forward?

The solution is to start an effort to go back to all of these legacy GNU and non-GNU command line utilities that output text data and add a JSON output option to them. All operating system APIs, like the /proc and /sys filesystems, should serialize their files in JSON or provide the data in an alternative API that outputs JSON.


In the meantime, I have created a tool called jc (https://github.com/kellyjonbrazil/jc) that converts the output of dozens of GNU and non-GNU commands and configuration files to JSON. Instead of everyone needing to create their own custom parsers for these common utilities and files, jc acts as a central clearinghouse of parsing libraries that just need to be written once and can be used by everyone.
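
To give a feel for what one of these parsers has to do, here is a minimal Python sketch that turns df-style columnar text into a list of dictionaries. It is not jc's actual implementation: the function name `parse_df`, the key names, and the sample text are all assumptions made for illustration, and the sample was not captured from a real system.

```python
import json

# Illustrative df-style text (made up for this sketch, not a real capture).
SAMPLE_DF = """\
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1       20511312 5432100  14014212  28% /
tmpfs            1017796       0   1017796   0% /dev/shm
"""


def parse_df(text):
    """Convert df-style columnar text into a list of dicts (hypothetical schema)."""
    lines = [line for line in text.splitlines() if line.strip()]
    # "Mounted on" is one column whose header contains a space, so join it first.
    header = lines[0].replace("Mounted on", "mounted_on")
    keys = [
        key.lower().replace("%", "_percent").replace("-", "_")
        for key in header.split()
    ]
    rows = []
    for line in lines[1:]:
        # Limit the split so a mount point containing spaces stays in one field.
        values = line.strip().split(None, len(keys) - 1)
        row = dict(zip(keys, values))
        # Convert the numeric columns so consumers do not have to.
        for key in ("1k_blocks", "used", "available"):
            row[key] = int(row[key])
        row["use_percent"] = int(row["use_percent"].rstrip("%"))
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(json.dumps(parse_df(SAMPLE_DF), indent=2))
```

The header quirks and type conversions handled here are the kind of detail every script author would otherwise rediscover, which is the argument for writing each parser once and sharing it.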


jc is now available as an Ansible filter plugin!

JC In Action

Here’s how jc can be used to make your life easier today and until GNU/Linux brings the Unix philosophy into the 21st century. Let’s take that same example of grabbing an Ethernet IP address from above:

$ ifconfig ens33 | grep inet | awk '{print $2}' | cut -d/ -f1 | head -n 1
192.168.71.138

And here’s how you do the same thing with jc and a CLI JSON parsing tool like jq:

$ ifconfig ens33 | jc --ifconfig | jq -r '.[].ipv4_addr'
192.168.71.138
$ jc ifconfig ens33 | jq -r '.[].ipv4_addr'
192.168.71.138

Here’s another example of listing the listening TCP ports on the system:

$ netstat -tln | tr -s ' ' | cut -d ' ' -f 4 | rev | cut -d : -f 1 | rev | tail -n +3
25
22

That’s a lot of text manipulation just to get a simple list of port numbers! Here’s the same thing using jc and jq:

$ netstat -tln | jc --netstat | jq '.[].local_port_num'
25
22
$ jc netstat -tln | jq '.[].local_port_num'
25
22

Notice how much more intuitive it is to search and compare semantically enhanced structured data vs. awkwardly parsing low-level text? Also, the JSON output can be preserved to be used by any higher-level programming language like Python or JavaScript without line parsing. This is the future, my friends!
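
As a rough illustration of the Python case, the sketch below reads netstat data that has already been converted to JSON and collects the listening ports. The embedded sample is a hypothetical, trimmed stand-in for the real output: only the `local_port_num` key comes from the examples above, while `local_address` and the function name `listening_ports` are assumptions.

```python
import json

# Hypothetical stand-in for `netstat -tln | jc --netstat` output.
# Only "local_port_num" is taken from the article; "local_address" is assumed.
SAMPLE = """
[
  {"local_address": "127.0.0.1", "local_port_num": 25},
  {"local_address": "0.0.0.0", "local_port_num": 22}
]
"""


def listening_ports(netstat_json):
    """Return the unique local port numbers in ascending order."""
    entries = json.loads(netstat_json)
    # A set removes duplicates (e.g. the same port on IPv4 and IPv6).
    ports = {
        entry["local_port_num"]
        for entry in entries
        if "local_port_num" in entry
    }
    return sorted(ports)


if __name__ == "__main__":
    print(listening_ports(SAMPLE))  # prints [22, 25]
```

Because the values arrive as integers rather than substrings, sorting and comparing them needs no extra conversion step.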

jc currently supports the following parsers: arp, df, dig, env, free, /etc/fstab, history, /etc/hosts, ifconfig, iptables, jobs, ls, lsblk, lsmod, lsof, mount, netstat, ps, route, ss, stat, systemctl, systemctl list-jobs, systemctl list-sockets, systemctl list-unit-files, uname -a, uptime, and w.

If you have a recommendation for a command or file type that is not currently supported by jc, add it to the comments and I’ll see if I can figure out how to parse and serialize it. If you would like to contribute a parser, please feel free!

With jc, we can make the Linux world a better place until the OS and GNU tools join us in the 21st century!
