

GitHub - dbohdan/structured-text-tools: A list of command line tools for manipul...
source link: https://github.com/dbohdan/structured-text-tools
Structured text tools
The following is a list of text-based file formats and command line tools for manipulating each.
Contents
- DSV
- XML, HTML
- JSON
- YAML, TOML
- INI
- Configuration files
- Bonus round: CLIs for single-file databases
- License
- Disclosure
DSV
Delimiter-separated values, including CSV, TSV, etc.
Awk
Awk is a POSIX-standard command line tool and programming language for processing DSV data. If you use Linux, macOS or a BSD, you almost certainly have it installed. See below for Windows.
- If you already know how to program, the nawk man page is a great way to learn Awk quickly. What you learn from it will apply to other implementations on different platforms. Read it first if you feel overwhelmed by the sheer size of the GNU Awk manual.
- Awk.info archive — an extensive resource on Awk.
- AWK Vs NAWK Vs GAWK — a comparison of implementations' features.
- busybox-w32 includes a full implementation of POSIX Awk and other tools like `sed` in a single Windows executable.
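
As a minimal sketch of the kind of DSV processing Awk is used for, the snippet below totals one column per key. The sample data and variable names are hypothetical and not part of the original list. It splits on every comma, so it assumes fields contain no quoted or embedded commas.

```shell
# Hypothetical sample data: name,department,salary.
data='alice,eng,100
bob,ops,80
carol,eng,120'

# -F, sets the field separator to a comma; $2 and $3 are fields 2 and 3.
# Totals are accumulated per department and printed once input ends.
# The order of "for (d in sum)" is unspecified, so the result is sorted.
totals=$(printf '%s\n' "$data" |
  awk -F, '{ sum[$2] += $3 } END { for (d in sum) print d "," sum[d] }' |
  sort)

printf '%s\n' "$totals"
```

Only POSIX Awk features are used here, so the same program should run unchanged under nawk, GNU Awk, and the BusyBox implementation.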
POSIX commands
| Name | Description |
| --- | --- |
| cut | Select portions of each line in one or several files. Can work with delimiter-separated fields. (Manual: `man 1 cut` on your system, GNU, FreeBSD.) |
| join | Join the lines from two files on a common field. (Manual: `man 1 join`, GNU, FreeBSD.) |
| paste | Combine several consecutive lines in a text file into one. (Manual: `man 1 paste`, GNU, FreeBSD.) |
| sort | Sort lines by key fields. (Manual: `man 1 sort`, GNU, FreeBSD.) |
| uniq | Find or remove repeated lines. (Manual: `man 1 uniq`, GNU, FreeBSD.) |
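
The commands above are usually combined in pipelines. The following is a rough sketch on colon-separated data. The file names and contents are hypothetical, and `mktemp -d` is assumed to be available (it is common on Linux, macOS and the BSDs but is not itself part of POSIX).

```shell
# Work in a throwaway directory; all file names and data are hypothetical.
dir=$(mktemp -d)
printf '1:alice\n2:bob\n3:carol\n' > "$dir/users"
printf '1:eng\n2:ops\n3:eng\n' > "$dir/depts"

# join: match lines of two files on field 1 (inputs must be sorted on it).
joined=$(join -t : "$dir/users" "$dir/depts")

# cut + sort + uniq: extract field 3, then count how often each value occurs.
# The final awk step only reshapes "count value" into "value=count".
counts=$(printf '%s\n' "$joined" | cut -d : -f 3 | sort | uniq -c |
  awk '{ print $2 "=" $1 }')

# paste -s: fold all input lines into a single comma-separated line.
names=$(cut -d : -f 2 "$dir/users" | paste -s -d , -)

printf '%s\n%s\n%s\n' "$joined" "$counts" "$names"
rm -r "$dir"
```

Note that `join` expects both inputs to be sorted on the join field, which is why `sort` often appears earlier in such a pipeline.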
Other tools
| Name and link | Description |
| --- | --- |
| GNU datamash | Perform statistical operations on text input. |
| Miller | Like `sed`, `awk`, `cut`, `join` and `sort` for name-indexed data such as CSV and tabular JSON. |
| rows | A Python library with a CLI. Convert between a number of file formats for tabular data: CSV, XLS, XLSX, ODS, and others. Query the data (via SQLite). Combine tables. Generate schemas. |
| tab | A non-Turing-complete statically typed programming language for data processing. An alternative to Awk. |
| eBay's TSV utilities | Filter, summarize, join, and perform other operations on TSV files. Written in D. |
| xsv | Index, slice, analyze, split, and join CSV files. |
SQL-based utilities
| Name and link | Programming language and database engine | Features | Usage link | License |
| --- | --- | --- | --- | --- |
| csvkit | Python, SQLite 3 | Use header row for column names. Custom input and output encoding. Custom input field separator. Custom output field separator. Custom output formatting. CSV JOINs. Python module. Excel and JSON to CSV. CSV to JSON. SQL queries for CSV. | Usage | MIT |
| q | Python, SQLite 3 | Use header row for column names. Custom input and output encoding. Gzipped input. Custom input field separator (string literal). Custom output field separator. Custom output formatting. Table JOINs. Python module. | Usage | GNU GPLv3 |
| rows | Python, SQLite 3 | See the Other tools section. | Usage | GNU LGPLv3 |
| Sqawk | Tcl, SQLite 3 | Use header row for column names. Custom input field separator (regexp, per-file). Custom input record delimiter (regexp, per-file). Custom output field separator. Custom output record separator. Custom table names. Merge selected columns into one. Skip columns. ASCII/Unicode table output, CSV input and output. JSON output. Keep SQLite file. Tcl input and output. Table JOINs. | Usage | MIT |
| sqawk | C, SQLite 3 | Use header row for column names. Column name aliases. Can skip lines until a regexp matches. Custom input field separator (string literal, per-file). Keep SQLite file. Show generated SQL. Table JOINs. | Usage | ? |
| Squawk | Python, custom SQL interpreter | Access log and CSV input. JSON and CSV output. Python code generation. | - | Three-clause BSD |
| termsql | Python, SQLite 3 | Use header rows for column names. Custom field separator (regexp). Custom record separator (string literal). Lines as columns. Skip a given number of lines at the beginning and at the end. Merge selected columns into one. HTML, CSV, SQL, and Tcl output. | Manual | MIT |
| trdsql | Go, MySQL/PostgreSQL/SQLite 3 | Use header row for column names. Custom field separator (string literal). Table JOINs. CSV, LTSV, and JSON input. CSV, LTSV, JSON, ASCII table, Markdown output. | Usage | MIT |
| textql | Go, SQLite 3 | Use header rows for column names. Keep SQLite file. Custom input field separator (string literal). | Usage | MIT |

XML, HTML
| Name and link | Description |
| --- | --- |
| pup | Query HTML pages with CSS selectors. Static binaries available for releases. Inspired by jq. |
| Saxon | Query XML and HTML data with XPath. Documentation. |
| Temme | Query HTML with CSS-like selectors to extract JSON. Temme extends CSS selectors with value capture patterns. |
| tq | Query HTML with CSS selectors. |
| Xidel | Query or modify XML and HTML pages with XPath, XQuery 3, and CSS selectors. |
| xml2 | Convert XML and HTML to and from flat, greppable lists of "path=value" statements. Source code mirror. |
| XMLStarlet | Query, modify, and validate XML documents. |

See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.
JSON
| Name and link | Description |
| --- | --- |
| fx | Run arbitrary JavaScript on JSON input. Standalone binaries available. |
| gron | Convert JSON to and from flat, greppable lists of "path=value" statements. |
| jl | Query and manipulate JSON using a tiny functional language. |
| jo | Create JSON objects from the shell. |
| jp | Query JSON with JMESPath. |
| jq | Create and manipulate JSON with a functional (as in "functional programming") DSL. Can convert JSON to other formats. |
| jshon | Create and manipulate JSON using getopt-style command-line options. |
| json2 | Convert JSON to and from flat, greppable lists of "path=value" statements. Modeled after xml2. |
| jsonaxe | Create and manipulate JSON with a Python-based DSL. Inspired by jq. |
| json | Run arbitrary JavaScript on JSON input. |
| json-table | Convert nested JSON into CSV or TSV for processing in the shell. |
| json.tool (Python 3 docs) | Validate and pretty-print JSON. This module is part of the standard library of Python 2/3 and is likely to be available wherever Python is installed. |
| jsonwatch | Track changes in JSON data from the command line. Works like `watch -d`. |
| lobar | Explore JSON interactively or process it in batch with a wrapper for `lodash.chain()`. An alternative to jq with a JavaScript syntax. |
| rq | Create and manipulate JSON with a DSL inspired by Rust, C and JavaScript. Similar to jq. Supports JSON, YAML and TOML as well as binary formats like Apache Avro and MessagePack. |
| validjson | Validate or pretty-print JSON. |
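
Because json.tool ships with Python, it can serve as a validator when none of the other tools is installed. The sketch below assumes `python3` is on the `PATH`. The helper name `is_json` and the sample documents are hypothetical.

```shell
# Hypothetical helper: succeed if stdin is well-formed JSON.
# Assumes python3 is on PATH; json.tool exits non-zero on a parse error.
is_json() {
  python3 -m json.tool >/dev/null 2>&1
}

good='{"name": "jq", "tags": ["json", "cli"]}'
bad='{"name": }'

if printf '%s' "$good" | is_json; then good_status=valid; else good_status=invalid; fi
if printf '%s' "$bad" | is_json; then bad_status=valid; else bad_status=invalid; fi
printf 'good: %s, bad: %s\n' "$good_status" "$bad_status"

# Pretty-print the valid document (indented, one item per line).
pretty=$(printf '%s' "$good" | python3 -m json.tool)
printf '%s\n' "$pretty"
```

The exit status is what makes this usable in scripts: the helper discards the output and only reports whether parsing succeeded.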
YAML, TOML
With a format converter like Remarshal (below) you can use JSON tools to process YAML and TOML, but make sure you do not lose data in the conversion.
| Name and link | Description |
| --- | --- |
| Remarshal | Convert between YAML, TOML, and JSON. Validate or pretty-print each of the three formats. |
| rq | See the JSON section. |
| shyaml | Query YAML. Can output null-terminated strings for use in shell scripts. |
| validtoml | Validate TOML. |
| validyaml | Validate or pretty-print YAML. |

INI
| Name and link | Platform | License | Description |
| --- | --- | --- | --- |
| crudini | Any with Python 2.x | GNU GPLv2 | Set and remove properties in INI files. Retrieve properties as shell script commands to set the corresponding variables. Outputs updated INI data or changes files in place. |
| IniFile (DOS version) | Windows (x86, x86-64), MS-DOS | Closed-source freeware | Set and remove properties in INI files. Retrieve properties as a list of batch file `set` commands to set the corresponding variables. Changes files in place. |
| initool | Windows, Linux, FreeBSD | MIT | Set and remove properties in INI files and check for their existence. Retrieve properties' values as plain text. Outputs updated INI data. |
Configuration files
| Name and link | Description |
| --- | --- |
| Augeas | Query and modify a number of file formats. Not all of the formats are equally well supported by Augeas and for some only a limited subset of all valid files can be parsed. |
| Elektra | Query and modify configuration files. Shares Augeas' limitations when it comes to application-specific configuration files (it uses the same lenses), but has better support for generic formats such as JSON and INI. |

Bonus round: CLIs for single-file databases
| Name and link | Description | File format |
| --- | --- | --- |
| Firebird | Firebird is a FOSS database that can be used from a single file, like SQLite. "isql is a program that allows the user to issue arbitrary SQL commands". | Binary |
| GNU Recutils | "[A] set of tools and libraries to access human-editable, plain text databases called recfiles." | Text-based, roughly "key: value" |
| SDB | "[A] simple string key/value database based on djb's cdb disk storage and supports JSON and arrays introspection." | Binary |
| sqlite3(1) | "[A] simple command-line utility [...] that allows the user to manually enter and execute SQL statements against an SQLite database." | Binary |

License
The contents of this document are licensed under the Creative Commons Attribution 4.0 International License. By contributing you agree to release your contribution under this license.
Disclosure
Sqawk, jsonwatch, Remarshal and initool are developed by the curator of this document.