Why broot switched to Hjson for its configuration files

source link: https://dystroy.org/blog/hjson-in-broot/

14 minute read

Published: 2020-12-22

TOML was originally the only format for configuring broot.

Here I explain why Hjson is now the preferred one.

Why I started with TOML

The reasons I chose TOML when I started working on broot may be familiar to many devs:

  • all ancient configuration formats were already known to be flawed
  • it had "obvious" in the name, so it was simple, right?
  • it did look quite simple even if the [[item]] thing was a little puzzling
  • many of the usual problems of other formats weren't present (more on this later)
  • it was quite popular in the Rust world, since it had been chosen for Cargo's configuration
  • the deserialization tooling in Rust was very good

In broot, the result was quite OK, and TOML could accommodate broot's rich configuration.

It's still usable and I expect to keep it as an alternate format.

The problems with TOML

Most users still find the language alien and confusing. The structure of a big configuration doesn't clearly show through, and I've had too many reports of users who couldn't understand why the line they had added was ignored.

Here's an example of a case that puzzles users, where chars is an array of items:

[[chars]]
name = "A"
value = 10

[[chars]]
name = "B"
value = 11

prop = 3

What the user wanted here, when adding the last prop, was to set a top-level property.

But what really happens is that the property is added to the second item of the "chars" array. It's fairly easy to spot in this short example, especially when you're used to TOML, but not at all in a big file.

The workaround here would be to put simple properties before the array, but that's not how humans work.

Adding a line to a configuration file is something that Linux users do all the time.

People who've been told that, to get some effect, they must add a line to their config.toml file won't read the whole file looking for a better location: they're used to adding it at the end, as they do for example in their .bashrc file. They might tidy everything up later, but they rarely get that far, because when the added line does nothing, their first reaction is to assume there's a bug in the application.

Such traps aren't really OK when you're configuring a program, especially if you only touch the file occasionally and don't want to become an expert in it.

So it seemed reasonable to switch to another format but, just like there's no obvious format, there's no obvious choice. Let's review them a little.

What's a configuration file?

The patterns we use have mostly settled, and the model I describe here is now almost universal.

For the program which reads it, a configuration file contains a value.

A value is either

  • a string
  • a number -- a more precise taxonomy here is useful
  • a boolean -- we could do without it, but it's clearer
  • a list of values -- also called sequence, array, etc.
  • a map of keys (strings) to values -- such a map is often used as a "structured object"
  • the absence of a value -- which is especially important to fall back to a default
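The value model above can be written down as a recursive type. This is a minimal Python sketch, not broot's code: the ConfigValue alias and the is_config_value and get_setting helpers are hypothetical names, and treating an explicit null like a missing key is an assumption made for the example. The point it shows is that "absent" is distinct from false: an explicit false is kept, while an absent value falls back to the default.

```python
from typing import Any, Union

# Recursive alias mirroring the list: absence (None), boolean, number,
# string, list of values, or map from string keys to values.
ConfigValue = Union[None, bool, int, float, str,
                    list["ConfigValue"], dict[str, "ConfigValue"]]

def is_config_value(v: Any) -> bool:
    """Return True if v fits the value model described above."""
    if v is None or isinstance(v, (bool, int, float, str)):
        return True
    if isinstance(v, list):
        return all(is_config_value(item) for item in v)
    if isinstance(v, dict):
        return all(isinstance(k, str) and is_config_value(x) for k, x in v.items())
    return False

def get_setting(config: dict, key: str, default: ConfigValue) -> ConfigValue:
    """Fall back to the program's default when the value is absent (missing or null)."""
    value = config.get(key)
    return default if value is None else value

config = {"cols_order": ["mark", "git", "name"], "true_colors": False}
print(is_config_value(config))                           # True
print(get_setting(config, "true_colors", True))          # False: explicitly set, kept
print(get_setting(config, "show_selection_mark", True))  # True: absent, default used
```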

But before being read by a program, a configuration file is written by a human, then read and modified by other humans. So there's much more to it.

First you need comments. There's no way around that. The purpose of a value, its syntax and meaning, must be explained. Comments are also useful to disable a setting, either temporarily or because it's only there as an example.

Then you need the structure to be clear, and the format must not push you towards errors.

You want users to be able to change values without reading everything.

And, unless your configuration structure is very small and guaranteed frozen, you want your users to be able to add properties.

And you'd rather not have to add too many meaningless characters just to make your meaning clear to the program (I won't even speak of XML...).

As we're in 2020 I'll consider UTF-8 as a given.

So let's look at the offering.

Usual Configuration Formats

Here's a list of more or less common file formats, with their advantages and problems as configuration formats.

INI files are cool if you want to set a few properties, but there's no real solution for anything deeply structured in those files, so you always end up forced to migrate to something more serious.
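As an illustration, here is what Python's configparser (one common INI dialect among many, since INI has no single specification) makes of an attempt at structure: dotted section names are just flat names, and every value comes back as a string until the program converts it. The sample keys are invented for the example.

```python
import configparser

INI = """
[display]
show_selection_mark = true

[chars.A]
value = 10

[chars.B]
value = 11
"""

parser = configparser.ConfigParser()
parser.read_string(INI)

print(parser.sections())                 # ['display', 'chars.A', 'chars.B']
print("chars" in parser)                 # False: no nesting, only flat section names
print(repr(parser["chars.A"]["value"]))  # '10': every value is a string
print(parser.getboolean("display", "show_selection_mark"))  # True, after explicit conversion
```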

JSON is ubiquitous as a data exchange format, with files that humans can read and even create or modify.

It generates few errors by itself and is quite clear.

It maps very well to program structures, and the tooling is abundant and good.

It has a few problems for our purpose, though:

  • it lacks comments. In my opinion this is enough to make it unsuitable as a configuration format despite the fact it's a frequent choice
  • you can't have trailing commas, which means you often modify other lines when you add or remove one
  • wrapping every key in quotes is painful and unnatural
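These problems are easy to reproduce with Python's standard json module, which follows the JSON specification strictly. The accepts helper and the sample documents are made up for the illustration.

```python
import json

def accepts(text: str) -> bool:
    """Return True if the standard-library JSON parser accepts text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

samples = {
    "plain": '{"icon_theme": "vscode"}',
    "comment": '{\n  # uncomment to enable icons\n  "icon_theme": "vscode"\n}',
    "trailing comma": '{"icon_theme": "vscode", "true_colors": false,}',
    "unquoted key": '{icon_theme: "vscode"}',
}

for label, text in samples.items():
    print(f"{label}: {accepts(text)}")
# plain: True
# comment: False
# trailing comma: False
# unquoted key: False
```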

YAML is compact, quite clear and has comments.

But its whitespace-based syntax makes it prone to errors and makes big structures hard to decipher, and indentation-related problems are occasionally hard to debug.

And it has an oddity which may or may not matter: do you know how to tell whether you have a quoteless string or a boolean? Probably not (Off is a boolean; see the regexp in the YAML 1.1 boolean type specification). This last problem can be lifted with the StrictYAML variant, though. And it doesn't matter when you deserialize into the types required by the program's structure instead of applying the format's typing rules.
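Here is a sketch of the boolean trap, assuming the YAML 1.1 resolution rules for unquoted scalars. The regexp below is the one from the YAML 1.1 bool type; YAML 1.2's core schema only recognizes true and false (in a few capitalizations), and libraries differ in which rules they apply. format_typed and schema_typed are hypothetical helpers contrasting typing driven by the format with deserializing into the type the program expects.

```python
import re

# The boolean regexp of the YAML 1.1 "bool" type, applied to unquoted scalars.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
TRUE_WORDS = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}

def format_typed(scalar: str):
    """Type an unquoted scalar by the format's rules (only booleans handled here)."""
    if YAML11_BOOL.fullmatch(scalar):
        return scalar in TRUE_WORDS
    return scalar

def schema_typed(scalar: str, expected: type):
    """Type the same scalar according to the field type the program declares."""
    if expected is str:
        return scalar  # a string field keeps "Off" as the text "Off"
    if expected is bool:
        value = format_typed(scalar)
        if not isinstance(value, bool):
            raise ValueError(f"not a boolean: {scalar!r}")
        return value
    raise TypeError(f"unsupported field type: {expected!r}")

print(format_typed("Off"))       # False: silently became a boolean
print(format_typed("Offline"))   # Offline
print(schema_typed("Off", str))  # Off
print(schema_typed("Off", bool)) # False
```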

Coming after INI, JSON and YAML, TOML solves some of their problems.

It has comments, doesn't make you count invisible chars or put quotes everywhere.

And it has quite good support for structured data...

Except lists of objects.

I already showed an example of the problem, but any deep configuration is a problem in TOML.

And nobody seems to really understand the syntax, or how a human should parse it.

In my opinion this language should be reserved for cases where the structure is small and mostly frozen.

Hjson

Hjson has a very limited purpose compared to other formats. It's not a general serialization format. It's not meant to be written by programs: it's meant to be written by humans, read and modified by other humans, then read by programs. It was designed for this purpose from the start, which makes it well suited to configuration needs.

The tooling for Hjson today is far behind that of the other formats I mentioned.

Most parsers or deserializers really just go through a conversion to JSON (yes, there's a big overhead).

For Rust, as I wanted Serde derive-guided deserialization, I had to write the deserializer myself (it's probably still buggy at this point).

Another sign of Hjson lagging behind in tooling: for the example below to be readable on this blog, I had to select "CSS" as the format for the syntax highlighter.

But Hjson, while not perfect, is much better than the other configuration formats of today for most applications:

  • It's clear and almost "obvious"
  • It's already familiar, especially when dealing with lists and maps
  • There are fewer traps
  • You don't usually have to deal with escape sequences thanks to quoteless strings
  • There are facilities for multiline strings
  • You can have trailing commas, or no commas

The whole file will be wrapped between braces. This isn't pretty, but it's a minor point. Another (small?) problem is that you may end up with a little too much indentation when dealing with deep structures.

A trap I found concerns quoteless strings: the rule may not be obvious, especially the fact that a quoteless string can't start with a colon. At least the error is usually not silent here, but I'd still recommend avoiding quoteless strings where surprising strings are expected (meaning you should use a classical quoted string there).
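Here's a minimal sketch of the decision "can this value be written without quotes?", based on a simplified reading of the Hjson syntax rules: a quoteless value shouldn't be empty, start or end with whitespace, span several lines, start with one of {}[],:, start with a quote or a comment marker, or look like true, false, null or a number (it would then be read as that type instead of a string). needs_quotes and hjson_value are hypothetical helper names, and real Hjson parsers may differ in edge cases.

```python
import json
import re

PUNCTUATORS = set("{}[],:")
NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")

def needs_quotes(s: str) -> bool:
    """True if writing s without quotes would not read back as the same string."""
    if s == "" or s != s.strip() or "\n" in s:
        return True  # empty, padded with whitespace, or multiline
    if s[0] in PUNCTUATORS or s[0] in "\"'":
        return True  # starts with a punctuator (like ':') or a quote
    if s.startswith(("#", "//", "/*")):
        return True  # would be read as a comment
    if s in ("true", "false", "null") or NUMBER.fullmatch(s):
        return True  # would be read as a boolean, null or number
    return False

def hjson_value(s: str) -> str:
    """Render a string value, quoting it (JSON-style) only when required."""
    return json.dumps(s) if needs_quotes(s) else s

for v in ["vscode", "$EDITOR +{line} {file}", ":focus ~", "true", "10"]:
    print(f"value: {hjson_value(v)}")
# value: vscode
# value: $EDITOR +{line} {file}
# value: ":focus ~"
# value: "true"
# value: "10"
```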

Note that some other formats went in the same direction, though not as far:

  • JSON5, which fixes some shortcomings of JSON while remaining more of a general-purpose or data exchange format than one really focused on human writing and configuration use
  • Rome Json, which makes some of the same changes as Hjson

All in all, I find this format less bad than the others, and I expect an example to be more convincing.

Example

Here's the current complete default configuration of broot:

###############################################################
# This configuration file lets you
# - define new commands
# - change the shortcut or triggering keys of built-in verbs
# - change the colors
# - set default values for flags
# - set special behaviors on specific paths
# - and more...
#
# Configuration documentation is available at
#     https://dystroy.org/broot
#
# This file's format is Hjson ( https://hjson.github.io/ )
#
###############################################################
{

	###############################################################
	# Default flags
	# You can set up flags you want broot to start with by
	# default, for example `default_flags="ihp"` if you usually want
	# to see hidden and gitignored files and the permissions (then
	# if you don't want the hidden files you can launch `br -H`)
	# A popular flag is the `g` one which displays git related info.
	#
	# default_flags:

	###############################################################
	# Date/Time format
	# If you want to change the format for date/time, uncomment the
	# following line and change it according to
	# https://docs.rs/chrono/0.4.11/chrono/format/strftime/index.html
	#
	# date_time_format: %Y/%m/%d %R

	###############################################################
	# Whether to mark the selected line with a triangle
	#
	# show_selection_mark: true

	###############################################################
	# Column order
	# cols_order, if specified, must be a permutation of the following
	# array. You should keep the name at the end as it has a variable
	# length.
	#
	# cols_order: [
	# 	mark
	# 	git
	# 	size
	# 	permission
	# 	date
	# 	count
	# 	branch
	# 	name
	# ]

	###############################################################
	# True Colors
	# If this parameter isn't set, broot tries to automatically
	# determine whether true colors (24 bits) are available.
	# As this process is unreliable, you may uncomment this setting
	# and set it to false or true if you notice the colors in
	# previewed images are too off.
	#
	# true_colors: false

	###############################################################
	# Icons
	# If you want to display icons in broot, uncomment this line
	# (see https://dystroy.org/broot/icons for installation and
	# troubleshooting)
	#
	# icon_theme: vscode

	###############################################################
	# Special paths
	# If some paths must be handled specially, uncomment (and change
	# this section as per the examples)
	#
	# special_paths: {
	# 	"/media/slow-backup-disk"		: no-enter
	# 	"/home/dys/useless"			: hide
	# 	"/home/dys/my-link-I-want-to-explore"	: enter
	# }

	###############################################################
	# Verbs and shortcuts
	# You can define your own commands which would be applied to
	# the selection.
	# You'll also find below verbs that you can customize or enable.
	###############################################################
	verbs: [

		# Example 1: launching `tail -n` on the selected file (leaving broot)
		# {
		# 	name: tail_lines
		# 	invocation: tl {lines_count}
		# 	execution: tail -f -n {lines_count} {file}
		# }

		# Example 2: creating a new file without leaving broot
		# {
		# 	name: touch
		# 	invocation: touch {new_file}
		# 	execution: touch {directory}/{new_file}
		# 	leave_broot: false
		# }

		# A standard recommended command for editing files, that you
		# can customize.
		# If $EDITOR isn't set on your computer, you should either set it using
		#  something similar to
		#   export EDITOR=nvim
		#  or just replace it with your editor of choice in the 'execution'
		#  pattern.
		#  If your editor is able to open a file on a specific line, use {line}
		#   so that you may jump directly to the right line from a preview.
		# Example:
		#  execution: nvim +{line} {file}
		{
			invocation: edit
			key: F2
			shortcut: e
			execution: "$EDITOR +{line} {file}"
			leave_broot: false
		}

		# A convenient shortcut to create new text files in
		# the current directory or below
		{
			invocation: create {subpath}
			execution: $EDITOR {directory}/{subpath}
			leave_broot: false
		}

		{
			invocation: git_diff
			shortcut: gd
			leave_broot: false
			execution: git difftool -y {file}
		}

		# This verb lets you launch a terminal on ctrl-T
		# (on exit you'll be back in broot)
		{
			invocation: terminal
			key: ctrl-t
			execution: $SHELL
			set_working_dir: true
			leave_broot: false
		}

		# Here's an example of a shortcut bringing you to your home directory
		# {
		# 	invocation: home
		# 	key: ctrl-home
		# 	execution: ":focus ~"
		# }

		# A popular set of shortcuts for going up and down:
		#
		# {
		# 	key: ctrl-k
		# 	execution: :line_up
		# }
		# {
		# 	key: ctrl-j
		# 	execution: :line_down
		# }
		# {
		# 	key: ctrl-u
		# 	execution: :page_up
		# }
		# {
		# 	key: ctrl-d
		# 	execution: :page_down
		# }

		# If you develop using git, you might like to often switch
		# to the git status filter:
		# {
		# 	key: ctrl-g
		# 	execution: :toggle_git_status
		# }

		# You can reproduce the bindings of Norton Commander
		# on copying or moving to the other panel:
		#
		# {
		# 	key: F5
		# 	execution: :copy_to_panel
		# }
		# {
		# 	key: F6
		# 	execution: :move_to_panel
		# }
	]

	###############################################################
	# Skin
	# If you want to change the colors of broot,
	# uncomment the following block and start messing
	# with the various values.
	# A skin entry value is made of two parts separated with a '/':
	# The first one is the skin for the active panel.
	# The second one, optional, is the skin for non active panels.
	###############################################################
	#
	# skin: {
	# 	default: gray(23) none / gray(20) none
	# 	tree: ansi(94) None / gray(3) None
	# 	file: gray(20) None / gray(15) None
	# 	directory: ansi(208) None Bold / ansi(172) None bold
	# 	exe: Cyan None
	# 	link: Magenta None
	# 	pruning: gray(12) None Italic
	# 	perm__: gray(5) None
	# 	perm_r: ansi(94) None
	# 	perm_w: ansi(132) None
	# 	perm_x: ansi(65) None
	# 	owner: ansi(138) None
	# 	group: ansi(131) None
	# 	count: ansi(136) gray(3)
	# 	dates: ansi(66) None
	# 	sparse: ansi(214) None
	# 	content_extract: ansi(29) None
	# 	content_match: ansi(34) None
	# 	git_branch: ansi(229) None
	# 	git_insertions: ansi(28) None
	# 	git_deletions: ansi(160) None
	# 	git_status_current: gray(5) None
	# 	git_status_modified: ansi(28) None
	# 	git_status_new: ansi(94) None Bold
	# 	git_status_ignored: gray(17) None
	# 	git_status_conflicted: ansi(88) None
	# 	git_status_other: ansi(88) None
	# 	selected_line: None gray(5) / None gray(4)
	# 	char_match: Yellow None
	# 	file_error: Red None
	# 	flag_label: gray(15) None
	# 	flag_value: ansi(208) None Bold
	# 	input: White None / gray(15) gray(2)
	# 	status_error: gray(22) ansi(124)
	# 	status_job: ansi(220) gray(5)
	# 	status_normal: gray(20) gray(3) / gray(2) gray(2)
	# 	status_italic: ansi(208) gray(3) / gray(2) gray(2)
	# 	status_bold: ansi(208) gray(3) Bold / gray(2) gray(2)
	# 	status_code: ansi(229) gray(3) / gray(2) gray(2)
	# 	status_ellipsis: gray(19) gray(1) / gray(2) gray(2)
	# 	purpose_normal: gray(20) gray(2)
	# 	purpose_italic: ansi(178) gray(2)
	# 	purpose_bold: ansi(178) gray(2) Bold
	# 	purpose_ellipsis: gray(20) gray(2)
	# 	scrollbar_track: gray(7) None / gray(4) None
	# 	scrollbar_thumb: gray(22) None / gray(14) None
	# 	help_paragraph: gray(20) None
	# 	help_bold: ansi(208) None Bold
	# 	help_italic: ansi(166) None
	# 	help_code: gray(21) gray(3)
	# 	help_headers: ansi(208) None
	# 	help_table_border: ansi(239) None
	# 	preview: gray(20) gray(1) / gray(18) gray(2)
	# 	preview_line_number: gray(12) gray(3)
	# 	preview_match: None ansi(29)
	# 	hex_null: gray(11) None
	# 	hex_ascii_graphic: gray(18) None
	# 	hex_ascii_whitespace: ansi(143) None
	# 	hex_ascii_other: ansi(215) None
	# 	hex_non_ascii: ansi(167) None
	# }

	# You may find explanations and other skins on
	#  https://dystroy.org/broot/skins
	# for example a skin suitable for white backgrounds


	###############################################################
	# File Extension Colors
	#
	# uncomment and modify the next section if you want to color
	# file name depending on their extension
	#
	# ext_colors: {
	# 	png: rgb(255, 128, 75)
	# 	rs: yellow
	# }

}

In my opinion it's easy to read.

It's too soon to say whether there will be traps in practice, or if users will find it difficult to modify it, but I have reasonable hopes it will be OK.
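The skin comment in the configuration describes a small format of its own: an entry is an active-panel style, optionally followed by '/' and an inactive-panel style. Judging from the samples, each style appears to be a foreground color, a background color, then optional attributes; that reading is an assumption. Below is a minimal Python sketch of parsing such entries; the Style class and parse_skin_entry are hypothetical, not broot's implementation. Since the samples mix None/none and Bold/bold, the sketch compares them case-insensitively, and it returns None for a missing inactive part without guessing what broot does in that case.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# A token is either a call-like color such as gray(23), ansi(208) or
# rgb(255, 128, 75), or a bare word such as None, Cyan or Bold.
TOKEN = re.compile(r"\w+\([^)]*\)|\w+")

@dataclass
class Style:
    fg: Optional[str]  # None means no foreground color
    bg: Optional[str]  # None means no background color
    attrs: list[str] = field(default_factory=list)

def _color(token: str) -> Optional[str]:
    # The samples write both "None" and "none": compare case-insensitively.
    return None if token.lower() == "none" else token

def parse_style(part: str) -> Style:
    """Parse 'foreground background [attributes...]'."""
    tokens = TOKEN.findall(part)
    if len(tokens) < 2:
        raise ValueError(f"expected a foreground and a background: {part!r}")
    return Style(_color(tokens[0]), _color(tokens[1]),
                 [t.lower() for t in tokens[2:]])

def parse_skin_entry(value: str) -> tuple[Style, Optional[Style]]:
    """Split 'active / inactive'; the inactive part is optional."""
    active, sep, inactive = value.partition("/")
    return parse_style(active), (parse_style(inactive) if sep else None)

active, inactive = parse_skin_entry("ansi(208) None Bold / ansi(172) None bold")
print(active)    # Style(fg='ansi(208)', bg=None, attrs=['bold'])
print(inactive)  # Style(fg='ansi(172)', bg=None, attrs=['bold'])
print(parse_skin_entry("Cyan None"))  # (Style(fg='Cyan', bg=None, attrs=[]), None)
```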

Conclusion

I'm not promoting Hjson as the right configuration format for every program.

There are, for example, cases where you don't even need a real configuration format (is your language scriptable, like JavaScript? Have you considered JavaScript files for configuration?).

And when you deal with just a few properties, you should probably prefer a simpler format.

But I think Hjson is a solution which should be considered and picked more often.

