Writing a Clojure highlighter from scratch

source link: https://blog.michielborkent.nl/writing-clojure-highlighter.html

In the aftermath of my previous blog post about using Nextjournal's clojure-mode for better highlighting, I tried optimizing the JS output and got a look at the internals of CodeMirror 6. I realized that writing a Clojure highlighter from scratch wasn't that hard if you had the right tools at hand:

  • A tool that can break Clojure code into parts, tell you what each part is (symbol, keyword, vector, etc.) and put the parts back together again while preserving whitespace. This tool is already available in babashka as a library called rewrite-clj.
  • A tool that can provide additional semantic information: is a symbol a local or a var? The static analysis output of clj-kondo provides that information.

I spent my Sunday afternoon combining these tools, which resulted in a 160-line script called highlighter.clj that now does the highlighting for this blog.

This blog post is a high-level walkthrough of the code. Let's begin with the first step.

1. Parse blocks of Clojure code from markdown and apply highlighting

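(defn highlight-clojure [markdown]
  (str/replace markdown #"(?m)``` clojure\n([\s\S]+?)\n\s*```"
               ;; block is the whole match, code is the captured snippet
               (fn [[block code]]
                 (try (-> (str/trim code)
                          (htmlize)
                          ;; escape characters that markdown would otherwise interpret
                          (str/replace "[" "\\[")
                          (str/replace "]" "\\]")
                          (str/replace "*" "\\*"))
                      (catch Exception e
                        (log "Could not highlight: " (ex-message e) code)
                        ;; return only the matched block; returning the whole
                        ;; markdown here would paste the document into itself
                        block)))))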

Parsing blocks of Clojure code from a markdown post is done using a basic regex. Then we pass the Clojure code to the htmlize function. After that we escape some markdown-specific characters ([, ] and *), so they will be preserved after markdown compilation. If the highlighting fails for some reason, we log it and fall back on the unprocessed markdown. During the implementation I found several snippets of Clojure code with unbalanced parens which I had to fix, since rewrite-clj doesn't accept them. So all examples from this blog should be copy-pastable into your Clojure editor without problems from now on.
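
The replace-with-a-function approach can be tried in isolation. The following is a minimal sketch, not the code from highlighter.clj: fake-htmlize and highlight-blocks are hypothetical names, and fake-htmlize is a stand-in that only wraps the code in a tag and throws on unbalanced parens so the fallback path can be exercised.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the real htmlize: wraps the code in <pre>
;; and throws when the paren counts differ.
(defn fake-htmlize [code]
  (let [opens  (count (filter #{\(} code))
        closes (count (filter #{\)} code))]
    (when (not= opens closes)
      (throw (ex-info "unbalanced parens" {:code code})))
    (str "<pre>" code "</pre>")))

(defn highlight-blocks [markdown]
  (str/replace markdown #"(?m)``` clojure\n([\s\S]+?)\n\s*```"
               ;; the function receives [whole-match first-group]
               (fn [[block code]]
                 (try (-> (str/trim code)
                          fake-htmlize
                          (str/replace "[" "\\[")
                          (str/replace "]" "\\]")
                          (str/replace "*" "\\*"))
                      (catch Exception _
                        ;; keep the original fenced block on failure
                        block)))))

(println (highlight-blocks "Intro\n``` clojure\n(let [x 1] (* x 2))\n```\nOutro"))
```

In this sketch a balanced snippet is replaced by escaped HTML, while a block such as (foo without its closing paren is left exactly as it was in the markdown.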

2. Parse and analyze Clojure using clj-kondo and rewrite-clj

(defn htmlize [code]
  (binding [*analysis*
            (let [ana (analysis code)]
              {:locals (locals ana)
               :var-defs (var-defs ana)})]
    (let [html (-> code p/parse-string-all node->html)]
      (format "<pre><code class=\"clojure hljs\">%s</code></pre>" html))))

Clj-kondo provides information about vars, keywords and locals. We will apply special styling to var definitions and locals and their usages.
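
The binding form only works if *analysis* is a dynamic var. The walkthrough does not show that declaration, so the following is a sketch under that assumption; local? is a hypothetical helper used only to show that the bound value is visible to functions called inside the binding.

```clojure
;; Assumed declaration: the walkthrough binds *analysis* but never shows its def.
(def ^:dynamic *analysis* nil)

;; Hypothetical helper: reads the dynamic var instead of taking an argument.
(defn local? [location]
  (contains? (:locals *analysis*) location))

;; Inside the binding, the analysis data is visible to local?.
(def inside
  (binding [*analysis* {:locals #{[1 12]} :var-defs #{}}]
    (local? [1 12])))

;; Outside the binding, *analysis* is back to nil.
(def outside (local? [1 12]))

(println inside outside)
```

Using a dynamic var means the multimethods that render nodes do not need an extra analysis argument threaded through every call.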

3. Clj-kondo analysis

(require '[babashka.fs :as fs]
         '[babashka.pods :as pods])

(pods/load-pod 'clj-kondo/clj-kondo "2021.10.19")

(require '[pod.borkdude.clj-kondo :as clj-kondo])

(defn analysis [code]
  (let [tmp (doto (fs/file (fs/create-temp-dir) "code.clj")
              fs/delete-on-exit)]
    (spit tmp code)
    (-> (clj-kondo/run!
         {:lint [(str tmp)]
          :config {:output {:analysis {:locals true}}}})
        :analysis)))

To call clj-kondo from babashka, we use the binary from the pod registry which is automatically downloaded via load-pod if you provide a fully qualified symbol and version. We write the code to a temp file and lint it. We ask for the static analysis data. Locals are not included by default, so we set :locals to true. Later on we want to detect if a symbol is a local or a var. We do this by making a set of locations from the analysis data for each group:

(defn locals [analysis]
  (->> analysis
       ((juxt :locals :local-usages))
       (apply concat)
       (map (juxt :row :col))
       set))

(defn var-defs [analysis]
  (->> analysis
       :var-definitions
       (map (juxt :name-row :name-col))
       set))
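
To see what these sets look like, here is a small sketch that runs the two functions on hand-written data. The sample-analysis map is an assumption: it only mimics the keys used above for the snippet (defn foo [x] (inc x)) and is not real clj-kondo output.

```clojure
(defn locals [analysis]
  (->> analysis
       ((juxt :locals :local-usages))
       (apply concat)
       (map (juxt :row :col))
       set))

(defn var-defs [analysis]
  (->> analysis
       :var-definitions
       (map (juxt :name-row :name-col))
       set))

;; Hand-written sample for: (defn foo [x] (inc x))
;; foo starts at column 7, the binding x at column 12, its usage at column 20.
(def sample-analysis
  {:locals          [{:name 'x :row 1 :col 12}]
   :local-usages    [{:name 'x :row 1 :col 20}]
   :var-definitions [{:name 'foo :row 1 :col 1 :name-row 1 :name-col 7}]})

(println (locals sample-analysis))
(println (var-defs sample-analysis))
```

Both the binding site and the usage of x end up in the same set, so a symbol at either location can be styled as a local with a single contains? check.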

4. Rewrite-clj nodes

Next, we parse the code to rewrite-clj nodes. Each node has a tag, and we write a multimethod that dispatches on it:

(defmulti node->html tag)

For each kind of node we will emit a <span> element with an associated class. For instance, :foo will become <span class="keyword">:foo</span> and so on.

A small helper function:

(defn span [class contents]
  (format "<span class=\"%s\">%s</span>"
          class contents))

Here is the implementation for a map node:

(defmethod node->html :map [node]
  (span "map" (format "{%s}"
                      (str/join (map node->html (:children node))))))

A map node has :children so we just call node->html for each child and join the strings together.
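
The recursion is easier to see on toy data. The sketch below does not use rewrite-clj: the nodes are plain maps with a :tag key, and toy->html together with the :keyword, :number and :whitespace tags are hypothetical names chosen for illustration.

```clojure
(require '[clojure.string :as str])

(defn span [class contents]
  (format "<span class=\"%s\">%s</span>" class contents))

;; Toy nodes are plain maps, so the dispatch function is simply :tag.
(defmulti toy->html :tag)

;; A container node renders its children and joins the results.
(defmethod toy->html :map [node]
  (span "map" (format "{%s}" (str/join (map toy->html (:children node))))))

(defmethod toy->html :keyword [node]
  (span "keyword" (:text node)))

(defmethod toy->html :number [node]
  (span "number" (:text node)))

;; Whitespace is emitted as-is so the original layout survives.
(defmethod toy->html :whitespace [node]
  (:text node))

(def toy-html
  (toy->html {:tag :map
              :children [{:tag :keyword :text ":a"}
                         {:tag :whitespace :text " "}
                         {:tag :number :text "1"}]}))

(println toy-html)
```

Because whitespace is just another child node, the rendered HTML keeps the spacing of the source, which is the property that makes rewrite-clj a good fit for this job.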

I wrote a :default implementation that logs a warning for nodes that I hadn't implemented yet:

(defmethod node->html :default [node]
  (binding [*out* *err*]
    (println "Unhandled tag:" (tag node)))
  (span (name (tag node))
        (if (:children node)
          (str/join "" (map node->html (:children node)))
          (str node))))

and added implementations for all of the nodes that occurred in Clojure snippets in all the posts of this blog so far, by working through the list of unhandled tags.

Rewrite-clj doesn't give different tags for symbols, strings, numbers and so on: it groups them under the :token tag. So there is some extra work needed to get different highlighting for different types of tokens. I wrote a function that returns a CSS class by looking at the contents of the node or at the type of its value. For a symbol node, I want different highlighting for var definitions and locals. This is where I check the clj-kondo analysis to see whether the symbol at that location is a local or a var definition; if it is neither, I fall back on the general symbol CSS class.

(defn token-class [node]
  (cond (:k node) "keyword"
        (:lines node) "string"
        (contains? node :value)
        (let [v (:value node)]
          (cond (number? v) "number"
                (string? v) "string"
                (boolean? v) "boolean"
                (nil? v) "nil"
                (symbol? v)
                (cond (contains? (:locals *analysis*)
                                 ((juxt :row :col) (meta node)))
                      "local"
                      (contains? (:var-defs *analysis*)
                                 ((juxt :row :col) (meta node)))
                      "def"
                      :else
                      "symbol")
                :else (name (tag node))))
        ;; fallback, log missing case
        :else (log (tag node) (keys node) (sexpr node) (type (sexpr node)))))

(defmethod node->html :token [node]
  (span (token-class node)
        (escape (str node))))
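
The escape and log helpers used above are not shown in this walkthrough. A minimal sketch of what they might look like, assuming escape only needs to HTML-escape token text and log only needs to print to stderr (the actual script may differ):

```clojure
(require '[clojure.string :as str])

;; Assumed helper: HTML-escape token text before it goes inside a <span>.
;; The ampersand is replaced first so the entities added afterwards stay intact.
(defn escape [s]
  (-> s
      (str/replace "&" "&amp;")
      (str/replace "<" "&lt;")
      (str/replace ">" "&gt;")))

;; Assumed helper: print diagnostics to stderr so they never end up in the HTML.
(defn log [& xs]
  (binding [*out* *err*]
    (apply println xs)))

(println (escape "(-> x (< 1 2) &rest)"))
```

Escaping matters for Clojure in particular, since common symbols such as ->, < and >= would otherwise be read as markup.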

5. Styling

Finally I wrote some styling:

.def { color: #00f; }
.symbol { color: #708; }
.local { color: cadetblue; }
.string { color: #a11; }
.number { color: blue; }
.keyword { color: #219; }
.uneval { filter: opacity(0.5); }

For :uneval nodes, rewrite-clj's name for forms that are ignored using the #_ discard reader macro, as in #_(+ 1 2 3), I set the opacity to 0.5. Can you see the difference?

(+ 1 2 3)
#_(+ 1 2 3)

That's it really. A Sunday afternoon well spent. The code for the highlighter is here.

Discuss this post here.

Published: 2021-11-08

