9

Unison: A Content-Addressable Programming Language

 4 years ago
source link: https://www.unisonweb.org/docs/tour/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

This document introduces "the big idea" behind Unison and walks through the basics of using the Unison codebase manager to develop and publish your first Unison library. We will introduce bits and pieces of the core Unison language and its syntax as we go. The Unison language reference is a more in-depth resource on this if you have questions or want to learn more.

If you want to follow along with this document (highly recommended), this guide assumes you've already gone through the steps inthe quickstart guide.

The source for this document is on GitHub . Feedback and improvements are most welcome!

The most important goal

If there is one motivating idea behind Unison, it's this: the technology for creating software should be thoughtfully crafted in all aspects. Needless complexity and difficulties should be stripped away, leaving only that exhilarating creative essence of programming that made many of us want to learn this subject in the first place. Or at the very least, if we can't have this, let's have programming be reasonable . The fact that things were done a certain way in the 1970s is not a good reason to keep doing them, especially if they make programming worse.

Sure, it's sensible to make compromises regarding when and where to innovate, rather than trying to revolutionize everything right now. But let's be honest that it's a compromise, and not forget to improve things later.

The big technical idea

Now, if there is one big technical idea behind Unison, explored in pursuit of the overall goal of making programming better, it's this: Unison definitions are identified by content. Each Unison definition is some syntax tree, and by hashing this tree in a way that incorporates the hashes of all that definition's dependencies, we obtain the Unison hash which uniquely identifies that definition. This is the basis for some serious improvements to the programmer experience: it eliminates builds and most dependency conflicts, allows for easy dynamic deployment of code, typed durable storage, and lots more.

When taken to its logical endpoint, this idea of content-addressed code has some striking implications. Consider this: if definitions are identified by their content, there's no such thing as changing a definition , only introducing new definitions. That's interesting. What may change is how definitions are mapped to human-friendly names. For example, x -> x + 1 (a definition) as opposed to Nat.increment (a name we associate with it for the purposes of writing and reading other code that references it). An analogy: Unison definitions are like stars in the sky. We can discover the stars in the sky and pick different names for these stars, but the stars exist independently of what we choose to call them.

But the longer you spend with the odd idea of content-addressed code, the more it starts to take hold of you. It's not arbitrary or strange, but a logical and sensible choice with tremendous practical benefits. You start to appreciate the simplicity of the idea and see the need for it everywhere ("this would be a lot easier if the code were content-addressed..."). Is it really feasible, though, to build a programming language around this idea?

Part of the fun in building Unison was in working through the implications of what seemed like a great core idea. A big question that arose: even if definitions themselves are unchanging, we do sometimes want to change which definitions we are interested in and assign nice names to. So how does that work? How do you refactor or upgrade code? Is the codebase still just a mutable bag of text files, or do we need something else?

We do need something else to make it nice to work with content-addressed code. In Unison we call this something else the Unison Codebase Manager .

:wave: to the Unison codebase manager

When first launching Unison in a new directory, we get a message like:

No codebase exists here so I'm initializing one in: .unison/v1

What's happening here? This is the Unison Codebase Manager starting up and initializing a fresh codebase. We're used to thinking about our codebase as a bag of text files that's mutated as we make changes to our code, but in Unison the codebase is represented as a collection of serialized syntax trees, identified by a hash of their content and stored in a collection of files inside of that .unison/v1 directory.

The Unison codebase format has a few key properties:

  • It is append-only : once a file in the .unison directory is created, it is never modified or deleted, and files are always named uniquely and deterministically based on their content.
  • As a result, a Unison codebase can be versioned and synchronized with Git or any similar tool and will never generate a conflict in those tools.

If you haven't already worked through thequickstart guide, let's download the Unison base library to the .base namespace:

.> pull https://github.com/unisonweb/base .base

This command uses git behind the scenes to sync new definitions from the remote Unison codebase to the local codebase.

Because of the append-only nature of the codebase format, we can cache all sorts of interesting information about definitions in the codebase and never have to worry about cache invalidation . For instance, Unison is a statically-typed language and we know the type of all definitions in the codebase--the codebase is always in a well-typed state. So one thing that's useful and easy to maintain is an index that lets us search for definitions in the codebase by their type. Try out the following commands (new syntax is explained below):

.> find : [a] -> [a]

  1. base.Heap.sort : [a] -> [a]
  2. base.List.distinct : [a] -> [a]
  3. base.List.reverse : [a] -> [a]
  4. base.Heap.sortDescending : [a] -> [a]

.> view 3

  base.List.reverse : [a] -> [a]
  base.List.reverse as =
    use base.List +:
    base.List.foldl (acc a -> a +: acc) [] as

Here, we did a type-based search for functions of type [a] -> [a] , got a list of results, and then used the view command to look at the nicely formatted source code of one of these results. Let's introduce some Unison syntax:

  • base.List.reverse : [a] -> [a] is the syntax for giving atype signature to a definition. We pronounce the : symbol as "has type", as in "reverse has the type [a] -> [a] ".
  • [Nat] is the syntax for the type consisting of lists of natural numbers (terms like [0,1,2] and [3,4,5] , and [] will have this type), and more generally [Foo] is the type of lists whose elements have some type Foo .
  • Any lowercase variable in a type signature is assumed to be universally quantified , so [a] -> [a] really means and could be written forall a . [a] -> [a] , which is the type of functions that take a list whose elements are some (but any) type, and return a list of elements of that same type.
  • base.List.reverse takes one parameter, called as . The stuff after the = is called the body of the function, and here it's ablock, which is demarcated by whitespace.
  • acc a -> .. is the syntax for an anonymous function.
  • Function arguments are separated by spaces and function application binds tighter than any operator, so f x y + g p q parses as (f x y) + (g p q) . You can always use parentheses to control grouping more explicitly.
  • The declaration use base.List +: lets us reference the function base.List.+: using just +: . (This function prepends an element to the front of a list.)Use clauses like this can be placed in any Unison block; they don't need to go at the top of your file.

Try doing view base.List.foldl if you're curious to see how it's defined.

Names are stored separately from definitions so renaming is fast and 100% accurate

The Unison codebase, in its definition for reverse , doesn't store names for the definitions it depends on (like the foldl function); it references these definitions via their hash. As a result, changing the name(s) associated with a definition is easy.

Let's try this out. reverse is defined using List.foldl , where l is a needless abbreviation for left . Let's rename that to List.foldLeft to make things clearer. Try out the following command (you can use tab completion here if you like):

.> move.term base.List.foldl base.List.foldLeft

  Done.

.> view base.List.reverse

  base.List.reverse : [a] -> [a]
  base.List.reverse as =
    use base.List +:
    base.List.foldLeft (acc a -> a +: acc) [] as

Notice that view shows the foldLeft name now, so the rename has taken effect. Nice!

To make this happen, Unison just changed the name associated with the hash of foldl in one place . The view command just looks up the names for the hashes on the fly, right when it's printing out the code.

This is important: Unison isn't doing a bunch of text mutation on your behalf, updating possibly thousands of files, generating a huge textual diff, and also breaking a bunch of downstream library users who are still expecting that definition to be called by the old name. That would be crazy, right?

So rename and move things around as much as you want. Don't worry about picking a perfect name the first time. Give the same definition multiple names if you want. It's all good!

:point_up: Using alias.term instead of move.term introduces a new name for a definition without removing the old name(s).

If you're curious to learn about the guts of the Unison codebase format, you can check out the v1 codebase format specification .

Use undo to back up a step. (We don't have a redo yet, though).

.> undo

  :rewind:

  Here's the changes I undid:

  > Moves:

    Original name   New name
    base.List.foldl base.List.foldLeft

.>

Great! OK, go drink some water, and then let's learn more about Unison's interactive way of writing and editing code.

Unison's interactive scratch files

The codebase manager lets you make changes to your codebase and explore the definitions it contains, but it also listens for changes to any file ending in .u in the current directory (including any subdirectories). When any such file is saved (which we call a "scratch file"), Unison parses and typechecks that file. Let's try this out.

Keep your ucm terminal running and open up a file, scratch.u (or foo.u , or whatever you like) in your preferred text editor (if you want syntax highlighting for Unison files,follow this link for instructions on setting up your editor).

Now put the following in your scratch file:

use .base

square : Nat -> Nat
square x = x * x

This defines a function called square . It takes an argument called x and it returns x multiplied by itself.

The first line, use .base , tells Unison that you want to use short names for the base libraries in this file (which allows you to say Nat instead of having to say base.Nat ).

When you save the file, Unison replies:

:white_check_mark:

I found and typechecked these definitions in ~/unisoncode/scratch.u. If you do an
`add` or `update` , here's how your codebase would change:

  ⍟ These new definitions are ok to `add`:

    square : base.Nat -> base.Nat

Now evaluating any watch expressions (lines starting with `>`)... Ctrl+C cancels.

It typechecked the square function and inferred that it takes a natural number and returns a natural number, so it has the type Nat -> Nat . It also tells us that square is "ok to add ". We'll do that shortly, but first, let's try calling our function right in the scratch.u file, just by starting a line with > :

use .base

square : Nat -> Nat
square x = x * x

> square 4

And Unison replies:

6 | > square 4
      ⧩
      16

That 6 | is the line number from the file. The > square 4 on line 6 of the file, starting with a > is called a "watch expression", and Unison uses these watch expressions instead of having a separate read-eval-print-loop (REPL). The code you are editing can be run interactively, right in the same spot as you are doing the editing, with a full text editor at your disposal, with the same definitions all in scope, without needing to switch to a separate tool.

The use .base is a wildcard use clause . This lets us use anything from the base namespace under the root unqualified. For example we refer to base.Nat as simply Nat .

Question:do we really want to reevaluate all watch expressions on every file save? What if they're expensive? Luckily, Unison keeps a cache of results for expressions it evaluates, keyed by the hash of the expression, and you can clear this cache at any time without ill effects. If a result for a hash is in the cache, Unison returns that instead of evaluating the expression again. So you can think of and use your .u scratch files a bit like spreadsheets, which only recompute the minimal amount when dependencies change.

? There's one more ingredient that makes this work effectively, and that's functional programming. When an expression has no side effects, its result is deterministic and you can cache it as long as you have a good key to use for the cache, like the Unison content-based hash. Unison's type system won't let you do I/O inside one of these watch expressions or anything else that would make the result change from one evaluation to the next.

Let's try out a few more examples:

-- A comment, ignored by Unison

> base.List.reverse [1,2,3,4]
> 4 + 6
> 5.0 / 2.0
> not true
:white_check_mark:

~/unisoncode/scratch.u changed.

Now evaluating any watch expressions (lines starting with
`>`)... Ctrl+C cancels.

  6 | > base.List.reverse [1,2,3,4]
        ⧩
        [4, 3, 2, 1]

  7 | > 4 + 6
        ⧩
        10

  8 | > 5.0 / 2.0
        ⧩
        2.5

  9 | > not true
        ⧩
        false

Testing your code

Let's add a test for our square function:

use .base

square : Nat -> Nat
square x = x * x

use test.v1

test> tests.square.ex1 = run (expect (square 4 == 16))

Save the file, and Unison comes back with:

7 | test> tests.square.ex1 = run (expect (square 4 == 16))

:white_check_mark: Passed : Passed 1 tests.

Some syntax notes:

  • The test> prefix tells Unison that what follows is a test watch expression. Note that we're also giving a name to this expression, tests.square.ex1 .

The expect function has type Boolean -> Test . It takes a Boolean expression and gives back a Test , which can be run to produce a list of test results, of type [base.Test.Result] (try view base.Test.Result ). In this case there was only one result, and it was a passed test.

A property-based test

Let's test this a bit more thoroughly. square should have the property that square a * square b == square (a * b) for all choices of a and b . The testing library supports writing property-based tests like this. There's some new syntax here, explained afterwards:

use .base

square : Nat -> Nat
square x = x * x

use test.v1

test> tests.square.ex1 = run (expect (square 4 == 16))

test> tests.square.prop1 =
  go _ = a = !nat
         b = !nat
         expect (square a * square b == square (a * b))
  runs 100 go
8 |   go _ = a = !nat

:white_check_mark: Passed : Passed 100 tests. (cached)

This will test our function with a bunch of different inputs.

  • The Unison block which begins after an = begins a Unison block, which can have any number of bindings (like a = ... ) all at the same indentation level, terminated by a single expression (here expect (square ..) ), which is the result of the block.
  • You can call a function parameter _ if you just plan to ignore it. Here, go ignores its argument; its purpose is just to make go lazily evaluated so it can be run multiple times by the runs function.
  • !expr means the same thing as expr () , we say that !expr forces thedelayed computation expr .
  • Note: there's nothing special about the names tests.square.ex1 or tests.square.prop1 ; we could call those bindings anything we wanted. Here we just picked some uncreative names based on the function being tested. Use whatever naming convention you prefer.

nat comes from test.v1 - test.v1.nat . It's a generator of natural numbers. !nat generates one of these numbers.

Adding code to the codebase

The square function and the tests we've written for it are not yet part of the codebase. So far they only exists in our scratch file. Let's add it now. Switch to the Unison console and type add . You should get something like:

.> add

  ⍟ I've added these definitions:

    tests.square.ex1    : [base.Test.Result]
    tests.square.prop1  : [base.Test.Result]
    square              : base.Nat -> base.Nat

You've just added a new function and some tests to your Unison codebase. Try typing view square or view tests.square.prop1 . Notice that Unison inserts precise use statements when rendering your code. use statements aren't part of your code once it's in the codebase. When rendering code, a minimal set of use statements is inserted automatically by the code printer, so you don't have to be precise with your use statements.

If you type test at the Unison prompt, it will "run" your test suite:

.> test

  Cached test results (`help testcache` to learn more)

  ◉ tests.square.ex1       : Passed 1 tests.
  ◉ tests.square.prop1     : Passed 100 tests.

  :white_check_mark: 2 test(s) passing

  Tip:  Use view tests.square.ex1 to view the source of a test.

But actually, it didn't need to run anything! All the tests had been run previously and cached according to their Unison hash. In a purely functional language like Unison, tests like these are deterministic and can be cached and never run again. No more running the same tests over and over again!

Unison namespaces and use clauses

Now that we've added our square function to the codebase, how do we reference it elsewhere?

The Unison namespace is the mapping from names to definitions. Names in Unison look like this: math.sqrt , .base.Int , base.Nat , base.Nat.* , ++ , or foo . That is: an optional . , followed by one or more segments separated by a . , with the last segment allowed to be an operator name like * or ++ .

We often think of these names as forming a tree, much like a directory of files, and names are like file paths in this tree.Absolute names (like .base.Int ) start with a . and are paths from the root of this tree and relative names (like math.sqrt ) are paths starting from the current namespace, which you can set using the namespace (or equivalently cd ) command:

.> namespace mylibrary

  :point_up:  The namespace .mylibrary is empty.

.mylibrary>

Notice the prompt changes to .mylibrary> , indicating your current namespace is now .mylibrary . When editing scratch files, any relative names not locally bound in your file will be resolved by prefixing them with the current namespace of .mylibrary . And when you issue an add command, the definitions are put directly into this namespace. For instance, if we added x = 42 to our scratch file and then did .mylibrary> add , that would create the definition .mylibrary.x .

You can use namespace . to move back to the root.

When we added square , we were at the root, so square and its tests are directly under the root. To keep our root namespace a bit tidier, let's go ahead and move our definitions into the mylibrary namespace:

.mylibrary> move.term .square square

  Done.

.mylibrary> find

  1.  square : .base.Nat -> .base.Nat

.mylibrary> move.namespace .tests tests

  Done.

We're using .square to refer to the square definition directly under the root, and then moving it to the relative name square . When you're done shuffling some things around, you can use find with no arguments to view all the definitions under the current namespace:

.mylibrary> find

  1.  tests.square.ex1 : [.base.Test.Result]
  2.  tests.square.prop1 : [.base.Test.Result]
  3.  square : .base.Nat -> .base.Nat

Also notice that we don't need to rerun our tests after this reshuffling. The tests are still cached:

.mylibrary> test

  Cached test results (`help testcache` to learn more)

  ◉ tests.square.ex1       : Passed 1 tests.
  ◉ tests.square.prop1     : Passed 100 tests.

  :white_check_mark: 2 test(s) passing

  Tip:  Use view tests.square.ex1 to view the source of a test.

We get this for free because the test cache is keyed by the hash of the test, not by what the test is called.

:point_up: The use statement can do absolute names as well, for instance use .base.List map .

When you're starting out writing some code, it can be nice to just put it in a temporary namespace, perhaps called temp or scratch . Later, without breaking anything, you can move that namespace or bits and pieces of it elsewhere, using the move.term , move.type , and move.namespace commands.

Modifying existing definitions

Instead of starting a function from scratch, often you just want to slightly modify something that already exists. Here we'll make a change to the implementation of our square function.

Using the edit command

Try doing edit square from your prompt (note you can use tab completion):

.mylibrary> edit square
  :point_up:

  I added these definitions to the top of ~/unisoncode/scratch.u

    square : .base.Nat -> .base.Nat
    square x =
      use .base.Nat *
      x * x

  You can edit them there, then do `update` to replace the definitions currently in this branch.

This copies the pretty-printed definition of square into you scratch file "above the fold". That is, it adds a line starting with --- and puts whatever was already in the file below this line. Unison ignores any file contents below the fold.

Notice that Unison has put the correct type signature on square . The absolute names .base.Nat look a bit funny. We will often do use .base at the top of our file to refer to all the basic functions and types in .base without a fully qualified name.

Let's edit square and instead define square x (just for fun) as the sum of the first x odd numbers (here's a nice geometric illustration of why this gives the same results ):

use .base

square : Nat -> Nat
square x =
  sum (map (x -> x * 2 + 1) (range 0 x))

sum : [Nat] -> Nat
sum = foldl (+) 0
:white_check_mark:

I found and typechecked these definitions in ~/unisoncode/scratch.u. If you do an
`add` or `update` , here's how your codebase would change:

  ⍟ These new definitions will replace existing ones of the same name and are ok to `update`:

    square : .base.Nat -> .base.Nat

Now evaluating any watch expressions (lines starting with `>`)... Ctrl+C cancels.

Adding an updated definition to the codebase

Notice the message says that square is "ok to update ". Let's try that:

.mylibrary> update

  ⍟ I've added these definitions:

    sum : [.base.Nat] -> .base.Nat

  ⍟ I've updated to these definitions:

    square             : .base.Nat -> .base.Nat

Only affected tests are rerun on update

If we rerun the tests, the tests won't be cached this time, since one of their dependencies has actually changed:

.mylibrary> test

  New test results:

  ◉ tests.square.prop1    : Passed 100 tests.
  ◉ tests.square.ex1      : Passed 1 tests.

  :white_check_mark: 2 test(s) passing

  Tip: Use view tests.square.prop1 to view the source of a test.

Notice the message indicates that the tests weren't cached. If we do test again, we'll get the newly cached results.

The dependency tracking for determining whether a test needs rerunning is 100% accurate and is tracked at the level of individual definitions. You'll only rerun a test if one of the individual definitions it depends on has changed.

Publishing code and installing Unison libraries

Before publishing code, you might choose to make a copy of your namespace, similar to how you might tag a release in Git. Let's go ahead and do this:

.mylibrary> cd .
.> copy.namespace mylibrary mylibrary.releases.v1

  Done.

.> cd mylibrary.releases.v1
.mylibrary.releases.v1> find

  1.  tests.square.ex1 : [Result]
  2.  tests.square.prop1 : [Result]
  3.  square : Nat -> Nat

But this is just a naming convention, there's nothing magic happening here.

Now let's publish our mylibrary to a fresh Unison repo. First, create an empty Git repository on GitHub or wherever you prefer to host your Git repositories.

After you've created this empty repo, you can then push to it, using push <giturl> (to push the current namespace) or push <giturl> .mystuff (to push the .mystuff namespace):

.mylibrary.releases.v1> cd .mylibrary
.mylibrary> push [email protected]:<yourgithubuser>/myunisonrepo

You'll see some git logging output. Your code is now live on the internet!

Installing libraries written by others

This section is under construction.

From the root, do:

.> pull [email protected]:<github-username>/myunisonrepo.git .myfirstlibrary

The namespace you created is now available under .myfirstlibrary . Try cd .myfirstlibrary after the pull to look around.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK