22

Designing a functional programming language: Yatta

 3 years ago
source link: https://functional.blog/2020/05/25/designing-a-functional-programming-language-yatta/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

This is the story of the Yatta Language. Yatta is a functional, dynamic, non-blocking language for the GraalVM . Today is the day of its first public alpha release.

I’m sure the first question many readers will ask is “why yet another programming language?”. At risk of sounding arrogant, my answer is as simple as: I wanted a language like Yatta, one that allows me to write programs as simple expressions, without having to worry about low-level stuff such as concurrency, with a beautiful syntax and a strong but dynamic type system . I didn’t want to compromise with anything less than that.

This post is written in an informal style, focusing on the history and motivation behind this project. It is also written from my perspective, but I want to emphasize the contributions of Fedor , specifically to the area of the runtime and data structures and, just as importantly, countless hours of discussion about the language features in general.

Let me jump in here and give you a sneak peak of where we want to get in this blog. Hoping to capture your attention before dwelling into what’s wrong with all the other languages.

Take a look at the following program in Yatta:

try
    let
        keys_file = File::open "tests/Keys.txt" {:read}
        values_file = File::open "tests/Values.txt" {:read}

        keys = File::read_lines keys_file
        values = File::read_lines values_file

        () = File::close keys_file
        () = File::close values_file
    in
        Seq::zip keys values |> Dict::from_seq
catch
    (:ioerror, _, _) -> {}
end |> println

Don’t worry about not understanding all of the syntax, that’s not important at the moment. What is happening here is this: the program reads two files (one containing keys, the other values) in parallel, and then zip the keys/values into pairs creating a dictionary.

It is probably not clear why those files are being read in parallel. And that is the whole point of Yatta!

If I have your attention, please read on. I will get into the details in a bit, but first I’d like to share some wider context and my motivations that lead me to creating this language.

I found functional programming concepts very useful since when I started learning about them. Working with Scala and Erlang in the past has been a pleasure and, even though they are quite different in their philosophy, they share some nice features such as first-class functions , immutable data structures or powerful pattern matching . In the following text, I don’t mean to criticize any existing language, but I do want to point out some specific points that I either found useful or difficult for some reason. This should provide enough context to why Yatta is designed the way it is.

Scala was the first functional programming language I learned and actually used in a production setting. I was primarily a Java engineer before so it felt like a natural way to get into the FP world. My main expectation from Scala was to be able to easily parallelize some code, which was pretty painful at the time in Java (it still is, at the language level at least).

Note here that, at that point, I was still very much interested in how to add parallelism to the code. Yatta is specifically designed in a way that I no longer have to think about that at all!

I was initially very happy with Scala – being able to reuse most of my existing Java code was great. Being able to slowly move away from OOP concepts to a FP style was nice. At some point later I realized that I am writing most of my code in a functional style, without any inheritance, but with pattern matching, immutable case classes, and higher order functions.

Case classes are great for storing data and there is no more need to hassle with DTOs. Having immutable types was hugely beneficial when reasoning about concurrency. All good things. After some time, though, I started to struggle. I did not have a slow computer but, as my project grew, compile times grew with it. Incremental compilation helped but only to a degree. I was at a point where I was using quite a bit of various Scala libraries and sometimes I would spend most of my day fighting complicated type errors, aggravated by slow compilation times.

Sure, the compiler did catch some errors, but there were times when I was really wondering if the couple of errors caught were really worth the amount of time wasted waiting for compilation to finish or fight hard-to-understand type errors. Complex abstractions built into large libraries and frameworks pushing the compiler to its limits usually were to blame here.

Eventually I ended up working on a project in Erlang. Having no previous experience working with it, I was not sure what to expect. Erlang is dynamically typed after all. It also does not support “traditional” OOP concepts found in Java and alike. It has a “weird” syntax and an unusual concurrency model as well. Lots to learn.

I quickly fell in love with Erlang. The things I initially feared most (such as the lack of OOP features) turned out to be those which allowed for a simpler syntax. Plus, it has a built-in concurrency model, rather than the basic lock/monitor based synchronization found in Java. Yet, my biggest fear was the fact that Erlang is a dynamic language. I always thought I could never like a dynamic language. My mindset was still very strongly in favor of static typing, despite all the painful experiences dealing with unclear type errors and slow compile times I had.

Since then I have worked some more with Java, Scala, JavaScript, TypeScript, Python, but I felt like they all had something that shouldn’t have been there: none of these were functional-only languages, they all still implemented OOP features. I never, since I first used Erlang, felt like I needed OOP (in its mainstream form at least, with inheritance). Every language which supports OOP now feels bloated to me. I do not want to deal with it any more. I eventually changed my mind about static typing too. I do not want to fall in the trap of slow compilations and debugging mysterious typing errors again.

The only reason I had to not simply go back to Erlang was actually its concurrency model. After a while, sending messages between processes started to feel somewhat low-level. I started to wonder whether this level of abstraction needed to be present for most of the user-level code, or whether it could be hidden into the language “itself”.

When GraalVM appeared about 2 years ago (or at least when it caught my attention), I immediately started planning on how to use it to create a programming language I would like to use, without the compromises and limitations of existing languages. That meant a functional language in the first place. Dynamic. Dynamic, because I feel like there are so many static languages these days and it seems like no-one really knows which type system is “the best”. Every static language comes with a little different approach, but which one is the “right” one? Haskell with its dozens of extensions? Scala with its Turing complete type system? Maybe, I surely am not advocating against these languages, very successful in their domains. Could Idris finally be getting it right?

I, however, do not want to lock myself into constraints of a particular type system and then subject every other decision to those self-imposed limits. I want to have the flexibility of a dynamic type system, and then do my best to avoid errors at different levels of abstraction other than the type system. “Surely, this comes at a run-time cost!”, you might think – and it is true. But this is where GraalVM comes to help, it provides Yatta with JIT and the ability to optimize the code based on the run-time profile, so the actual cost will hopefully not be that bad.

So here is what I actually wanted from Yatta:

  • Plain and simple syntax , minimal boilerplate, few keywords, no need for OOP constructs. The code must be very easy to read through.
  • Everything is an expression , no statements. The program itself is just a single expression.
  • First-class functions and modules . Modules are simply sets of functions and records. Modules can be further organized into packages. Function calls supporting tail-call optimizations.
  • Few data types . Yatta comes with the following built-in types: integer (64 bit), floating point number (64 bit), character (UTF-8), symbol, sequence (variable length, constant-time access to both ends), tuple (fixed length), dictionary, set and STM (Software Transactional Memory – coming in beta). No implicit conversions between types though!
  • Custom data types , or records . Records are basically just tuples with named fields, and syntax sugar that goes with it.
  • Powerful pattern-matching , allowing nested matches and non-linear patterns (ability to use the same name in pattern more than once).
  • Dead-simple concurrency model . Even if it won’t be the most performing one for some (or most) situations, it should be built into the language and be very easy to use. Ideally built with functions alone, requiring no special syntactic constructs. Advanced concurrency possible via STM.
  • Run on JVM (GraalVM) and allow polyglot cooperation with other JVM languages. Very important if integrating with existing code-bases.

Surely this set of priorities is very personal and comes from my previous experiences with other languages. Other people may have different priorities. Either way, I quickly started working on the implementation of this language. In the next paragraphs, I will try to dive a little deeper into each of them.

Syntax

I knew I wanted a simple syntax, but what does that mean? Lisp syntax is absolutely minimalistic (technically), yet somehow very unpleasant to look at. Some people, including me, consider Python’s syntax to be very simple. It is one of Python’s main goals after all. But I wondered, is it really the “simplest” possible syntax? Haskell’s syntax is arguably more minimalistic, but again, is it really simple? I do not think so. Haskell allows custom operators composed of all kinds of what looks like random characters (at first sight) and I would argue Haskell programs are some of the most difficult to read if you do not use the language daily. Therefore, what does simple really mean to me, is this:

  • Minimum of keywords. There are as of now only around 20 of them. Specifically: let , in , if , then , else , true , false , module , exports , as , case , of , import , from , end , do , try , catch , raise , module , record . This list might slightly change in the future, but the aim is clear. Have as few as possible.
  • Few types of expressions. No statements at all . This reduces the mental effort required to understand what is going on. No need to think about monad transformations or having X ways to define type-specific behaviors (type classes, inheritance, macros, implicits, …). Enough said, Yatta has these types of expressions: case / if / let / do / try + catch / raise , and then literal expressions, such as numbers or strings, or functions and modules. That is all there is as of now, and I would like to keep the language really as small as possible.
  • Simple style is the key and less is more, meaning no semicolons, no braces, not even any “blocks”.
  • Pipeline syntax – for easy chaining of function calls. Supported operators |> and <| redirecting result of a function to either right or left side.
  • Generators syntax as a syntax sugar around Transducers (generic transformations of reducing function), allowing iterating and constructing data structures such as sequence, set or dictionary with ease .

Considering these requirements, Yatta’s syntax is actually really small and easy to read, I believe. (I suggest you skim through that page, I tried to make it really concise yet comprehensive description of all of the syntax and all types of expressions, including examples).

Semantics

At this point I can finally circle back to what I started in the beginning and explain what is this all about. And I believe, this is the main point where Yatta really stands out. It’s not that the ideas implemented in Yatta runtime would be anything unheard of before, but their implementation on the runtime level makes this language very easy and intuitive to work with.

The key point here is this: Yatta essentially removes any difference between already computed values and Promises (or Futures in Java/Scala). What that means is that the programmer doesn’t have to write any different code depending on whether the value is already there or will be computed later.

Consider this, you want to read two files (one containing keys, the other values) in parallel in Scala, and then zip the keys/values into pairs creating a dictionary. A typical way to do that in Scala would be something like:

import scala.concurrent.future
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.io.Source
import java.io.FileNotFoundException

object Main extends App {
  def readFile(fileName: String) = {
    io.Source.fromFile(fileName).getLines.toList
  }
  
  val data = for {
    keys <- future { readFile("keys.txt") }
    values <- future { readFile("values.txt") }
  } yield keys.zip(values).toMap
  
  data.recover {
    case e: FileNotFoundException => {
      Map[String, String]()
    }
  }.onSuccess { 
    case map => {
      println(map)
    }
  }  
  
  Await.result(data, Duration.Inf)
}

Sure, there might be libraries that could make this shorter and more elegant, but for basic Scala, this is probably the way to go. Now how about doing that in Yatta:

try
    let
        keys_file = File::open "tests/Keys.txt" {:read}
        values_file = File::open "tests/Values.txt" {:read}

        keys = File::read_lines keys_file
        values = File::read_lines values_file

        () = File::close keys_file
        () = File::close values_file
    in
        Seq::zip keys values |> Dict::from_seq
catch
    (:ioerror, _, _) -> {}
end |> println

Both of these examples show full programs, including all the “boiler-plate”. As you can see, there is very little of it in Yatta. More importantly, the code in Yatta is also non-blocking in nature, and both files are read concurrently, without ever having to write a single extra character to make that happen. What is important is what you want to do with these files, not how to make it happen.

How does Yatta do this? A couple of things: first, it knows the difference between the do and let expressions. They both are used to evaluate multiple steps of computation, however do ensures that the steps take place in the same sequence as they are defined, let tries to parallelize non-blocking tasks.

So what’s going on here is this: Yatta will first perform a static analysis of the let expression in order to determine dependencies between the steps . It knows that keys_file and values_file are used in the File::read_lines function and also knows that keys and values do not depend on each other, and so they may be ran concurrently . Also know this, Yatta doesn’t parallelize keys_file and values_file , nor does it parallelize closing files – even though it seemingly should – there are no dependencies between these lines either. The trick is that File::read_lines is a function that returns a runtime level Promise (hidden from the user), and that means that reading two files in parallel is actually fine , since you’d have to block on it otherwise, so why not block on both being read at once.

Another important concept here is that the order of aliases defined in the let expression does matter . Yatta doesn’t just randomly re-arrange them based on dependencies alone. If it did, it could for example close those files before they are ever read. That would be incorrect. Yatta just uses static analysis of this expression to determine which aliases can be “batched” and actually batches the execution if they provide underlying Promises. Then the whole expression is transformed into something like this:

Execute batch 1 (sequentially, since File::open does not return a Promise):
        keys_file = File::open "tests/Keys.txt" {:read}
        values_file = File::open "tests/Values.txt" {:read}

Execute batch 2 (in parallel, since File::read_lines does return a Promise):
        keys = File::read_lines keys_file
        values = File::read_lines values_file

Result from batch 2 is a Promise aggregating both keys and values, which when complete, executes batch 3:
        () = File::close keys_file
        () = File::close values_file

Finally, the whole let expression is now a Promise, so run the final expression whenever it is ready:
        Seq::zip keys values |> Dict::from_seq |> println

Yatta automatically chains runtime Promises as needed and, more than that, it “unwraps” them whenever they are complete, so the runtime doesn’t actually get bloated with propagating Promises all over the place. As soon as a Promise is computed, it becomes just a regular value again.

Note that catching exception here for an expression returning a Promise doesn’t differ from catching exceptions for regular values. This is because Yatta completely and transparently hides the difference between the two from the user, in all cases.

Runtime

Yatta contains some of the state-of-art data structures built by Fedor . This include sequence, set, dictionary and STM. I think this section deserves its own post, as Fedor has put tremendous effort into optimizing these data structures and they offer not only excellent performance, but also some interesting features, for example that a sequence of UTF-8 characters or bytes will automatically keep them encoded in chunks under the hood, thus greatly decreasing memory usage and garbage collector pressure while still providing rich and flexible functional API!

What’s next and why should you give Yatta a try

Yatta, in my opinion, really pushes the expectations from dynamic languages. It provides a level of abstraction that makes programming just about expressions. It removes the need from programming of “how” and puts the focus back on “what”. As common as this sounds, for all declarative languages, I believe Yatta goes further in that way than most.

This blog post is by no means comprehensive. There are many more features not even mentioned here, so please do make sure to read the docs .

If you’re curious, read the Yatta homepage , set up a local GraalVM and install Yatta to play with. The installation is dead simple – just one command after you’ve set up GraalVM.

I suggest, clone the project repo, and start by playing with programs in language/tests. Just run:

yatta language/tests/DistributedPiCalculation.yatta

This alpha release is the first public release, with most of the syntax and semantics in place. The standard library is still pretty small and the focus of the next work will be to expand it significantly. The GitHub project hosts the issue tracker and there is a Google group for general discussions about the language, as well as a Gitter room for a quick chat with me, Fedor or any interested users.

What to expect from the next release:

Feel free to suggest what the next blog post should be about. I will keep writing them about new features or improvements, as well as maintain the release notes updated.

Show support by starring a GitHub project , subscribing and sharing this blog, or by contributing your feedback, ideas, your favorite editor/IDE integration, libraries, or whatever else you think might help!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK