Rust Notebooks with Evcxr

 3 years ago
source link: https://blog.abor.dev/p/evcxr

Why code notebooks are better than the shell, and an interview with David Lattimore, the creator of the Rust Jupyter kernel

Sometimes things are invented in the wrong order. No technological limitation or missing breakthrough stands in the way of a particular invention; it just so happens, by historical accident, that an alternative technology is invented and becomes widespread before the better idea is struck upon.

For example, marine chronometers, used in the 18th and 19th centuries, are a very complicated and hard-to-engineer way to keep time on the ocean.[1] In the 20th century, ships switched to using radio time signals. Radio signals are very easy to generate, and quartz clocks are a very accurate way to keep time. Both are much easier to build than a marine chronometer, and they’re more accurate and more robust to failure. We just… didn’t think of it first.[2]

Something like this happened with code notebooks.[3] REPLs (including shells like Bash) were invented and got popular with the rise of Unix and scripting languages. But the basic idea of a code notebook could have been invented at any time. The core technology needed for a code notebook is the same as for a shell: you need a way to execute code in the language, maintain a context, and display the output.

As a consequence of inventing code notebooks embarrassingly late, it feels natural to describe them as “like a REPL, but with extra features like displaying images”. Instead we should be describing REPLs, including shells, as “kind of like a notebook, but worse and harder to use”. It’s hard to look at someone spamming ⬆️ and Ctrl+R in a shell and not think they would have benefited from using a code notebook where you can simply edit previous cells.

Notebooks are usually associated with data science work, but this is only because the shell doesn’t provide good visualization capabilities.[4] Developers on *nix systems learn to use the shell as an interface to all kinds of system tasks, so they tend to be comfortable with the command line. But data scientists generally don’t get this extensive shell indoctrination, and Jupyter is readily available and more than capable. We can take the fact that data scientists are not clamoring to do data science from the terminal as some evidence that code notebooks are, in fact, the more general solution.

And as long as we’re replacing the shell with a notebook, we might as well replace the language of the shell with Rust.

Quick overview of Evcxr

Evcxr is a Jupyter kernel for Rust. It makes iterative, one-off, and task-oriented coding much more enjoyable in Rust. It’s not that you can’t do iterative duct-tape-and-hairpin coding with cargo and an IDE, it’s just that it’s so much more ceremony.

With Evcxr, you can add dependencies from crates.io, and use them immediately:
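
For example, a cell might contain dependency lines like the following. This is a sketch: the crates and version numbers are picked purely for illustration, and :dep is the command Evcxr provides for declaring a dependency from inside a cell.

```text
:dep rand = "0.8"
:dep serde = { version = "1.0", features = ["derive"] }
```

The part after :dep is written the way a line in the [dependencies] section of Cargo.toml would be. Once the cell has run, later cells can use items from those crates just as an ordinary Rust source file would.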

This interweaving of what would normally go into Cargo.toml with the source code itself is exactly the kind of thing that’s great for iteratively solving a problem, but which most tooling avoids because it’s not very maintainable. That’s ok though: notebooks are not for maintaining. They’re a place to solve a particular problem, and if you’re feeling especially motivated and energetic, a place to document what you learned.

You can also do async and ? without any ceremony in a cell. It will wrap your code with the necessary invocations before running it:
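
As a rough mental model of the ? half of that wrapping (an illustrative sketch, not Evcxr's actual generated code), a cell can be thought of as the body of a function that returns a Result. The function name run_cell and the boxed error type are assumptions made for this example, and async is left out because it would need an external runtime:

```rust
use std::error::Error;

// Hypothetical wrapper: the cell body is placed inside a function that
// returns a Result, which is what makes `?` legal inside it.
fn run_cell(input: &str) -> Result<i32, Box<dyn Error>> {
    // --- the lines a user might type into a cell ---
    let n: i32 = input.trim().parse()?;
    let doubled = n.checked_mul(2).ok_or("overflow while doubling")?;
    // --- end of cell ---
    Ok(doubled)
}

fn main() {
    // A successful cell evaluates to its final value...
    match run_cell("21") {
        Ok(value) => println!("{value}"),
        Err(err) => println!("error: {err}"),
    }
    // ...while a failing `?` surfaces the error instead of panicking.
    match run_cell("not a number") {
        Ok(value) => println!("{value}"),
        Err(err) => println!("error: {err}"),
    }
}
```

Without a wrapper of this kind, ? would not compile in code that sits at the top level of a cell, since there would be no enclosing function for the early return to leave.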

Evcxr is really promising technology, but it has a few limitations. While async works, it looks like rustc made some changes to dynamic libraries that don’t interact well with Evcxr, so async is currently stuck on pre-1.0 tokio. The inline type checking was a little hit or miss for me as well. Sometimes it triggered, sometimes it didn’t. This may be a Jupyter issue. I would very much like to get autocomplete in cells.

Long term, because Evcxr is using rust-analyzer internally, it’s poised to benefit from the autocompletion and code fixes from that project. One of the reasons common shell commands have very short names (e.g. ls, mv, git) is because when you’re solving something, typing speed is really important. Good autocomplete mostly eliminates that advantage, and in return hands us back the ability to use more than three letters in our function names.

Overall, I’m very excited about what Evcxr unlocks.

Interview with David Lattimore

I chatted briefly with David, the creator of Evcxr, who was kind enough to answer my questions. We talked about the internals of Evcxr, features he’d like to add, some of his work on structural search and replace in rust-analyzer, and what he’s excited about in the Rust ecosystem.

The following is a transcript of our conversation edited for length and clarity.

So I guess the first question I have is: how do you pronounce the project name?

I pronounce it “e-vix-er”. Whenever Evcxr is brought up on Reddit or something like that, someone will be like “It's a terrible name!” or like “I can never remember the name!”. It's quite searchable, which I guess is part of the motivation. You're not going to confuse it for something else or get results for something else.

But unfortunately, it's very difficult to remember and very difficult to work out how to pronounce.

Could you talk a little bit about what you worked on previous to Evcxr, and what kind of led you up to creating the project?

At the time I'd started it, I'd been playing with writing a linker,[5] so I'd been playing in that space of loading and dealing with object files and shared objects and those kinds of things. And it just kind of occurred to me that it would be reasonably easy (or I thought it would be reasonably easy) to make a REPL-like thing just by loading shared objects and running the code.

It was reasonably quick to get something basic up and running. As always with these things the devil is always in the detail. It's a continuously evolving project to add the many subtle features and make different things work well.

So it started out as a REPL and you added the Jupyter kernel on afterwards?

Yes, although I added the Jupyter kernel before I released. I wrote the REPL and then I thought, “Well, my wife is a data scientist, so she uses a Jupyter notebook quite a bit. That could be kind of fun to try and make a Jupyter kernel as well.”

So I spent an extra couple of weeks and made a basic Jupyter kernel before I released.

Had you used Jupyter much previously yourself?

I'd never really used it myself. I'd just seen her using it and I thought it looked kinda cool.

It looks like you switched the project from using syn to using rust-analyzer. Can you talk about how that works?

Yeah sure, I switched to using rust-analyzer about six months ago.

Prior to that I would generate the code, and then ask the Rust compiler to compile it, and then look at what the errors were. I assumed that all the variables were of type String, and I would look at the errors that were produced. The Rust compiler can happily format its output as JSON, so you get like a JSON representation of the errors and you just pull out the error message for a particular variable and see it says, you know, "found this type, expected that type".

That worked for quite a few years, but then recently they changed the messages so that they no longer included the fully qualified type. So instead of saying, you know, std::collections::HashMap or something like that it would just say HashMap. And without the fully qualified type I couldn't use those types anymore, they just weren't sufficient.

So, I switched to using rust-analyzer. If rust-analyzer can't determine the type, Evcxr will still fall back to using the compiler messages. The only reason I've got that there is because rust-analyzer doesn't deal with fixed-sized array types. Like it doesn't tell you the size of the array in the type that rust-analyzer infers, so if you have like:

let array_of_ints = [0; 5];

rust-analyzer's type will just say:

[i32; _]

I can't use that, so for those particular cases I still fall back to using the compiler error messages to determine the types.

I would like to get rid of that code because it's... it's a bit ugly. It's a bit of a hack. But yeah for the most part I'm using rust-analyzer. I'm also not using syn any more, so I've switched entirely to using rust-analyzer to do the parsing.
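
A brief aside on why that elided length matters: in Rust the length is part of an array's type, which is presumably why a type with the length left out is not enough for Evcxr to work with. The sketch below uses only the standard library. The helper name type_of is made up for the example, and the exact string produced by std::any::type_name is best-effort rather than a guaranteed format.

```rust
use std::any::type_name;
use std::mem::size_of;

// Illustrative helper: report the compiler's name for a value's type.
// The exact string returned by `type_name` is best-effort and may change
// between compiler versions.
fn type_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    let array_of_ints = [0; 5];
    // Spelling the type out in full requires the length as well as the
    // element type.
    let same_array: [i32; 5] = array_of_ints;
    println!("{}", type_of(&same_array));

    // Arrays of different lengths are different types with different sizes.
    println!("{}", size_of::<[i32; 5]>());
    println!("{}", size_of::<[i32; 6]>());
}
```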

How do you find using rust-analyzer as a library? I know it's broken into crates so it's theoretically usable as a library, but you're actually using it in anger, so what's that been like?

It's pretty good. Probably my biggest complaint would be that it's quite big and so it takes a while to compile. Certainly my continuous integration slowed down quite a lot when I pulled in rust-analyzer because now every time it runs it has to build all of rust-analyzer which takes a while.

An alternative way that I could have integrated that I didn't consider at the time was to actually pull in the rust-analyzer binary and talk to it using the language server protocol, instead of using it as a library. Obviously, that would have been better from a compilation time perspective, but I just wasn't sure whether I'd have access to everything I needed. So I didn't end up going that route.

What kind of features do you want to add to Evcxr that you feel are missing?

One feature that I started looking to try and add a while back, but I just haven’t had time recently, is allowing code that runs in a Jupyter notebook to interact with the notebook process. Allowing some way for JavaScript code that runs in the browser to communicate with some Rust code that runs in a library.

Like maybe in a crate, where you pull in a crate that, you know, creates a graph or something like that, and that code then talks to the JavaScript code running in the browser. So you can have bidirectional communication going backwards and forwards allowing an interactive thing to be displayed in the notebook.

Like a slider widget, something like that?

Exactly, yep. So I started working on that, and I think I got some of the basic stuff working. I can't remember where I was up to with it, but I remember I had issues with Mac. Like the changes I'd made, when I pushed them, the tests failed on Mac.

It's probably because I didn't have access to a Mac, so the only way I can test these changes is like: put some print statements in, push to CI, and then wait for Travis to run Mac, which takes like half an hour or an hour. And then, you know, see what the values from the print statements are. It can make for a really frustrating and slow testing experience.

I should probably put a call out just to ask if anyone with a Mac wants to come and help. It would make things much easier!

It's currently pre 1.0, what are your criteria for making a 1.0 release?

You know, I haven't really given too much thought to 1.0. I guess, so, Evcxr itself is a library, which is then used by the Jupyter notebook and the REPL. And I make breaking changes to the API of that library moderately often. Just because it's convenient, and because the only two users of the library are in the same repository.

So I guess I'd need to think about how I want to treat it with regards to semver there. Maybe if I just kept the library pre-1.0, but made the notebook and the kernel be 1.0, then I could get away with it. Because I'd still like to be able to make breaking changes to the library.

The API of the library is very not stable.

It looks like after Papyrus and Rusti stopped being maintained, Evcxr seems to be the only Rust REPL that has any recent commits to it. How do you feel about being the sole Rust REPL at this point?

[It turns out I was wrong about this. After our conversation David pointed out IRust, which is a new Rust REPL that’s being actively developed]

At the time I wrote Evcxr, there was only one other REPL, which was Rusti. I think at the time it didn't say it wasn't maintained, but it was running on an old version of nightly from 2016. I think the issue there was that they were using some compiler internals, and those compiler internals got removed, so there was no way they could move forward to a new version.

I think it was using some compiler internals to get access to the LLVM IR or something like that. And then they'd be using LLVM to JIT it or something like that. It worked completely differently to the way Evcxr works.

But there was no equivalent API provided, because the compiler just stopped... doing that.

Is this your main Rust project?

The original Rust project that I worked on and released was a thing called Rerast. Which was a structural search and replace tool. So it searched for Rust code, and replaced it based on a parse tree pattern. It used a lot of compiler internals and was nightly only. It was a bit of a pain to maintain and had a relatively low number of users.

But it broke very often, because the compiler internals were always changing. Because I depended on almost the entire surface area of the AST. Any time they changed the AST, my builds would break and I had to go fix it. So we had like fifty build fixes in one year, so round about one per week.

So eventually, once rust-analyzer got good enough, I helped out there. There was already some support for structural search and replace in rust-analyzer, and I built that up to close enough to feature parity to what I had in Rerast, and then deprecated Rerast.

So you're working on the structural search and replace in rust-analyzer itself?

Yeah I've done a lot of work on that. There's a lot more I'd like to do there as well, but yeah, I haven't really had time lately.

What I'd really like to see there would be some way for library authors to be able to make a change like they deprecate some API, and they put some annotations on the API that they're deprecating to provide a structural search and replace rule to allow you to automatically transition to the new API.

Then have the IDE automatically suggest that for you. Kind of how the compiler can make suggestions and the IDE can apply them. But this would be something driven by the library authors, not by the compiler team. So I'd like to see something like that, but I really haven't had time to drive that.

I'm curious if you have any places where you think Rust is particularly poorly applicable. Areas where Rust is maybe never going to be a good solution. Do you have any opinions on that?

I don't think it's a super hard language to learn. But I don't think it would be suitable as like a first language for kids, for example. For them you really want something that's simpler. I don't know whether Python is the right thing for kids, or Scratch or something like that.

I'm not about to teach my kids Rust!

I guess in terms of industries, I think it probably could work pretty well in most cases. Certain kinds of problems can be a little annoying to try and do in Rust: graphs and things like that, that can be a bit annoying. Although there are ways to manage it, you've just got to structure your code a bit differently.

Often the ways you structure it end up being quite nice. So maybe it's not an issue. I've found, when I've done graphs in the past, that just assigning integer ids to things works out pretty well. Don't use pointers.

Nothing in particular comes to mind.
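
The integer-id approach to graphs that David describes can be sketched roughly as follows. This is a minimal illustration rather than code from any of his projects: the names Graph, Node and NodeId are invented for the example, and nodes are never removed, which keeps every id valid for the lifetime of the graph.

```rust
use std::collections::VecDeque;

// A node is referred to by its index into `Graph::nodes`, not by a pointer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    label: String,
    neighbors: Vec<NodeId>,
}

#[derive(Default)]
struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    // Store the node and hand back its index as an id.
    fn add_node(&mut self, label: &str) -> NodeId {
        self.nodes.push(Node {
            label: label.to_string(),
            neighbors: Vec::new(),
        });
        NodeId(self.nodes.len() - 1)
    }

    // A directed edge is just one id stored inside another node.
    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        self.nodes[from.0].neighbors.push(to);
    }

    // Breadth-first traversal returning labels in visit order.
    fn bfs(&self, start: NodeId) -> Vec<&str> {
        let mut visited = vec![false; self.nodes.len()];
        let mut queue = VecDeque::from([start]);
        visited[start.0] = true;
        let mut order = Vec::new();
        while let Some(id) = queue.pop_front() {
            let node = &self.nodes[id.0];
            order.push(node.label.as_str());
            for &next in &node.neighbors {
                if !visited[next.0] {
                    visited[next.0] = true;
                    queue.push_back(next);
                }
            }
        }
        order
    }
}

fn main() {
    let mut graph = Graph::default();
    let a = graph.add_node("a");
    let b = graph.add_node("b");
    let c = graph.add_node("c");
    graph.add_edge(a, b);
    graph.add_edge(b, c);
    graph.add_edge(c, a); // a cycle, with no reference counting or lifetimes
    println!("{:?}", graph.bfs(a));
}
```

Because an id is a plain Copy value, cycles and shared nodes never run into the borrow checker. The trade-off is that an id is only meaningful together with the graph that issued it.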

What are you most excited about in terms of Rust?

I've recently become quite interested in embedded programming in Rust. Mostly I'm excited to see Rust growing. And just seeing it continue to become bigger and better and more ergonomic and better tools and better libraries. Because that really does start to fill in the gaps where maybe it was a good language, but didn't have good libraries, but once you fill in the libraries then, there's really no reason not to use it.

I don’t know much about the embedded space. Is that growing a lot now in terms of usage?

I think it's a lot better than it was a few years ago. Like, I remember someone saying four or five years ago that there was pretty much one person working on embedded Rust. It probably wasn't true, but like there was one person really driving it. Whereas now, it feels like there's a lot of people you can point to where they're very clearly big in the embedded space. And you've got companies like Ferrous Systems doing great work in the embedded space.

It feels like there's quite a few more people working and although there's still bits and pieces missing it's a pretty nice experience and works reasonably well. At least depending on... It depends on what you're trying to use and what you're trying to do with it.

I don't know if you've seen Knurling. They've been doing a lot of Knurling sessions where they'll announce they're going to build, like, a CO2 sensor or something like that. They give you the part list of what you have to buy, and you order those online somewhere and then as the session goes on they release information about how to do different things with it. You then build this sensor and program it.

They don't provide all the details. I assume they provide code samples, but like the idea is that you write the code. So you're learning how to do it. There's other people who are learning as well, so if you're having troubles you can talk to other people to see what they've done.

They also make a bunch of tools, so there's a bunch of crates and utilities they've released to make it easier to do embedded Rust development. Things like probe-run where you can just type cargo run and it will compile your code and then run it on the device. And any log statements that you have in your code are on the device and just show up in your terminal. So it's like you're running it on the local machine but in actual fact it's running on a device you've got plugged in over USB.

If you Ctrl+C you get a stack dump of where it was on the device. It's quite a nice development experience in that regard. They've helped build out some of that stuff along with others in the community.

Thanks for taking the time to talk with me, is there anything you want to plug?

I need to figure out how to get Evcxr showing up in the first results when you search for “rust repl”. It’s at least on the first page, but that’s still not great given it’s been around for a few years now. I don’t know how to fix that kind of thing.

Upcoming posts

I’ve got an interview with a generative artist that creates his works primarily in Rust, along with the primary developer of Seed, a frontend Rust web framework. Those should be out in the next couple of weeks.

[1] With accurate timekeeping you can determine your ship’s longitude.

[2] Many more examples like this can be found in Ryan North’s book How to Invent Everything.

[3] I’m calling them generically “code notebooks” here to indicate the general concept pioneered by Mathematica and popularized by Jupyter. But if you want to mentally translate that to “Jupyter notebook” in your head, feel free.

[4] Various attempts at displaying images in a terminal emulator are out there. I personally use Kitty myself, which supports images, but it’s nothing like a standard or something that people are seriously targeting as a visualization platform. It’s cool though, don’t get me wrong.

[5] Evcxr works by compiling each cell to an .so file and having the main notebook process dynamically load them and run the contained function.

