Porting Rust's std to rustix

source link: https://blog.sunfishcode.online/port-std-to-rustix/

Posted on January 04, 2022

Rustix is a system-call wrapper library with multiple backends. It has a raw Linux syscalls backend, as well as a libc backend, and other backends are in development. Rustix is designed for memory safety, I/O safety, and performance.
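The following is not taken from rustix. It is a minimal sketch of one common way a Rust crate can expose a single API while choosing its implementation at compile time with `cfg`. The module names `linux_raw` and `libc_backend`, the function `backend_name`, and the plain `target_os = "linux"` condition are all illustrative assumptions.

```rust
// Hypothetical backend that would issue raw Linux syscalls.
#[cfg(target_os = "linux")]
mod linux_raw {
    pub fn backend_name() -> &'static str {
        "linux_raw"
    }
}

// Hypothetical backend that would call into the platform's libc.
#[cfg(not(target_os = "linux"))]
mod libc_backend {
    pub fn backend_name() -> &'static str {
        "libc"
    }
}

// Exactly one backend is compiled in; callers never name it directly.
#[cfg(target_os = "linux")]
use self::linux_raw as backend;
#[cfg(not(target_os = "linux"))]
use self::libc_backend as backend;

/// The public API has the same signature whichever backend was selected.
pub fn backend_name() -> &'static str {
    backend::backend_name()
}

fn main() {
    println!("compiled with the {} backend", backend_name());
}
```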

And this is a branch of Rust's std partially ported to use rustix in place of direct libc calls. Read on for why this is cool, and stay for the benchmarks!

Factoring out unsafe, error handling, and raw pointers

The first reason that porting std to rustix is cool is that rustix factors out a lot of unsafe blocks from std. Talking to the OS still requires unsafe, but with rustix that unsafe code lives in rustix, scoped to individual syscalls. That way, the unsafe blocks that remain in std are the interesting ones: places where std itself is doing something that genuinely needs unsafe.

Rustix also provides idiomatic Result error handling for system calls. And it uses Rust references and slices instead of raw pointers. These make it easier to read std's code and focus on the important semantics of the system calls, without the distractions of libc API mechanics.

And rustix smooths over some minor infelicities in syscall APIs related to C integer type sizes.
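To make the points about unsafe blocks, Result, and raw pointers concrete, here is a minimal sketch, assuming a C-style primitive that returns -1 on failure and reports the cause through a thread-local errno. The unsafe block and raw pointer are confined to one wrapper, and callers get a slice-based, Result-returning API. `raw_write`, `safe_write`, and `Errno` are hypothetical names; `raw_write` is simulated in Rust so the example runs anywhere, and it is not a real syscall.

```rust
use std::cell::Cell;

thread_local! {
    // Simulated thread-local errno, standing in for the one a C library keeps.
    static ERRNO: Cell<i32> = Cell::new(0);
}

/// Hypothetical error type carrying a raw error code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Errno(pub i32);

/// EBADF's value on Linux.
const EBADF: i32 = 9;

/// Simulated C-style primitive (not a real syscall): raw pointer plus length in,
/// byte count out, or -1 with ERRNO set on failure.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes.
unsafe fn raw_write(fd: i32, ptr: *const u8, len: usize) -> isize {
    if fd < 0 {
        ERRNO.with(|e| e.set(EBADF));
        return -1;
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Pretend the underlying device accepts at most 4 bytes per call.
    bytes.len().min(4) as isize
}

/// Safe wrapper: a slice in, a `Result` out; the only `unsafe` lives here.
pub fn safe_write(fd: i32, buf: &[u8]) -> Result<usize, Errno> {
    // SAFETY: a slice's pointer is valid for reads of `buf.len()` bytes.
    let ret = unsafe { raw_write(fd, buf.as_ptr(), buf.len()) };
    if ret < 0 {
        Err(Errno(ERRNO.with(|e| e.get())))
    } else {
        Ok(ret as usize)
    }
}

fn main() {
    println!("{:?}", safe_write(1, b"hello"));
    println!("{:?}", safe_write(-1, b"hello"));
}
```

Callers of `safe_write` use `?` and ordinary slices, and never see the pointer/length pair or the errno lookup.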

Putting these all together in an example, this code:

   let len = cmp::min(buf.len(), <wrlen_t>::MAX as usize) as wrlen_t;
   let ret = cvt(unsafe {
       c::send(self.inner.as_raw(), buf.as_ptr() as *const c_void, len, MSG_NOSIGNAL)
   })?;

becomes this:

   let ret = rustix::net::send(&self.inner, buf, SendFlags::NOSIGNAL)?;

This puts the focus on the send operation, without the distractions of unsafe, raw pointers, wrlen_t types, and cvt error handling.
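The `wrlen_t` clamp in the first snippet is the kind of detail a wrapper can handle once on the caller's behalf. Here is a minimal sketch, assuming the underlying call takes its length as a C `int`; `clamp_len` and `clamp_buf` are hypothetical helpers, not rustix APIs.

```rust
use std::os::raw::c_int;

/// Clamps a buffer length into the range of a C `int`, mirroring the
/// `cmp::min(buf.len(), <wrlen_t>::MAX as usize) as wrlen_t` pattern.
fn clamp_len(len: usize) -> c_int {
    len.min(c_int::MAX as usize) as c_int
}

/// Returns the prefix of `buf` whose length fits in a C `int`, plus that length.
/// Truncating is acceptable for send/write-style calls, whose callers must
/// already handle short writes.
fn clamp_buf(buf: &[u8]) -> (&[u8], c_int) {
    let len = clamp_len(buf.len());
    (&buf[..len as usize], len)
}

fn main() {
    let data = [0u8; 16];
    let (prefix, len) = clamp_buf(&data);
    println!("{} {}", prefix.len(), len);
    println!("{}", clamp_len(usize::MAX) == c_int::MAX);
}
```

In the rustix version of the call, `send` takes the slice directly, so this arithmetic no longer appears at the call site in std.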

A path to a Rust on Linux without libc

A second reason this is cool is that it's a step towards a Rust toolchain on Linux that doesn't depend on libc.

Rustix is able to make direct Linux syscalls from Rust code. And origin is a Rust library which is able to start up and shut down processes and threads (comparable to crt1.o and libpthread).

With these, we have all the things needed to run Rust programs on Linux. And it turns out there are two different ways to do this. The first way is mustang.

Mustang uses a library called c-scape, which wraps rustix with libc-compatible APIs, allowing std to use rustix without modifications, including threading support. This has gotten a lot of functionality up and running; mustang can run a lot of real-world code now. Mustang also helps test rustix and origin, and beyond that, the c-scape libc compatibility layer has several additional uses. But mustang in its current form looks like it would be awkward to upstream into Rust.

Fortunately, mustang's architecture of keeping c-scape as a separate layer on top of rustix and origin, with rustix and origin providing idiomatic Rust APIs, means that another way is possible as well. This blog post is about starting to port std to rustix directly. In addition to not using libc code, this path doesn't use libc APIs either.
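To illustrate the direction of that layering, here is a minimal sketch: an idiomatic, Result-returning function underneath, and a C-ABI shim above it that translates back into the 0-or-(-1)-plus-errno convention C callers expect. This is not c-scape's code; `close_fd`, `shim_close`, `last_errno`, and the simulated errno are hypothetical, and no real descriptor is closed.

```rust
use std::cell::Cell;
use std::os::raw::c_int;

thread_local! {
    // The errno a libc-compatible layer must maintain for its C callers.
    static ERRNO: Cell<c_int> = Cell::new(0);
}

/// Hypothetical error type used by the idiomatic layer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Errno(pub c_int);

/// EBADF's value on Linux.
const EBADF: c_int = 9;

/// Idiomatic layer (stand-in for a rustix-style API): `Result` out, no errno.
/// Simulated: rejects negative descriptors, otherwise "succeeds".
pub fn close_fd(fd: c_int) -> Result<(), Errno> {
    if fd < 0 {
        Err(Errno(EBADF))
    } else {
        Ok(())
    }
}

/// libc-style shim layered on top: C ABI, returns 0 or -1 and sets errno.
/// Deliberately not `#[no_mangle]`, so it cannot clash with the real `close`.
pub extern "C" fn shim_close(fd: c_int) -> c_int {
    match close_fd(fd) {
        Ok(()) => 0,
        Err(Errno(code)) => {
            ERRNO.with(|e| e.set(code));
            -1
        }
    }
}

/// What a C caller would read from errno after a failure.
pub fn last_errno() -> c_int {
    ERRNO.with(|e| e.get())
}

fn main() {
    println!("{} {}", shim_close(3), shim_close(-1));
    println!("errno = {}", last_errno());
}
```

Porting std to rustix directly amounts to having std call the lower, idiomatic layer and skipping the shim entirely.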

Performance

And another reason this is cool is that rustix-enabled std also brings several modest speedups, compared to std in upstream Rust. On machines I've tested it on:

  • std::fs::metadata is about 3% faster
  • std::fs::File::open is about 5% faster
  • std::fs::read_to_string is about 3% faster
  • std::time::Instant::now ranges from 1% to 10% faster on Linux

See the benchmark source code to see what's being measured.
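The figures above come from the author's benchmarks, which are not reproduced here. For a rough feel for how per-call costs like these can be measured with std alone, here is a minimal timing loop. It is a sketch under simple assumptions, not the benchmark the post used; a serious comparison would use a harness with warm-up and statistical analysis, such as the `criterion` crate. `mean_time_per_call` is a hypothetical helper.

```rust
use std::fs;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// A rough sketch only: no warm-up, outlier rejection, or statistics.
fn mean_time_per_call<F: FnMut()>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "iters must be positive");
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed() / iters
}

fn main() {
    let per_call = mean_time_per_call(10_000, || {
        // black_box keeps the compiler from discarding the result.
        black_box(fs::metadata(".").is_ok());
    });
    println!("std::fs::metadata(\".\"): {:?} per call", per_call);
}
```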

Rustix makes metadata, open, and other filesystem path operations faster by avoiding a dynamic allocation when converting Rust strings into NUL-terminated C strings. For strings up to a reasonable size, rustix uses a stack-allocated buffer instead, which skips the heap allocation and deallocation entirely and is much faster.
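Here is a minimal sketch of that technique using only std. It is not rustix's implementation; the 256-byte threshold and the names `with_c_str` and `SMALL_BUF` are assumptions for illustration. Passing a closure, rather than returning a `&CStr`, is what allows the borrowed C string to point into a buffer that only lives for the duration of the call.

```rust
use std::ffi::{CStr, CString};

/// Size of the on-stack buffer (an assumption for this sketch).
const SMALL_BUF: usize = 256;

/// Calls `f` with `s` converted to a NUL-terminated C string.
/// Short strings are copied into a stack buffer; longer ones fall back to a
/// heap-allocated `CString`. Returns `None` if `s` contains an interior NUL.
fn with_c_str<T>(s: &str, f: impl FnOnce(&CStr) -> T) -> Option<T> {
    let bytes = s.as_bytes();
    if bytes.len() < SMALL_BUF {
        // Fast path: no heap allocation.
        let mut buf = [0u8; SMALL_BUF];
        buf[..bytes.len()].copy_from_slice(bytes);
        // buf[bytes.len()] is still 0, so the slice below ends in exactly one
        // terminator unless `s` itself contained a NUL (then this fails).
        let c = CStr::from_bytes_with_nul(&buf[..=bytes.len()]).ok()?;
        Some(f(c))
    } else {
        // Slow path: allocate, as a plain `CString::new` always would.
        let c = CString::new(bytes).ok()?;
        Some(f(c.as_c_str()))
    }
}

fn main() {
    let short = with_c_str("/etc/hosts", |c| c.to_bytes().len());
    let long = with_c_str(&"a".repeat(1000), |c| c.to_bytes().len());
    let bad = with_c_str("a\0b", |c| c.to_bytes().len());
    println!("{:?} {:?} {:?}", short, long, bad);
}
```

In a real path API, the closure is where the syscall would be made, with the `&CStr` passed to it as the path argument.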

Rustix makes Instant::now faster on Linux by simplifying error handling, since we know that reading the system monotonic clock never fails (and, as in current std, it still panics if a failure ever does occur).

And rustix contains a number of other optimizations, such as using inline system calls and avoiding the TLS errno variable on Linux, but so far benchmarks confirm the common wisdom that these aren't usually very significant compared to the cost of the actual syscalls.

Looking forward

A goal for this port of std to rustix is to eventually propose it to be merged into upstream Rust. I'm hoping this blog post will start some conversations about what this should eventually look like.

With rustix's libc backend, std can continue to support all the libc-using platforms that Rust currently supports. And with either backend, rustix brings the advantages of factoring out unsafe, error handling, and raw pointers, and its optimizations for converting to C-style strings.

This project advances several other goals as well: promoting I/O safety concepts and APIs, helping test some of the infrastructure used by cap-std, and helping set the stage for future projects related to sandboxing, WASI, nameless, and other areas.

Thanks!

Thanks to @nivkner for implementing support for child processes, to @Urgau for adding arm support and implementing several features in rustix, to @cole-miller for implementing getcwd and chdir in mustang, to @jplatte, @ratmice, and others for contributing useful patches, and to @tshepang in particular for bringing up ripgrep as a testcase (and it almost works now!).

