A stack-less Rust coroutine library under 100 LoC

As of stable Rust 1.39.0, it is possible to implement a very basic and safe coroutine library using Rust's async / await support, and in under 100 lines of code. The implementation depends solely on std and is stack-less (meaning, not depending on an separate CPU architecture stack).

A very basic simple coroutine library contains only an event-less 'yield' primitive, which stops execution of the current coroutine so that other coroutines can run. This is the kind of library that I chose to demonstrate in this post to provide the most concise example.

Yielder

To the coroutine we pass a Fib struct that only contains a simple binary state. This Fib struct has a waiter method that creates a Future that the coroutine can use in order to be await ed upon.

use std::future::Future;
use std::pin::Pin;
use std::task::{Poll, Context};

enum State {
    Halted,
    Running,
}

struct Fib {
    state: State,
}

impl Fib {
    fn waiter<'a>(&'a mut self) -> Waiter<'a> {
        Waiter { fib: self }
    }
}

struct Waiter<'a> {
    fib: &'a mut Fib,
}

impl<'a> Future for Waiter<'a> {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
        match self.fib.state {
            State::Halted => {
                self.fib.state = State::Running;
                Poll::Ready(())
            }
            State::Running => {
                self.fib.state = State::Halted;
                Poll::Pending
            }
        }
    }
}

Executor

Our executor keeps a vector of uncompleted futures, where the state of each future is located on the heap. As a very basic executor, it only supports adding futures to it before actual execution takes place and not afterward. The push method adds a closure to the list of futures, and the run method performs interlaced execution of futures until all of them complete.

use std::collections::VecDeque;

struct Executor {
    fibs: VecDeque<Pin<Box<dyn Future<Output=()>>>>,
}

impl Executor {
    fn new() -> Self {
        Executor {
            fibs: VecDeque::new(),
        }
    }

    fn push<C, F>(&mut self, closure: C)
    where
        F: Future<Output=()> + 'static,
        C: FnOnce(Fib) -> F,
    {
        let fib = Fib { state: State::Running };
        self.fibs.push_back(Box::pin(closure(fib)));
    }

    fn run(&mut self) {
        let waker = waker::create();
        let mut context = Context::from_waker(&waker);

        while let Some(mut fib) = self.fibs.pop_front() {
            match fib.as_mut().poll(&mut context) {
                Poll::Pending => {
                    self.fibs.push_back(fib);
                },
                Poll::Ready(()) => {},
            }
        }
    }
}

Null Waker

For the executor implementation above, we need a null Waker , similar to the one used in genawaiter ( link ).

use std::task::{RawWaker, RawWakerVTable, Waker},

pub fn create() -> Waker {
    // Safety: The waker points to a vtable with functions that do nothing. Doing
    // nothing is memory-safe.
    unsafe { Waker::from_raw(RAW_WAKER) }
}

const RAW_WAKER: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);
const VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);

unsafe fn clone(_: *const ()) -> RawWaker { RAW_WAKER }
unsafe fn wake(_: *const ()) { }
unsafe fn wake_by_ref(_: *const ()) { }
unsafe fn drop(_: *const ()) { }

Giving it a go

We can test the library using a program such as the following:

pub fn main() {
    let mut exec = Executor::new();

    for instance in 1..=3 {
        exec.push(move |mut fib| async move {
            println!("{} A", instance);
            fib.waiter().await;
            println!("{} B", instance);
            fib.waiter().await;
            println!("{} C", instance);
            fib.waiter().await;
            println!("{} D", instance);
        });
    }

    println!("Running");
    exec.run();
    println!("Done");
}

The output is as follows:

Running
1 A
2 A
3 A
1 B
2 B
3 B
1 C
2 C
3 C
1 D
2 D
3 D
Done

Performance

Timing the following program compiled with lto = true , I have seen that it takes about 5 nanoseconds for each iteration of the internal loop, on an Intel i7-7820X CPU.

pub fn bench() {
    let mut exec = Executor::new();

    for _ in 1..=2 {
        exec.push(move |mut fib| async move {
            for _ in 0..100_000_000 {
                fib.waiter().await;
            }
        });
    }

    println!("Running");
    exec.run();
    println!("Done");
}

End notes

One of the nice things about the async / await support in the Rust compiler is that it does not depend on any specific run-time. Thus, if you commit to certain run-time, you are free to implement your own executor.

Independency of run-time has its downsides. For example, the library presented here is not compatible with other run-times such as async-std . And in fact, the implementation violates the interface intended for the Future 's poll function by assuming that the Future will always be Ready after it was Pending .

Combined uses of several run-times in a single program is possible but requires extra care (see Reddit discussion ).

Yielder

Executor

Null Waker

Giving it a go

Performance

End notes

Recommend

ytop: a TUI system monitor written in Rust

Is there any free open source code for mobile apps like Uber?

Build your own block_on()

Gandi.net postmortem of the failure of one hosting storage unit on January 8

作为一个商人，口罩钱到底该不该挣

数论基础代码合集

Implementing a Turn-Based Game in an Entity Component System with SPECS-Task

faux is a traitless Rust mocking framework for creating mock objects out of user...

TIL: CSS Grid can be used to stack elements

Sonatype Disables Unencrypted Access to Maven

About Joyk