Fibers: the Most Elegant Windows API
source link: https://www.tuicool.com/articles/hit/ymUrIfA
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
The Windows API — a.k.a. Win32 — is notorious for being clunky, ugly, and lacking good taste. Microsoft has done a pretty commendable job with backwards compatibility, but the trade-off is that the API is filled to the brim with historical cruft. Every hasty, poor design over the decades is carried forward forever, and, in many cases, even built upon, which essentially doubles down on past mistakes. POSIX certainly has its own ugly corners, but those are the exceptions. In the Windows API, elegance is the exception.
That’s why, when I recently revisited the Fibers API , I was pleasantly surprised. It’s one of the exceptions — much cleaner than the optional, deprecated, and now obsoletePOSIX equivalent. It’s not quite and apples-to-apples comparison since the POSIX version is slightly more powerful, and more complicated as a result. I’ll cover the difference in this article.
For the last part of this article, I’ll demonstrate a async/await framework build on top of fibers. The framework allows coroutines in C programs to await on arbitrary kernel objects.
Fibers
Windows fibers are really just stackful
, symmetric coroutines.
From a different point of view: they’re cooperatively scheduled threads,
which is the source of the analogous name, fibers
. They’re symmetric
because all fibers are equal, and no fiber is the “main” fiber. If any
fiber returns from its start routine, the program exits. (Older versions
of Wine will crash when this happens, but it was recently fixed.) It’s
equivalent to the process’ main thread returning from main()
. The
initial fiber is free to create a second fiber, yield to it, then the
second fiber destroys the first.
I’m going to focus on the core set of fiber functions. There are some additional capabilities I’m going to ignore, including support for fiber local storage . The important functions are just these five:
void *CreateFiber(size_t stack_size, void (*proc)(void *), void *arg); void SwitchToFiber(void *fiber); bool ConvertFiberToThread(void); bool *ConvertThreadToFiber(void *arg); void DeleteFiber(void *fiber);
To emphasize its simplicity, I’ve shown them here with more standard
prototypes than seen in their formal documentation. That documentation
uses the clunky Windows API typedefs still burdened with its 16-bit
heritage — e.g. LPVOID
being a “long pointer” from the segmented memory
of the 8086:
Fibers are represented using opaque, void pointers. Maybe that’s a little too
simple since it’s easy to misuse in C, but I like it. The return
values for CreateFiber()
and ConvertThreadToFiber()
are void pointers
since these both create fibers.
The fiber start routine returns nothing and takes a void “user pointer”.
That’s nearly what I’d expect, except that it would probably make more
sense for a fiber to return int
, which ismore in line with main
/ WinMain
/ mainCRTStartup
/ WinMainCRTStartup
. As I said,
when any fiber returns from its start routine, it’s like returning from
the main function, so it should probably have returned an integer.
A fiber may delete itself, which is the same as exiting the thread.
However, a fiber cannot yield (e.g. SwitchToFiber()
) to itself. That’s
undefined behavior.
#include <stdio.h> #include <windows.h> void coup(void *king) { puts("Long live the king!"); DeleteFiber(king); ConvertFiberToThread(); /* seize the main thread */ /* ... */ } int main(void) { void *king = ConvertThreadToFiber(0); void *pretender = CreateFiber(0, coup, king); SwitchToFiber(pretender); abort(); /* unreachable */ }
Only fibers can yield to fibers, but when the program starts up, there
are no fibers. At least one thread must first convert itself into a
fiber using ConvertThreadToFiber()
, which returns the fiber object
that represents itself. It takes one argument analogous to the last
argument of CreateFiber()
, except that there’s no start routine to
accept it. The process is reversed with ConvertFiberToThread()
.
Fibers don’t belong to any particular thread and can be scheduled on any thread if properly synchronized. Obviously one should never yield to the same fiber in two different threads at the same time.
Contrast with POSIX
The equivalent over on POSIX systems was context switching. It’s also
stackful and symmetric, but it has just three important functions:
getcontext(3)
,
makecontext(3)
, and
swapcontext
.
int getcontext(ucontext_t *ucp); void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...); int swapcontext(ucontext_t *oucp, const ucontext_t *ucp);
These are roughly equivalent to GetCurrentFiber()
, CreateFiber()
, and SwitchToFiber()
. There is no need for ConvertFiberToThread()
since
threads can context switch without preparation. There’s also no DeleteFiber()
because the resources are managed by the program itself.
That’s where POSIX contexts are a little bit more powerful.
The first argument to CreateFiber()
is the desired stack size, with
zero indicating the default stack size. The stack is allocated and freed
by the operating system. The downside is that if you’re frequently
creating and destroying coroutines, those stacks are constantly being
allocated and freed.
In makecontext(3)
, the caller allocates and supplies the stack. Freeing
that stack is equivalent to destroying the context. A program that
frequently creates and destroys contexts can maintain a stack pool or
otherwise more efficiently manage their allocation. This makes it more
powerful, but it also makes it a little more complicated. It would be hard
to remember how to do all this without a careful reading of the
documentation:
/* Create a context */ ucontext_t ctx; ctx.uc_stack.ss_sp = malloc(SIGSTKSZ); ctx.uc_stack.ss_size = SIGSTKSZ; ctx.uc_link = 0; getcontext(&ctx); makecontext(&ctx, proc, 0); /* Destroy a context */ free(ctx.uc_stack.ss_sp);
Note how makecontext(3)
is variadic ( ...
), passing its arguments on
to the start routine of the context. This seems like it might be better
than a user pointer. Unfortunately it’s not, since those arguments are
strictly limited to integers
.
Ultimately I like the fiber API better. The first time I tried it out, I could guess my way through it without looking closely at the API.
Async / await with fibers
Why was I looking at the Fiber API? I’ve known about coroutines for years but I didn’t understand how they could be useful. Sure, the function can yield, but what other coroutine should it yield to? It wasn’t until I was recently bit by the async/await bug that I finally saw a “killer feature” that justified their use. Generators come pretty close, though.
Windows fibers are a coroutine primitive suitable for async/await in C programs, where it can also be useful . To prove that it’s possible, I built async/await on top of fibers in 95 lines of code .
The alternatives are to use a third-party coroutine library or to do it myself with some assembly programming . However, having it built into the operating system is quite convenient! It’s unfortunate that it’s limited to Windows. Ironically, though, everything I wrote for this article, including the async/await demo, was originally written on Linux using Mingw-w64 and tested using Wine . Only after I was done did I try it on Windows.
Before diving into how it works, there’s a general concept about the
Windows API that must be understood: All kernel objects can be in
either a signaled or unsignaled state.
The API provides functions that
block on a kernel object until it is signaled. The two important ones
are WaitForSingleObject()
and WaitForMultipleObjects()
. The latter
behaves very much like poll(2)
on POSIX.
Usually the signal is tied to some useful event, like a process or
thread exiting, the completion of an I/O operation (i.e. asynchronous
overlapped I/O), etc. It’s a generic way to wait for some other event.
However, instead of blocking the thread, wouldn’t it be nice to await
on the kernel object?
In my aio
library for Emacs, the
fundamental “wait” object was a promise. For this API it’s a kernel
object handle.
So, the await function will take a kernel object, register it with the scheduler, then yield to the scheduler. The scheduler — a global variable, so there’s only one scheduler per process — looks like this:
struct { void *main_fiber; HANDLE handles[MAXIMUM_WAIT_OBJECTS]; void *fibers[MAXIMUM_WAIT_OBJECTS]; void *dead_fiber; int count; } async_loop;
While fibers are symmetric, in my async/await implementation they are
not. One fiber is the scheduler, main_fiber
, and the other fibers
always yield to it.
There is an array of kernel object handles, handles
, and an array of fibers
. The elements in these arrays are paired with each other, but
it’s convenient to store them separately, as I’ll show soon. fibers[0]
is waiting on handles[0]
, and so on.
The array is a fixed size, MAXIMUM_WAIT_OBJECTS
(64), because there’s
a hard limit on the number of fibers that can wait at once. I’ll explain
why below. This pathetically small limitation is an unfortunate,
hard-coded restriction of the Windows API. It kills most practical uses
of my little library. Fortunately there’s no limit on the number of
handles we might want to wait on, just the number of fibers.
When a fiber is about to return from its start routine, it yields one
last time and registers itself on the dead_fiber
member. The scheduler
will delete this fiber as soon as it’s given control. Fibers never truly
return since that would terminate the program.
With this, the await function, async_await()
, is pretty simple. It
registers the handle with the scheduler, then yields to the scheduler
fiber.
void async_await(HANDLE h) { async_loop.handles[async_loop.count] = h; async_loop.fibers[async_loop.count] = GetCurrentFiber(); async_loop.count++; SwitchToFiber(async_loop.main_fiber); }
Caveat: The scheduler destroys this handle with CloseHandle()
after it
signals, so don’t try to reuse it. This made my demo simpler, but it
might be better to not do this.
A fiber can exit at any time. Such an exit is inserted implicit before a fiber actually returns:
void async_exit(void) { async_loop.dead_fiber = GetCurrentFiber(); SwitchToFiber(async_loop.main_fiber); }
The start routine given to async_start()
is actually wrapped in the
real start routine. This is how async_exit()
is injected:
struct fiber_wrapper { void (*func)(void *); void *arg; }; static void fiber_wrapper(void *arg) { struct fiber_wrapper *fw = arg; fw->func(fw->arg); async_exit(); } int async_start(void (*func)(void *), void *arg) { if (async_loop.count == MAXIMUM_WAIT_OBJECTS) { return 0; } else { struct fiber_wrapper fw = {func, arg}; SwitchToFiber(CreateFiber(0, fiber_wrapper, &fw)); return 1; } }
The library provides a single awaitable function, async_sleep()
. It
creates a “waitable timer” object, starts the countdown, and returns it.
(Notice how SetWaitableTimer()
is a typically ugly Win32 function.)
HANDLE async_sleep(double seconds) { HANDLE promise = CreateWaitableTimer(0, 0, 0); LARGE_INTEGER t; t.QuadPart = (long long)(seconds * -10000000.0); SetWaitableTimer(promise, &t, 0, 0, 0, 0); return promise; }
A more realistic example would be overlapped I/O. For example, you’d
open a file ( CreateFile()
) in overlapped mode, then when you, say,
read from that file ( ReadFile()
) you create an event object
( CreateEvent()
), populate an overlapped I/O structure with the event,
offset, and length, then finally await on the event object. The fiber
will be resumed when the operation is complete.
Side note: Unfortunately overlapped I/O doesn’t work correctly for files , and many operations can’t be done asynchronously, like opening files. When it comes to files, you’re better off using dedicated threads as libuv does instead of overlapped I/O. You can still await on these operations. You’d just await on the signal from the thread doing synchronous I/O, not from overlapped I/O.
The most complex part is the scheduler:
void async_run(void) { while (async_loop.count) { DWORD nhandles = async_loop.count; HANDLE *handles = async_loop.handles; DWORD r = WaitForMultipleObjects(nhandles, handles, 0, INFINITE); void *fiber = async_loop.fibers[r]; CloseHandle(async_loop.handles[r]); async_loop.handles[r] = async_loop.handles[nhandles - 1]; async_loop.fibers[r] = async_loop.fibers[nhandles - 1]; async_loop.count--; SwitchToFiber(fiber); if (async_loop.dead_fiber) { DeleteFiber(async_loop.dead_fiber); async_loop.dead_fiber = 0; } } }
This is why the handles are in their own array. The array can be passed
directly to WaitForMultipleObjects()
. The return value indicates which
handle was signaled. The handle is closed, the entry removed from the
scheduler, and then the fiber is resumed.
That WaitForMultipleObjects()
is what limits the number of fibers.
It’s not possible to wait on more than 64 handles at once! This is
hard-coded into the API. How? A return value of 64 is an error code, and
changing this would break the API. Remember what I said about being
locked into bad design decisions of the past?
To be fair, WaitForMultipleObjects()
was a doomed API anyway, just
like select(2)
and poll(2)
on POSIX. It’s horribly inefficient since
the entire array of objects being waited on must be traversed on each
call. That’s horribly inefficient when waiting on large numbers of
objects. That’s why interfaces like kqueue (BSD), epoll (Linux), and
IOCP (Windows) exist. Unfortunately IOCP doesn’t really fit this
particular problem well — awaiting on kernel objects — so I couldn’t use
it.
When the awaiting fiber count is zero and the scheduler has control, all fibers must have completed, so there’s nothing left to do. However, the caller can schedule more fibers and then restart the scheduler if desired.
That’s really about all there is to it. Have a look at
demo.c
to see how the API looks in some trivial examples. On Linux you can see
it in action with make check
. On Windows, you justneed to compile
it, then run it like a normal program. If there was a better
function than WaitForMultipleObjects()
in the Windows API, I would
have considered turning this demo into a real library.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK