Wasmjit Spectre Mitigations Part 1
source link: https://www.tuicool.com/articles/hit/vMjM3yn
Amongst the initial reactions to Wasmjit were concerns about its vulnerability to Meltdown and Spectre. This isn't surprising, since Spectre primarily affects operating system kernels and language runtimes, Wasmjit being a happy mixture of the two. Wasmjit isn't vulnerable to Meltdown, but it is vulnerable to Spectre Variant 1, Bounds Check Bypass (BCB), and Spectre Variant 2, Branch Target Injection. In this post I'll cover Wasmjit's mitigations for Spectre Variant 1. In a following post I'll cover the mitigations for Spectre Variant 2. If you're implementing a WebAssembly runtime, or any runtime or operating system kernel in general, or if you just like low-level hacking, you'll hopefully find this post useful.
Description of the Vulnerability
Since the initial disclosure on 2018-01-03, a lot has been published explaining how the vulnerabilities work. Google's Project Zero has a good rundown. I enjoyed Colin Percival's post as well. I'll briefly summarize it here, but I recommend those posts if you want to dig a little deeper.
These vulnerabilities primarily concern the effect a CPU's branch prediction mechanism has on externally observable state. Specifically, vulnerable CPUs fill their cache with memory loaded during mis-speculated branches without flushing it after the mis-speculated branch is discarded. One common type of mis-speculated branch is a bounds check before accessing an array using an untrusted array index:
```c
if (untrusted_input < array_len) {
    val = array[untrusted_input];
    /* do something that affects cache with val */
}
```
In BCB, an attacker provokes the target into attempting a load from an invalid array index and then, if mis-speculation occurs, can observe the effect on the cache to infer the value of data that would otherwise be inaccessible.
It's important to note that Spectre Variant 1 is typically only concerned with mis-speculated loads. There is also Variant 1.1, which is concerned with mis-speculated stores. At the moment Wasmjit doesn't implement mitigations for that.
BCB Mitigation Techniques
Intel recommends two techniques for mitigating BCB: lfence and bounds-clipping. Applying the lfence technique transforms the above example into the following:

```c
if (untrusted_input < array_len) {
    lfence();
    val = array[untrusted_input];
    /* ... */
}
```
lfence is essentially a serializing operation. It doesn't execute until all previous instructions have completed. Serializing execution between every bounds check and subsequent load is a fool-proof way to block malicious behavior during mis-speculations, but isn't ideal because it can have a dramatic negative effect on performance.
Here’s how bounds-clipping works:
```c
if (untrusted_input <= POWER_OF_TWO_MINUS_ONE) {
    val = array[untrusted_input & POWER_OF_TWO_MINUS_ONE];
    /* ... */
}
```
Bounds-clipping is a lot more efficient than lfence, but it isn't ideal because it only works with arrays that have lengths that are powers of two. Additionally, Intel won't guarantee its effectiveness with future processor generations.
The approach taken by Wasmjit is similar to the approach taken in the Linux kernel and described by Chandler Carruth:

```c
if (untrusted_input < len) {
    untrusted_input = array_index_nospec(untrusted_input, len);
    val = array[untrusted_input];
}
```
The array_index_nospec() function is responsible for “hardening” the array index. Here's a simplified version of how that works:

```c
size_t array_index_nospec(size_t idx, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = idx < len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}
```
The idea here is that when the processor mis-speculates, either the mask will be 0 or the computation of mask will stall until the mis-speculation can be rectified. The point of the __asm__ statement is to force the compiler to compute the mask without optimizations based on knowledge of what idx may be.
This method requires that the computation of mask be done without any branches (hopefully for obvious reasons) and that the CPU isn't able to (mis-)predict the value of mask (instead of stalling). With GCC and Clang on x86_64 these preconditions are satisfied. Similar to bounds-clipping, Intel doesn't guarantee this method will work with future processor generations. The good news is that GCC has implemented the functionality of array_index_nospec() natively as a compiler builtin, so going forward the compiler will be responsible for the implementation details.
Wasmjit Mitigations
At a high level, Wasmjit performs conditional loads using untrusted indices in two main places: 1) in the runtime host functions made available to user programs, and 2) in the code generated by the JIT.
Host Runtime Function Mitigations
User code directly interacts with Wasmjit through the host functions it exports. These host functions mimic the de facto interface implemented by Emscripten, which, in turn, roughly mimics the Linux kernel system call interface. This interface is the only way user programs interact with the outside world. From the perspective of the user program, it uses normal C pointers to pass references to data to the host interface. From the perspective of Wasmjit, these pointers are actually indices into the singleton memory instance of that WebAssembly module.
To safely load data from user-provided pointers, Wasmjit first checks that the pointer is a valid index into the singleton memory instance. After that, a custom memcpy() routine is run that properly hardens the array index before performing the load. Here's an example (note the parentheses around `size & ~(size_t)0x7`; without them, `<` binds tighter than `&` and the first loop never runs):

```c
void custom_memcpy(void *restrict memory_base, size_t memory_size,
                   void *restrict dst, uint32_t wasm_ptr, uint32_t size)
{
    size_t i;
    for (i = 0; i < (size & ~(size_t)0x7); i += 8) {
        uint32_t hardened = array_index_nospec(wasm_ptr + i, 8, memory_size);
        memcpy((char *) dst + i, (char *) memory_base + hardened, 8);
    }
    for (i = size & ~(size_t)0x7; i < size; ++i) {
        uint32_t hardened = array_index_nospec(wasm_ptr + i, 1, memory_size);
        *((char *) dst + i) = *((char *) memory_base + hardened);
    }
}
```
We copy in blocks of 8 to minimize the performance impact of hardening the index on every load. To load in blocks of 8 safely, an extra argument needs to be provided to array_index_nospec():

```c
size_t array_index_nospec(size_t idx, size_t extent, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = (idx + extent) <= len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}
```
Without the extent argument, array_index_nospec() only hardens based on whether a single access is safe. That's no longer the case since we're copying in blocks of 8. If accessing a single element of the array, just invoke array_index_nospec() with an extent argument of 1.
Altogether, a typical host function roughly looks like this:
```c
uint32_t foo(uint32_t ptr, uint32_t size, struct wasmjit_ctx *ctx)
{
    uint32_t user_int;

    /* check if untrusted memory reference is valid;
       written this way so ptr + size can't wrap around */
    if (size > ctx->memory_size || ptr > ctx->memory_size - size)
        return 0;

    if (size != sizeof(user_int))
        return 0;

    custom_memcpy(ctx->memory_base, ctx->memory_size,
                  &user_int, ptr, sizeof(user_int));

    /* do stuff with user_int... */

    return 1;
}
```
JIT Mitigations
There are two WebAssembly instructions that directly involve array indexing using a user-provided index: br_table and call_indirect. In addition, every load instruction is vulnerable: i32.load, i64.load, f32.load, f64.load, i32.load8_s, i32.load8_u, i32.load16_s, i32.load16_u, i64.load8_s, i64.load8_u, i64.load16_s, i64.load16_u, i64.load32_s, i64.load32_u.
Since the JIT generates machine code that executes those instructions, Wasmjit can't simply use the array_index_nospec() function to harden the array indexes. Instead, we need a machine code sequence that does the equivalent hardening. On x86_64 we can use the sbb instruction after cmp in the following way:

```asm
# %rax contains the untrusted index
# %rdx contains the array size
cmp %rdx, %rax
# jae jumps if CF == 0; a mis-speculation may fail to jump even if CF == 0
jae BAD_INDEX
# sbb computes %rcx = (%rcx - %rcx - CF)
sbb %rcx, %rcx
# if CF == 0, then %rcx == 0
and %rcx, %rax
```
After the preceding instruction sequence, it should be safe to use %rax as an array index.
Final Words
Is the technique outlined above a generally good solution to the BCB vulnerability? Frankly, it's not. Requiring programmers to manually annotate each conditional array access doesn't scale and is error-prone. Sadly, the lack of a robust solution from Intel, AMD, ARM and others, coupled with the existence of “good enough” software mitigations, leaves me doubtful the situation will change. Unlike the other Spectre/Meltdown vulnerabilities, Intel's latest 9th-generation processors still don't address BCB. Fortunately, there has been a hardware response from Intel on Spectre Variant 2, which we'll cover next time.