Wasmjit Spectre Mitigations Part 1
source link: https://www.tuicool.com/articles/hit/vMjM3yn
Amongst the initial reactions to Wasmjit were concerns about its vulnerability to Meltdown and Spectre. This isn't surprising, since Spectre primarily affects operating system kernels and language runtimes, Wasmjit being a happy mixture of the two. Wasmjit isn't vulnerable to Meltdown, but it is vulnerable to Spectre Variant 1, Bounds Check Bypass (BCB), and Spectre Variant 2, Branch Target Injection. In this post I'll cover Wasmjit's mitigations for Spectre Variant 1. In a following post I'll cover the mitigations for Spectre Variant 2. If you're implementing a WebAssembly runtime, or any runtime or operating system kernel in general, or if you just like low-level hacking, you'll hopefully find this post useful.
Description of the Vulnerability
Since the initial disclosure on 2018-01-03, a lot has been published explaining how the vulnerabilities work. Google's Project Zero has a good rundown. I enjoyed Colin Percival's post as well. I'll briefly summarize it here, but I recommend those posts if you want to dig a little deeper.
These vulnerabilities primarily concern the effect a CPU's branch prediction mechanism has on externally observable state. Specifically, vulnerable CPUs fill their cache with memory loaded during mis-speculated branches without flushing it after the mis-speculated branch is discarded. One common type of mis-speculated branch is a bounds check before accessing an array using an untrusted array index:
```c
if (untrusted_input < array_len) {
    val = array[untrusted_input];
    /* do something that affects cache with val */
}
```
In BCB, an attacker provokes the target into attempting a load from an invalid array index and then, if mis-speculation occurs, can observe the effect on the cache to infer the value of data that would otherwise be inaccessible.
It's important to note that Spectre Variant 1 is typically only concerned with mis-speculated loads. There is also Variant 1.1, which is concerned with mis-speculated stores. At the moment Wasmjit doesn't implement mitigations for that.
BCB Mitigation Techniques
Intel recommends two techniques for mitigating BCB: lfence and bounds-clipping. Applying the lfence technique transforms the above example into the following:

```c
if (untrusted_input < array_len) {
    lfence();
    val = array[untrusted_input];
    /* ... */
}
```
lfence is essentially a serializing operation. It doesn't execute until all previous instructions have completed. Serializing execution between every bounds check and subsequent load is a fool-proof way to block malicious behavior during mis-speculations, but isn't ideal because it can have a dramatic negative effect on performance.
Here’s how bounds-clipping works:
```c
if (untrusted_input <= POWER_OF_TWO_MINUS_ONE) {
    val = array[untrusted_input & POWER_OF_TWO_MINUS_ONE];
    /* ... */
}
```
Bounds-clipping is a lot more efficient than lfence, but it isn't ideal because it only works with arrays that have lengths that are powers of two. Additionally, Intel won't guarantee its effectiveness with future processor generations.
The approach taken by Wasmjit is similar to the approach taken in the Linux kernel and described by Chandler Carruth:

```c
if (untrusted_input < len) {
    untrusted_input = array_index_nospec(untrusted_input, len);
    val = array[untrusted_input];
}
```
The array_index_nospec() function is responsible for “hardening” the array index. Here's a simplified version of how that works:

```c
size_t array_index_nospec(size_t idx, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = idx < len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}
```
The idea here is that when the processor mis-speculates, either the mask will be 0 or the computation of mask will stall until the mis-speculation can be rectified. The point of the __asm__ statement is to force the compiler to compute the mask without optimizations based on knowledge of what idx may be.
This method requires that the computation of mask be done without any branches (hopefully for obvious reasons) and that the CPU isn't able to (mis-)predict the value of mask (instead of stalling). With GCC and Clang on x86_64 these preconditions are satisfied. Similar to bounds-clipping, Intel doesn't guarantee this method will work with future processor generations. The good news is that GCC has implemented the functionality of array_index_nospec() natively as a compiler builtin, so going forward the compiler will be responsible for the implementation details.
Wasmjit Mitigations
At a high level, Wasmjit performs conditional loads using untrusted indices in two main places: 1) in the runtime host functions made available to user programs, and 2) in the code generated by the JIT.
Host Runtime Function Mitigations
User code directly interacts with Wasmjit through the host functions it exports. These host functions mimic the de facto interface implemented by Emscripten, which, in turn, roughly mimics the Linux kernel system call interface. This interface is the only way user programs interact with the outside world. From the perspective of the user program, it uses normal C pointers to pass references to data to the host interface. From the perspective of Wasmjit, these pointers are actually indices into the singleton memory instance of that WebAssembly module.
To safely load data from user-provided pointers, Wasmjit first checks that the pointer is a valid index into the singleton memory instance. After that, a custom memcpy() routine is run that properly hardens the array index before performing the load. Here's an example (note the parentheses around `size & ~(size_t)0x7`; without them, `<` binds tighter than `&` and the first loop never runs):

```c
void custom_memcpy(void *restrict memory_base, size_t memory_size,
                   void *restrict dst, uint32_t wasm_ptr, uint32_t size)
{
    size_t i;
    for (i = 0; i < (size & ~(size_t)0x7); i += 8) {
        uint32_t hardened = array_index_nospec(wasm_ptr + i, 8, memory_size);
        memcpy((char *) dst + i, (char *) memory_base + hardened, 8);
    }
    for (i = size & ~(size_t)0x7; i < size; ++i) {
        uint32_t hardened = array_index_nospec(wasm_ptr + i, 1, memory_size);
        *((char *) dst + i) = *((char *) memory_base + hardened);
    }
}
```
We copy in blocks of 8 to minimize the performance impact of hardening the index on every load. To load in blocks of 8 safely, an extra argument needs to be provided to array_index_nospec():

```c
size_t array_index_nospec(size_t idx, size_t extent, size_t len)
{
    __asm__ ("" : "=r" (idx) : "0" (idx));
    size_t mask = (idx + extent) <= len ? ~(size_t) 0 : (size_t) 0;
    return idx & mask;
}
```
Without the extent argument, array_index_nospec() only hardens based on whether a single access is safe. That's no longer the case since we're copying in blocks of 8. If accessing a single element of the array, just invoke array_index_nospec() with an extent argument of 1.
Altogether, a typical host function roughly looks like this:
```c
uint32_t foo(uint32_t ptr, uint32_t size, struct wasmjit_ctx *ctx)
{
    uint32_t user_int;

    /* check if untrusted memory reference is valid;
       written this way so ptr + size can't wrap around */
    if (size > ctx->memory_size || ptr > ctx->memory_size - size)
        return 0;

    if (size != sizeof(user_int))
        return 0;

    custom_memcpy(ctx->memory_base, ctx->memory_size,
                  &user_int, ptr, sizeof(user_int));

    /* do stuff with user_int... */

    return 1;
}
```
JIT Mitigations
There are two WebAssembly instructions that directly involve array indexing using a user-provided index: br_table and call_indirect. In addition, every load instruction is vulnerable: i32.load, i64.load, f32.load, f64.load, i32.load8_s, i32.load8_u, i32.load16_s, i32.load16_u, i64.load8_s, i64.load8_u, i64.load16_s, i64.load16_u, i64.load32_s, i64.load32_u.
Since the JIT generates machine code that executes those instructions, Wasmjit can't simply use the array_index_nospec() function to harden the array indexes. Instead, we need a machine code sequence that does the equivalent hardening. On x86_64 we can use the sbb instruction after cmp in the following way:

```asm
# %rax contains the untrusted index
# %rdx contains the array size
cmp %rdx, %rax
# jae jumps if CF == 0; a mis-speculation may fail to jump even if CF == 0
jae BAD_INDEX
# sbb computes %rcx = (%rcx - %rcx - CF)
sbb %rcx, %rcx
# if CF == 0, then %rcx == 0
and %rcx, %rax
```
After the preceding instruction sequence, it should be safe to use %rax as an array index.
Final Words
Is the technique outlined above a generally good solution to the BCB vulnerability? Frankly, it's not. Requiring programmers to manually annotate each conditional array access doesn't scale and is error-prone. Sadly, the lack of a robust solution from Intel, AMD, ARM and others, coupled with the existence of “good enough” software mitigations, leaves me doubtful the situation will change. Unlike the other Spectre/Meltdown vulnerabilities, Intel's latest 9th-generation processors still don't address BCB. Fortunately, there has been a hardware response from Intel on Spectre Variant 2, which we'll cover next time.