

Integrating fault injection in development workflows
source link: https://blog.ledger.com/fault-injection-simulation/

10 August 2022 / DONJON
Fault injection vulnerabilities can be tedious to evaluate even with the right tools and experts. Could we improve this situation by giving appropriate open-source tools to developers?
As a first step toward mitigating fault injection attacks, we introduce a new open-source evaluation tool that painlessly integrates with an IDE or continuous integration testing pipelines. We illustrate the usage of this tool using the Rust programming language.
Fault injection simulation with Rainbow
Fault injection effects are usually simulated as corruptions during register writes (bit stuck-at model) or as instruction skips [1]. Security-critical embedded devices such as smart cards and hardware wallets need to be hardened using these models to guarantee that these effects do not introduce vulnerabilities in their processing.

Electromagnetic fault injection setup using a Scaffold board and a SiliconToaster.
Ledger Donjon has been developing the open-source Python side-channel and fault injection simulator called Rainbow since 2019. We recently added first-class support for simulating fault injection attacks by providing fault models:
- fault_skip models fault attacks causing one instruction to be skipped during execution,
- fault_stuck_at models fault attacks causing the destination register of one instruction to be overridden with a faulty value during execution. A “stuck-at zeros” model is often referred to as a bit-reset attack, and a “stuck-at ones” model is often referred to as a bit-set attack.
Using these models with Rainbow makes it possible to quickly simulate the effects of different fault injection attacks without needing access to expensive equipment such as the setup pictured above. It also removes uncertainties and measurement effects like jitter from the equation, and makes it possible to check exhaustively that a given fault model cannot be applied to a given piece of code.
Let’s illustrate the usage of the fault_stuck_at model on the third instruction of a PIN verification process taken from an older Trezor firmware:
>>> from rainbow.devices.stm32 import rainbow_stm32f215
>>> from rainbow.fault_models import fault_stuck_at
>>>
>>> # Instantiate emulated device and load the firmware
>>> emulator = rainbow_stm32f215()
>>> emulator.load("examples/HW_analysis/trezor.elf")
134454507
>>> emulator.trace_regs = True # also output register values
>>>
>>> # Setup reference PIN and attempt
>>> emulator[0x08008110 + 0x189] = b"1874\x00"
>>> emulator[0xCAFECAFE] = b"0000\x00"
>>>
>>> # Fault 3rd instruction inside "storage_containsPin" function with the
>>> # "stuck_at_0xFFFFFFFF" fault model. This model overrides the current
>>> # instruction destination register with 0xFFFFFFFF.
>>> emulator["r0"] = 0xCAFECAFE # memory address of PIN attempt
>>> emulator["lr"] = 0xAAAAAAAA # function return address
>>> begin = emulator.functions["storage_containsPin"]
>>> emulator.start_and_fault(fault_stuck_at(0xFFFFFFFF), 2, begin, 0xAAAAAAAA)
8012458 push {r0, r1, r2, r4, r5, r6, r7, lr}; sp = 3fffffdf
801245A ldr r2, [pc, #0x38] ;# r2 = 2001fff8
801245C mov r5, r0 ;# /!\ fault_stuck_at_0xFFFFFFFF /!\ r5 = ffffffff
801245E ldr r3, [r2] ;# r3 = 00000000
8012460 ldr r7, [pc, #0x34] ;# r7 = 08008110
8012462 str r3, [sp, #4] ;#
8012464 movs r3, #0 ;# r3 = 00000000
8012466 subs r4, r5, r0 ;# r4 = 35013501
8012468 ldrb r6, [r5], #1 ;# r6 = 00000000 r5 = 00000000
801246C add r4, r7 ;# r4 = 3d01b611
801246E ldrb.w r1, [r4, #0x189] ;# r1 = 00000000
8012472 cbnz r6, #0x8012488 ;#
8012474 orrs r3, r1 ;# r3 = 00000000
8012476 ldr r1, [sp, #4] ;# r1 = 00000000
8012478 ldr r3, [r2] ;# r3 = 00000000
801247A ite eq ;# itstate = 00000000
801247C movs r0, #1 ;# r0 = 00000001
8012480 cmp r1, r3 ;# cpsr = 600001f3
8012482 beq #0x8012490 ;# pc = 08012490
8012490 add sp, #0xc ;# sp = 3fffffeb
8012492 pop {r4, r5, r6, r7, pc};#134292572
>>> emulator["r0"] # r0 is the function return value register
1
>>> hex(emulator["pc"])
'0xaaaaaaaa'
We successfully faulted the output of the PIN code comparison function: rather than returning 0 as expected (1874 != 0000), it returned 1.
We can find all instructions vulnerable to a single-fault attack with these fault models if we iterate this fault simulation on every instruction. However, this method cannot find vulnerabilities caused by fault injection effects that are not modelled. Another shortcoming is that we do not expect that firmware developers will write Python code for each piece of critical code they need to harden anytime soon.
Integration in Rust development workflows
Developers are accustomed to using code style checking and testing pipelines in their daily workflows. Taking inspiration from how these tools are used, we propose a new tool called fi_check that checks for potential fault injection vulnerabilities. This tool was designed to be easily embeddable in an IDE or in continuous testing pipelines, so that developers are alerted to code modifications that introduce single-fault injection vulnerabilities.

We only considered single-fault injection attacks to simplify the problem. A naive generalization to N-fault injection attacks would exponentially increase the evaluation time. Protecting code against single-fault injections is still an important goal as it makes potential attacks much harder.
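To give a rough sense of this growth, the sketch below counts simulation runs under the assumption that an N-fault evaluation tries every choice of N distinct instruction positions and every combination of fault models at those positions. The function names and this counting scheme are illustrative assumptions, not part of fi_check.

```rust
/// Binomial coefficient C(n, k), computed incrementally so that every
/// intermediate division is exact.
fn binomial(n: u64, k: u64) -> u64 {
    if k > n {
        return 0;
    }
    let k = k.min(n - k);
    let mut result: u64 = 1;
    for i in 0..k {
        // After this step, result == C(n, i + 1).
        result = result * (n - i) / (i + 1);
    }
    result
}

/// Number of simulation runs for a naive exhaustive evaluation
/// (assumption: `faults` distinct instruction positions, each faulted
/// with any of the `models` fault models).
fn fault_runs(instructions: u64, models: u64, faults: u32) -> u64 {
    binomial(instructions, u64::from(faults)) * models.pow(faults)
}
```

Under these assumptions, a 30-instruction function evaluated with 3 fault models needs 90 runs for single faults, 3,915 for double faults and 109,620 for triple faults.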
Writing fault injection evaluation tests in Rust
Let’s consider that compare_pin is a security-critical function that needs to be hardened against single-fault injection attacks. To define the expected behavior of this function and prepare it for automatic fault injection evaluation, one may append to their Rust source code:
#[cfg(test)]
mod tests_fi {
    // rust_fi is a Rust crate containing fi_check hooks
    use rust_fi::{assert_eq, rust_fi_faulted_behavior, rust_fi_nominal_behavior};
    // Function under test, defined in the parent module
    use super::compare_pin;

    const CORRECT_PIN: [u8; 4] = [1, 2, 3, 4];

    // A first fault injection test against a PIN code verification
    #[no_mangle]
    #[inline(never)]
    fn test_fi_compare_pin() {
        assert_eq!(compare_pin(&[0, 0, 0, 0], &CORRECT_PIN), false);
    }

    // Same fault injection test but with only one different digit
    #[no_mangle]
    #[inline(never)]
    fn test_fi_compare_pin_variant() {
        assert_eq!(compare_pin(&[2, 2, 3, 4], &CORRECT_PIN), false);
    }
}
This structure looks like classical Rust tests asserting that the compare_pin function returns false as the PIN codes do not match. fi_check can recognize these tests and evaluate whether it can make compare_pin return true by faulting its instructions. We do not use the #[test] attribute, as we just need the function symbol to exist in the compiled binary to execute it later with Rainbow.
Thanks to this tool, Rust crates can be quickly evaluated for potential vulnerability to single-fault injection by:
- Adding the rust_fi crate to their project dev-dependencies,
- Writing fault injection robustness tests using the above structure,
- Running fi_check.py on the crate.
By default, fi_check.py instantiates a Rainbow emulator configured for ARM targets, but this can be easily changed to target other architectures.
How does it work?
Successful fault injection detection:
We consider a function taking no arguments and returning one Boolean value (true or false).
The logic of this function is written to always return false by checking an invalid condition [2].
In theory, this function should always return false. However, if we execute this function on real hardware, it can result in 3 different states:
- Nominal behavior: the code returned false as expected,
- Faulted behavior: the code returned true, meaning the disrupted execution caused the check to be skipped,
- Panicked or crashed: an exception was raised during the execution, such as an out-of-bounds access, or the device got an unexpected instruction and crashed.
Healthy hardware not under extreme conditions should always exhibit the nominal behavior. In our case, we want to detect whether an attack creating a single fault in the processing could produce the faulted behavior without raising an exception or crashing the device.
assert_eq! is a macro that raises a panic if its operands differ. We can distinguish between these 3 states by using a modified assert_eq! macro in Rust.
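As an illustration of the three states, here is a minimal host-side sketch that classifies the result of a check that should always return false. It relies on std::panic::catch_unwind, which is an assumption made for this example: rust_fi reports outcomes through its own hooks, and the names Outcome and classify are hypothetical.

```rust
use std::panic;

/// The three possible results of running a check that should return false.
#[derive(Debug, PartialEq)]
enum Outcome {
    Nominal, // returned false, as expected
    Faulted, // returned true: the check was bypassed
    Crashed, // panicked before returning a value
}

/// Runs `check` and maps its result to one of the three states.
fn classify<F>(check: F) -> Outcome
where
    F: FnOnce() -> bool + panic::UnwindSafe,
{
    match panic::catch_unwind(check) {
        Ok(false) => Outcome::Nominal,
        Ok(true) => Outcome::Faulted,
        Err(_) => Outcome::Crashed,
    }
}
```

A check that triggers an out-of-bounds access, for instance, ends up in the Crashed state rather than being mistaken for a successful attack.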
Proposed evaluation algorithm:
We choose one of the proposed fault models, then:
- We execute the function multiple times, but in each run, we apply the chosen fault model on the i-th instruction. i starts from the first instruction and increments until we reach the end of the function.
- When the function returns true without panicking or crashing, we know which instruction makes the function vulnerable to this fault model.
If the developer is not directly working on assembly code, we use the addr2line tool [3] to retrieve the line of source code that generated the problematic assembly instruction. This requires compiling the code with debug symbols [4].
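The loop described above can be sketched on a toy machine. Everything in this example is an assumption made for illustration: the three-instruction register machine, the Fault enum and the function names are not part of Rainbow or fi_check, and the toy program is straight-line code that cannot crash, so only the nominal and faulted states appear.

```rust
/// Toy instruction set; the first field is always the destination register.
#[derive(Clone, Copy)]
enum Instr {
    LoadImm(usize, u32),      // r[rd] = imm
    Xor(usize, usize, usize), // r[rd] = r[ra] ^ r[rb]
    IsZero(usize, usize),     // r[rd] = 1 if r[ra] == 0, else 0
}

/// Toy counterparts of the two fault models.
#[derive(Clone, Copy)]
enum Fault {
    Skip,
    StuckAt(u32),
}

fn dest(instr: Instr) -> usize {
    match instr {
        Instr::LoadImm(rd, _) | Instr::Xor(rd, _, _) | Instr::IsZero(rd, _) => rd,
    }
}

/// Executes the program, optionally faulting the instruction at a given
/// index, and returns r0 interpreted as a Boolean (non-zero means true).
fn run(program: &[Instr], fault: Option<(usize, Fault)>) -> bool {
    let mut r = [0u32; 4];
    for (i, &instr) in program.iter().enumerate() {
        match fault {
            Some((at, Fault::Skip)) if at == i => continue,
            Some((at, Fault::StuckAt(value))) if at == i => {
                r[dest(instr)] = value;
                continue;
            }
            _ => {}
        }
        match instr {
            Instr::LoadImm(rd, imm) => r[rd] = imm,
            Instr::Xor(rd, ra, rb) => r[rd] = r[ra] ^ r[rb],
            Instr::IsZero(rd, ra) => r[rd] = (r[ra] == 0) as u32,
        }
    }
    r[0] != 0
}

/// Applies the fault model to each instruction in turn and collects the
/// indices for which the program wrongly returns true.
fn vulnerable_instructions(program: &[Instr], model: Fault) -> Vec<usize> {
    (0..program.len())
        .filter(|&i| run(program, Some((i, model))))
        .collect()
}

/// One-digit comparison: user digit 0 against reference digit 7.
fn toy_pin_check() -> Vec<Instr> {
    vec![
        Instr::LoadImm(1, 0),
        Instr::LoadImm(2, 7),
        Instr::Xor(3, 1, 2),
        Instr::IsZero(0, 3),
    ]
}
```

On this toy program, the skip and stuck-at-zero models both flag instructions 1 and 2, while the stuck-at-0xFFFFFFFF model flags only the final instruction that writes the return value.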
Examples of code evaluation and mitigation
We will illustrate the usage of this tool with some pieces of code that are vulnerable to single-fault injection attacks once compiled to ARM Cortex-M3 assembly (ARM Thumb). A commonly used function for this kind of benchmark is the critical PIN code comparison.
Example 1: imperative-style PIN code comparison
Let’s consider the following PIN code comparison function written in an imperative-style Rust code:
/// Return true if pins are equal, false otherwise
pub fn compare_pin(user_pin: &[u8], ref_pin: &[u8]) -> bool {
    let mut good = true;
    for i in 0..ref_pin.len() {
        if user_pin[i] != ref_pin[i] {
            good = false; // src/lib.rs:22
        }
    }
    good
}
The compiler outputs the following assembly code:
As expected from the calling convention used by Rust, the user_pin array is represented by a pointer in r0 and a size in r1, the ref_pin array is represented by a pointer in r2 and a size in r3, and the returned value is represented by r0.
We run ./fi_check.py --cli test_fi_simple to check for any interesting faults:
The output indicates vulnerable instructions in the test_fi_simple function, which is the test function calling compare_pin, so we can ignore these. It also indicates that compare_pin is vulnerable to a bit-set fault attack. Looking at the source code, we understand that this is due to the developer initializing the returned value to true, then setting it to false during the comparison. Exploiting this vulnerability consists in setting good=0xFFFFFFFF in the last iteration of the loop, which Rust considers to be equivalent to true.
On a side note, we also observe that the Rust compiler makes the code panic if the user_pin array is accessed out of bounds (checked at 0x90), as expected from a memory-safe language.
Hardening through double call and inlining:
This compare_pin function is vulnerable to a simple fault attack. A common mitigation is to simply execute the test twice.
#[inline(always)]
pub fn compare_pin_double_inline(user_pin: &[u8], ref_pin: &[u8]) -> bool {
    if compare_pin(user_pin, ref_pin) {
        // If the second compare_pin call returns false, then we know that a
        // fault happened and we should handle it. We choose to simplify this
        // example by returning false.
        compare_pin(ref_pin, user_pin)
    } else {
        false
    }
}
Running an evaluation with fi_check on this function confirms that we successfully hardened it:
Hardening using a protected Boolean type:
Boolean values are usually encoded on the first bit of a register, meaning that “stuck-at” fault injection attacks can flip its value. A method to harden these values against fault injection vulnerabilities is to change the representation of “true” and “false”. We choose the following representation on 32-bit:
// TRUE is not the opposite of FALSE to force the compiler not to use NEG
// First and last bits are 0, in case the compiler determines this bit can be
// cast into a Boolean
const TRUE: u32 = 0b0010_1010_1010_1010_1010_1110_1010_1010;
const FALSE: u32 = 0b0110_0101_0101_0110_1100_0011_0101_1100;
This enables us to use the 31 extra bits to do error checking. We implemented these checks as a Bool Rust type.
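To make the idea of error checking concrete, here is a minimal sketch of such a type. It reuses the two constants above, but the ProtectedBool name and its methods are hypothetical and do not reproduce the actual Bool implementation: any word that is neither TRUE nor FALSE is reported as corrupted instead of being interpreted as a Boolean.

```rust
const TRUE: u32 = 0b0010_1010_1010_1010_1010_1110_1010_1010;
const FALSE: u32 = 0b0110_0101_0101_0110_1100_0011_0101_1100;

/// Hypothetical hardened Boolean stored as a full 32-bit word.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ProtectedBool(u32);

impl ProtectedBool {
    /// Encodes a plain bool using the two reference patterns.
    fn from_bool(value: bool) -> Self {
        ProtectedBool(if value { TRUE } else { FALSE })
    }

    /// Decodes the value, returning None when the stored word matches
    /// neither pattern, which indicates a corrupted value.
    fn check(self) -> Option<bool> {
        match self.0 {
            TRUE => Some(true),
            FALSE => Some(false),
            _ => None,
        }
    }
}
```

With this encoding, a stuck-at fault that forces the word to 0x0 or 0xFFFFFFFF yields None rather than a valid true or false.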
We can then use the Bool type in our PIN verification function:
pub fn compare_pin_protected(user_pin: &[u8], ref_pin: &[u8]) -> Bool {
    let mut good = Bool::from(true);
    for i in 0..ref_pin.len() {
        if user_pin[i] != ref_pin[i] {
            good = Bool::from(false);
        }
    }
    good
}
fi_check confirms that this method works:
Example 2: functional-style PIN code comparison
Sometimes it can be difficult to predict how a function will be assembled. For illustration purposes, let’s switch to functional-style code:
pub fn compare_pin_fp(user_pin: &[u8], ref_pin: &[u8]) -> bool {
    user_pin
        .iter()
        .zip(ref_pin.iter())
        .fold(0, |acc, (a, b)| acc | (a ^ b))
        == 0
}
The compiler outputs the following assembly code:
Our tool finds 9 vulnerable points: 4 with the fault_skip model, 3 with the stuck_at_0x0 model and 2 with the stuck_at_0xFFFFFFFF model:
Hardening using a protected Boolean type:
Let’s use the protected Boolean type that we described earlier:
use fault_detection::bool::Bool;

pub fn compare_pin_fp_protected(user_pin: &[u8], ref_pin: &[u8]) -> Bool {
    !user_pin
        .iter()
        .zip(ref_pin.iter())
        .fold(Bool::from(false), |acc, (a, b)| {
            acc | Bool::from(a != b)
        })
}
Using the protected Boolean type, we are now down to 2 vulnerable instructions. These last two vulnerabilities are due to an early size check on the input that makes the function return true if one array is empty. In our context, we should handle these cases manually.
use fault_detection::bool::Bool;

pub fn compare_pin_fp_protected(user_pin: &[u8], ref_pin: &[u8]) -> Bool {
    if ref_pin.is_empty() || user_pin.len() != ref_pin.len() {
        return Bool::from(false);
    }
    !user_pin
        .iter()
        .zip(ref_pin.iter())
        .fold(Bool::from(false), |acc, (a, b)| {
            acc | Bool::from(a != b)
        })
}
Now our tool no longer finds any vulnerable instructions, voilà!
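To see why the explicit size check matters, the following standalone sketch uses plain bool values instead of the protected type. compare_pin_fp is the functional-style function shown earlier; compare_pin_fp_guarded is a hypothetical variant written for this example that adds the same guard. Because zip stops at the shorter input, an empty or truncated user PIN leaves the fold with nothing that differs.

```rust
/// Functional-style comparison from Example 2, without any size check.
pub fn compare_pin_fp(user_pin: &[u8], ref_pin: &[u8]) -> bool {
    user_pin
        .iter()
        .zip(ref_pin.iter())
        .fold(0, |acc, (a, b)| acc | (a ^ b))
        == 0
}

/// Hypothetical variant with the guard on empty or mismatched lengths.
pub fn compare_pin_fp_guarded(user_pin: &[u8], ref_pin: &[u8]) -> bool {
    if ref_pin.is_empty() || user_pin.len() != ref_pin.len() {
        return false;
    }
    compare_pin_fp(user_pin, ref_pin)
}
```

Without the guard, both an empty input and a correct prefix of the reference PIN are accepted; with it, only a full-length match returns true.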
Conclusion
We show that we are able to simulate the effect of modelled fault injection attacks using Rainbow. Then we tightly integrate this simulator with the Rust ecosystem to demonstrate a scenario where these evaluations are relatively easy to set up for developers.
To demonstrate the integration of such tools in workflows, we opened a pull request that introduces a vulnerability: https://github.com/Ledger-Donjon/fault_injection_checks_demo/pull/13. The automated checks fail due to fi_check finding a vulnerability.
Such a tool enables developers to design new Rust types hardened against fault injection attacks.
We propose an early design of a protected Boolean type and a Protected struct that hardens PartialEq traits.
[1] M. Otto, “Fault attacks and countermeasures,” Ph.D. dissertation, University of Paderborn, 2005.
[2] We consider that the compiler does not optimize the condition. This can always be enforced with a few tricks if needed, for example with https://doc.rust-lang.org/std/hint/fn.black_box.html
[3] From GNU Binutils, available in most GNU/Linux distributions. A cross-platform version can also be installed from https://github.com/gimli-rs/addr2line.
[4] We use the release profile in Rust with debug=true. This does not increase the final binary size on flash for embedded binaries.