How to get started with property-based testing in JavaScript using fast-check

source link: https://jrsinclair.com/articles/2021/how-to-get-started-with-property-based-testing-in-javascript-with-fast-check/

Written by James Sinclair on the 15th November 2021


Property-based testing helps us write better tests, with less code, and greater coverage. This leads to more confidence in our code, and fewer bugs in our applications. But, as always, there’s a price. Property tests take more effort to write, and they take longer to run. Still, I’m convinced that the trade-off is worth it. In this article, we’ll run through an example of how to write property tests using fast-check.

What is property-based testing?

Most tests we programmers write are example-based tests. That is, we give the computer some sample input, and run the function under test. Then we check the output is as we expect. (More or less). Property-based testing (also known as generative testing) is a different approach. Instead of writing every example input by hand, we instruct the computer to generate them for us. We tell the computer what types of input we want, and it generates hundreds of random examples.

Now, this raises a question: If we have randomly generated input, how do we know what output to expect? And the answer is, we don’t. Well, not exactly, anyway. Instead of testing that a particular input matches expected output, we assert properties.

A property is something that should always be true. They’re sometimes referred to as ‘laws’ or ‘rules’. No matter what random data we throw at our function, this property should hold.

This sounds abstract, and a little mathematical. So let’s look at an example.

A hypothetical scenario

Before we jump into property tests, let’s set the scene. Imagine we’re writing a To Do application.[1] And we’d like to add a feature where we move completed tasks to an archive once they’re older than one minute.

An example-based test

If we were to write a test for this with Jest, we’d often start with some setup like the following:

// Some date constants to make life easier. We're using timestamps
// rather than date objects to keep the maths simple.
const START = 1636521855000;
const ONE_MINUTE = 60000;
const ONE_HOUR = 60 * ONE_MINUTE;

// We create some example data. All tasks have, at minimum,
// both a created date and a title. The completed time is optional.
// A task that has a missing or undefined completed field is not
// yet done.
const newTask = {
    created: START - ONE_MINUTE,
    title: 'A mighty task of spectacular derring-do',
    completed: START,
};

// We intend to pass START as our reference time. So we make an
// old task that was completed 59 minutes ago.
const oldCompletedTask = {
    created: START - ONE_HOUR,
    completed: START - ONE_HOUR + ONE_MINUTE,
    title: 'should be archived',
};

// This is our basic input. We have an array of 'active' tasks, and
// an array of 'archive' tasks. The active list has one task we
// expect to stay in the active list, and one we expect to move.
const basicInput = {
    active: [newTask, oldCompletedTask],
    archive: [],
};

// After we run our archive function we expect the following
// output:
const expectedBasic = {
    active: [newTask],
    archive: [oldCompletedTask],
};

With all that in place, we’re finally ready to write our example test. Assuming we’ve imported our moveOldTasksToArchive() function from somewhere, we’d write something like this:

describe('moveOldTasksToArchive()', () => {
    it('should move the old item to the archive', () => {
        expect(moveOldTasksToArchive(basicInput, START))
            .toEqual(expectedBasic);
    });
});

With that test in place, let’s write some code that will make it pass. So we might write something like the following:

const moveOldTasksToArchive = ({active, archive}, currentTime) => ({
    active: active.filter(({completed}) => currentTime - completed < ONE_MINUTE),
    archive: active.filter(({completed}) => currentTime - completed >= ONE_MINUTE).concat(archive),
});

And with that code in place, our test passes. But we’re not silly enough to think that one test is enough to give us confidence we got this right. So, we add a few more examples. We start with some more sample data:

// We should test the edge case for when the arrays are empty.
const emptyInput = {active: [], archive: []};

// And we'd also like to test the case where there's something
// already in the archive. So we'll create another old task…
const oldAbandonedTask = {
    created: START - ONE_HOUR,
    title: 'Abandoned, not completed',
};

// …and put the old task into the archive to create a new input.
const populatedArchive = {
    active: [oldCompletedTask],
    archive: [oldAbandonedTask],
};

// This is the expected output for the case where the archive
// already has something in it.
const expectedPopulated = {
    active: [],
    archive: [oldCompletedTask, oldAbandonedTask],
};

Jest has a neat feature that lets us put those examples into a table. It might look something like this:

describe.each`
    description            | input               | date     | expected
-----------------------------------------------------------------------------
    ${'Basic example'}     | ${basicInput}       | ${START} | ${expectedBasic}
    ${'Empty arrays'}      | ${emptyInput}       | ${START} | ${emptyInput}
    ${'Populated archive'} | ${populatedArchive} | ${START} | ${expectedPopulated}
`('$description', ({input, date, expected}) => {
    test(`Given a sample state and date,
          when we run moveOldTasksToArchive(),
          it should return the expected output`, () => {
        expect(moveOldTasksToArchive(input, date))
            .toEqual(expected);
    });
});

If this were ‘real’ code, we’d add more examples. But these aren’t bad. They give us a reasonable amount of coverage with just three examples.

It does get annoying writing out all those examples by hand though. And it’s especially tedious when we have structured data like arrays and objects. A good property-testing framework can take the tedium out of writing example data.

Generating test data

With property tests, we get the computer to generate examples for us. Fast-check calls these example-generators ‘arbitraries’. As in, ‘generate an arbitrary number’ or ‘generate an arbitrary string’. And fast-check comes with a whole swag of arbitraries for generating basic data. For example:

import * as fc from 'fast-check';

const myStringArbitrary = fc.string();
const myNumberArbitrary = fc.integer();
const myDateArbitrary   = fc.date();

Note, these aren’t actual strings, numbers or dates. We’ve created data structures that will generate strings, numbers or dates for us.

These simple data types will only get us so far. For our case, we want structured data. For these, fast-check gives us ‘combinators’. These let us combine simple arbitraries into more complex ones. Using these, we can make a generator for a task. Let’s break it down step-by-step.

First, we want a created-time for our task. So we create a date arbitrary:

// This function will eventually create a todo item.
// For now, we start with just a date arbitrary.
const genTodo = () => {
   const createdDateArb = fc.date();
}

Next, we want to generate a string for our task title:

const genTodo = () => {
   const createdDateArb = fc.date();
   const titleArb = fc.string();
}

And we also want a date for the completed time. That’s another arbitrary too:

const genTodo = () => {
   const createdDateArb = fc.date();
   const titleArb = fc.string();
   const completedDateArb = fc.date();
}

Now that we have arbitraries to generate all three components of a task, we want to combine them into an object. There’s a combinator for that: fc.record(). It lets us specify an object structure, and how to generate values for each key:

const genTodo = () => {
   const createdDateArb = fc.date();
   const titleArb = fc.string();
   const completedDateArb = fc.date();
   const taskArb = fc.record({
       created: createdDateArb,
       title: titleArb,
       completed: completedDateArb,
   });
}

The fc.record() method also lets us specify which keys are required:

const genTodo = () => {
    const createdDateArb = fc.date();
    const titleArb = fc.string();
    const completedDateArb = fc.date();
    const taskArb = fc.record(
        {
            created: createdDateArb,
            title: titleArb,
            completed: completedDateArb,
        },
        {requiredKeys: ['created', 'title']}
    );
}

We’re nearly done with our task arbitrary. But we might want to restrict it a little. You see, in theory, we should never have a ‘completed’ date that happens before a ‘created’ date. It would be nice if we could model this in our sample values.

To make this possible, fast-check lets us transform generated values using .map(). For our case, we want completed to occur no earlier than created. Thus, instead of generating another date for completed, we’ll generate a non-negative integer. Then, we’ll use .map() to add it to the created date. We’ll also convert our dates to timestamps while we’re at it:

const genTodo = () => {
    const createdDateArb = fc.date();
    const titleArb = fc.string();
    const offsetArb = fc.nat(); // Generate a non-negative integer
    const taskArb = fc.record(
        {
            created: createdDateArb,
            title: titleArb,
            offset: offsetArb,
        },
        {requiredKeys: ['created', 'title']}
    );
    return taskArb.map(({created, title, offset}) => ({
        created: created.getTime(),
        title,
        completed: offset !== undefined ? created.getTime() + offset : undefined,
    }));
}

And with that, we have a working generator. But, we probably don’t need all those variables. Our final generator can be a little more streamlined:

const genTodo = () => {
    return fc
        .record(
            {
                created: fc.date(),
                title: fc.string(),
                offset: fc.nat(),
            },
            {requiredKeys: ['created', 'title']}
        )
        .map(({created, title, offset}) => ({
            created: created.getTime(),
            title,
            completed: offset !== undefined ? created.getTime() + offset : undefined,
        }));
};
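To see what the mapping step does on its own, we can call the function we pass to .map() by hand. This is only a sketch: the name toTask and the two sample records are made up for illustration, shaped like the records fc.record() would hand us (one with an offset, one without).

```javascript
// The same transformation we pass to .map() above, pulled out under a
// hypothetical name so we can call it directly.
const toTask = ({created, title, offset}) => ({
    created: created.getTime(),
    title,
    completed: offset !== undefined ? created.getTime() + offset : undefined,
});

// A record where the optional 'offset' key was generated…
const withOffset = toTask({created: new Date(1000), title: 'done', offset: 500});
// …and one where it was left out.
const withoutOffset = toTask({created: new Date(1000), title: 'pending'});

console.log(withOffset);    // { created: 1000, title: 'done', completed: 1500 }
console.log(withoutOffset); // { created: 1000, title: 'pending', completed: undefined }
```

Because completed is always created plus a non-negative offset, a generated task can never be completed before it was created.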

Once we’ve got a generator for a task, it’s not too hard to create an arbitrary for the state, using fc.array() and fc.record():

const genTaskState = () =>
    fc.record({
        active: fc.array(genTodo()),
        archive: fc.array(genTodo()),
    });

We can now generate random input data. But we don’t have any tests yet. If we’re not coming up with examples, how do we write the test?

How do we work out what properties to test?

When we’re writing example-based tests, people often recommend using a Gherkin-like template. They look something like this:

GIVEN <some input and starting conditions>
WHEN <we call some function or take some action>
THEN <some condition SHOULD be true>

In this template, we come up with some starting state. Then we describe the action, and some expected result. Often, the condition is that the actual output should match some expected output. (Though not always). BDD proponents also suggest it’s a good idea to include the word SHOULD in the final clause.

When it comes to writing property tests, we change the template a little. We use something more like the following:

GIVEN ANY <arbitrary inputs, conforming to certain restrictions>
WHEN <we call some function or take some action>
THEN <some condition SHOULD ALWAYS hold>

Let’s go through those line-by-line.

  • GIVEN ANY <arbitrary inputs, conforming to certain restrictions>: We include the word ANY to remind us that we’re expecting a range of random inputs. This doesn’t mean that we throw every possible JS value at the function. Rather, we throw anything we might reasonably expect. If we’re using TypeScript, a function’s type signature specifies what we consider ‘reasonable’. If we’re working in plain JS, we use common sense. In other tutorials, you may see this written as FOR ALL <inputs> SUCH THAT <some conditions hold>. The general idea is the same though.
  • WHEN <we call some function or take some action>: This line remains much the same. Given some input data, we call our function under test (or take some other action).
  • THEN <some condition SHOULD ALWAYS hold>: The final part describes some property we expect to be true. To emphasise that we’re working with ranges of data though, it helps to include the word ALWAYS or NEVER.

What might we write for our archive function then? Well, here we need to think about what our function is doing. We start with a bunch of tasks, and move them around. A good thing to check might be that we don’t lose any tasks in the moving-around process. We could check that the total number of tasks in state stays the same. Putting that into our template, we get:

GIVEN ANY valid task state and date
WHEN we run moveOldTasksToArchive()
THEN the total number of tasks SHOULD ALWAYS stay the same

Using the same template, we can think of some other properties too. For example, archiving should never modify any of the tasks. A test that describes this property might be:

GIVEN ANY valid task and date
WHEN we run moveOldTasksToArchive()
THEN there SHOULD NEVER be any tasks in the archive
     that weren't in the original state

This is good, but it still hasn’t addressed the main thing we want our function to do. After we’ve run moveOldTasksToArchive(), we want all the old tasks to be moved out of active. We can write a property for that too:

GIVEN ANY valid task and date
WHEN we run moveOldTasksToArchive()
THEN all the tasks in .active SHOULD ALWAYS be either 
     incomplete, or, completed less than 60 seconds
     before the date

Those three descriptions give us good coverage of how moveOldTasksToArchive() should work. Some people like to go a bit further and write more mathematical style descriptions. For us though, what we’ve got is enough to write some property tests.

Writing a property test

With fast-check, we define a property using the fc.property() method. It takes a number of arbitraries as arguments. But it always expects the last argument to be a function that runs the test. For our case, it might look something like the following:

const lengthProperty = fc.property(genTaskState(), fc.date(), (s, dt) => {
    const newState = moveOldTasksToArchive(s, dt.getTime());
    const actualLength = newState.active.length + newState.archive.length;
    const expectedLength = s.active.length + s.archive.length;
    expect(actualLength).toBe(expectedLength);
});

Here, the first argument we pass is our task state generator from above. It generates a valid set of active and archived tasks. We also pass it a date that represents the ‘current time’. Then, in the final argument, we pass a test function. This function receives the generated values and checks that our property holds. In this case, we use Jest’s built-in expect() function.

To test our property, we pass it to fc.assert(). It does the work of running the tests. It also lets us specify some parameters, like how many examples to generate. For this first test, we’ll tell it to run 10000 tests, so we can be sure our code is solid:

fc.assert(lengthProperty, {numRuns: 10000});

Putting that all together inside a Jest describe() block, we get:

describe('moveOldTasksToArchive()', () => {
    test(`GIVEN ANY valid task state and date
    WHEN we run moveOldTasksToArchive()
    THEN the total number of tasks SHOULD ALWAYS stay the same`, () => {
        const lengthProperty = fc.property(genTaskState(), fc.date(), (s, dt) => {
            const newState = moveOldTasksToArchive(s, dt.getTime());
            const actualLength = newState.active.length + newState.archive.length;
            const expectedLength = s.active.length + s.archive.length;
            expect(actualLength).toBe(expectedLength);
        });
        fc.assert(lengthProperty, {numRuns: 10000});
    });
});

And, when we run the test… it fails!

[Screenshot: failing-property-test.png]

Decoding property test output

The failure message may look a little intimidating at first. But if we can decode it, there’s a lot of useful information there. The first thing it tells us is that it failed after just one test.

Property failed after 1 tests

On its own, that’s not the most useful piece of information. But it’s more helpful if we understand how fast-check generates examples.

We know that property-test frameworks, like fast-check, produce random example values. And if you think about it, there are a lot of possible values it could generate. But we also know that bugs tend to occur around edge cases. That is, we’ll find more bugs associated with -1, 0, and 1, than we will associated with 42 or 6168533449859237. In general, smaller values tend to find more bugs.

Recognising this, fast-check biases its example generation. Early on in the run, it’s weighted to produce small values more frequently. That is, it’s more likely to try things like 0, [], undefined, empty strings, and so on. But, as the test run continues, it will produce larger values to make sure that it gives good coverage.

With this in mind, we can interpret that first line: Property failed after 1 tests. Since we know that fast-check usually tries small values early on, it’s probably found an edge case. Perhaps something to do with empty arrays, undefined values, or early dates.

Reproducing failing tests

Back to decoding the test output. The next line in the failed test report was:

{ seed: 1383591766, path: "0:1:0:1:1:1:1:1", endOnFailure: true }

This line may seem cryptic, but it’s most helpful. You see, the values that fast-check generates are not completely random. They’re pseudorandom values. This means that if we provide fast-check with a seed, it can replay a test run. When we go back to our code and fix the function, we can run those same tests again to see if we fixed the problem. For example:

    fc.assert(lengthProperty, {seed: 1383591766});

This will replay all the generated values. If we only want to replay the failing test, we pass in the path value like so:

    fc.assert(
        lengthProperty,
        {seed: 1383591766, path: "0:1:0:1:1:1:1:1"}
    );
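If it’s not obvious why a seed is enough to replay a run, a toy sketch may help. The generator below (mulberry32, a small, widely shared seedable generator) is here purely for illustration. It is not the generator fast-check uses, and drawInts is a made-up helper. The point is only that the same seed always yields the same sequence of ‘random’ values.

```javascript
// A small seedable pseudorandom number generator (mulberry32).
// For illustration only: this is NOT fast-check's internal generator.
const mulberry32 = seed => {
    let a = seed >>> 0;
    return () => {
        a = (a + 0x6d2b79f5) | 0;
        let t = Math.imul(a ^ (a >>> 15), 1 | a);
        t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
};

// Hypothetical helper: draw `count` integers in [0, 100) from a seed.
const drawInts = (seed, count) => {
    const next = mulberry32(seed);
    return Array.from({length: count}, () => Math.floor(next() * 100));
};

const firstRun = drawInts(1383591766, 5);
const replay = drawInts(1383591766, 5);

// Same seed, same values, every time.
console.log(firstRun.join() === replay.join()); // true
```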

The next line after the seed and path gives us a counterexample. That is, it shows us sample values that break our test.

Counterexample: [{"active":[{"created":0,"title":"","completed":undefined}],"archive":[]},new Date("1970-01-01T00:00:00.000Z")]

If we reformat the counterexample a little, it’s easier to read:

[
    {
        active: [{
            created: 0,
            title: '',
            completed: undefined,
        }],
        archive: [],
    },
    new Date('1970-01-01T00:00:00.000Z'),
]

This tells us that the test failed with a single active task, and no archive tasks. And the active task happened to be incomplete. It also had an empty title and a created timestamp of zero. With a failing case, we can examine our code and determine why it broke. We’ll come back and do that in a moment. For now, we’ll keep examining the test output.

If we wanted to replay this example, or even tweak it a little, fast-check provides a way to do that. When we call fc.assert(), we can pass an array of examples we want it to try every single time. This is handy if there are specific edge cases we want to check.

Using it might look like so:

const incompleteTaskExample = [
    {
        active: [{
            created: 0,
            title: '',
            completed: undefined,
        }],
        archive: [],
    },
    new Date('1970-01-01T00:00:00.000Z'),
];
fc.assert(lengthProperty, {examples: [incompleteTaskExample]});

The examples property takes an array, since we may want to test lots of examples.

Shrinking

The next line in the test output reads:

Shrunk 7 time(s)

This tells us that the example above isn’t the first failure fast-check found. The first failing example might have had hundreds of tasks in it. With hundreds of values in an array, it’s difficult to tell which one is causing the problem. To help us out, property-testing frameworks (like fast-check) try to shrink failing examples. When the framework finds a failing case, it will tweak the example and run it again. And the tweaks will be things like:

  • If the input was a number, try a number closer to zero;
  • If the input was an array, try an array with fewer items;
  • If the input was a string, try a shorter string;
  • Try undefined, if that’s an allowable value.

It will keep tweaking the inputs until the tests start passing again or it can’t shrink the values any more. This way, the framework finds the simplest possible failing case. Most of the time, this makes it easier to understand what’s going on, and hence fix our code.
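To make the idea concrete, here’s a deliberately naive shrinker for arrays. It’s a sketch of the general technique only, not fast-check’s actual algorithm, and the names shrinkArray and losesTasks are invented for this example. It keeps dropping one element at a time for as long as the smaller array still fails.

```javascript
const ONE_MINUTE = 60000;

// The original (buggy) archive function from earlier in the article.
const moveOldTasksToArchive = ({active, archive}, currentTime) => ({
    active: active.filter(({completed}) => currentTime - completed < ONE_MINUTE),
    archive: active.filter(({completed}) => currentTime - completed >= ONE_MINUTE).concat(archive),
});

// Returns true when the 'total number of tasks stays the same'
// property FAILS for the given active list.
const losesTasks = tasks => {
    const out = moveOldTasksToArchive({active: tasks, archive: []}, 100000);
    return out.active.length + out.archive.length !== tasks.length;
};

// Toy shrinker: try removing one element at a time. Whenever the
// smaller array still fails, keep it and start again.
const shrinkArray = (failing, stillFails) => {
    let current = failing;
    let shrunk = true;
    while (shrunk) {
        shrunk = false;
        for (let i = 0; i < current.length; i++) {
            const candidate = current.filter((_, idx) => idx !== i);
            if (stillFails(candidate)) {
                current = candidate;
                shrunk = true;
                break;
            }
        }
    }
    return current;
};

const bigFailure = [
    {created: 0, title: 'a', completed: 10},
    {created: 5, title: 'b', completed: undefined},
    {created: 7, title: 'c', completed: 9},
];
const smallest = shrinkArray(bigFailure, losesTasks);
console.log(smallest.map(task => task.title)); // [ 'b' ]
```

The shrunk result points straight at the incomplete task, which is the one the function mishandles.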

Speaking of fixing the code, let’s get our archive function working.

Fixing our code

The test suite generated an example with a single, incomplete task in the active array. Our test is failing because the archive code doesn’t handle incomplete tasks. Here’s our function again:

const moveOldTasksToArchive = ({active, archive}, currentTime) => ({
    active: active.filter(({completed}) => currentTime - completed < ONE_MINUTE),
    archive: active.filter(({completed}) => currentTime - completed >= ONE_MINUTE).concat(archive),
});

What happens if we encounter an incomplete task? An incomplete task has an undefined completed date. So our filter function tries to subtract undefined from the current date (in this case, zero). And it gets back NaN. The comparison NaN < ONE_MINUTE returns false. So .filter() removes the task from the array. But in the next filter, NaN >= ONE_MINUTE also returns false. And our task is lost forever.
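We can check that reasoning in plain JavaScript, without any test framework. This small sketch uses the shrunk counterexample from the test output and the two filter callbacks from the original function:

```javascript
const ONE_MINUTE = 60000;

// The shrunk counterexample: one incomplete task, and a
// 'current time' of zero.
const incompleteTask = {created: 0, title: '', completed: undefined};
const currentTime = 0;

// Subtracting undefined from a number gives NaN…
const age = currentTime - incompleteTask.completed;
console.log(Number.isNaN(age)); // true

// …and every ordering comparison involving NaN is false.
console.log(age < ONE_MINUTE);  // false
console.log(age >= ONE_MINUTE); // false

// So neither filter keeps the task.
const keptInActive = [incompleteTask].filter(
    ({completed}) => currentTime - completed < ONE_MINUTE
);
const movedToArchive = [incompleteTask].filter(
    ({completed}) => currentTime - completed >= ONE_MINUTE
);
console.log(keptInActive.length + movedToArchive.length); // 0
```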

So, let’s adjust our code to handle incomplete tasks. And while we’re at it, those two functions we pass to .filter() are rather similar. Let’s factor that out into a couple of utility functions:

// Take a function and transform it so that it returns the boolean
// negation.
const not = f => x => !f(x);

// Take the current time and a task, and determine if this is an
// old task that should be archived.
const isOldTask = currentTime => task => {
    return task.completed !== undefined &&
        currentTime - task.completed > ONE_MINUTE;
}

With those in place, we can now update our moveOldTasksToArchive() function:

const moveOldTasksToArchive = ({active, archive}, currentTime) => ({
    active: active.filter(not(isOldTask(currentTime))),
    archive: active.filter(isOldTask(currentTime)).concat(archive),
});

And with that in place, our test passes.
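It’s worth pausing on why this version can’t lose tasks. For any single task, isOldTask() returns either true or false (never NaN), so a predicate and its not() counterpart always split the list cleanly in two. Here’s a quick sketch with some made-up sample tasks:

```javascript
const ONE_MINUTE = 60000;

const not = f => x => !f(x);
const isOldTask = currentTime => task =>
    task.completed !== undefined &&
    currentTime - task.completed > ONE_MINUTE;

// Sample data invented for this sketch.
const now = 10 * ONE_MINUTE;
const tasks = [
    {created: 0, title: 'incomplete'},
    {created: 0, title: 'done long ago', completed: ONE_MINUTE},
    {created: 0, title: 'just done', completed: now - 1000},
];

const old = tasks.filter(isOldTask(now));
const notOld = tasks.filter(not(isOldTask(now)));

console.log(old.map(t => t.title));    // [ 'done long ago' ]
console.log(notOld.map(t => t.title)); // [ 'incomplete', 'just done' ]

// Every task lands in exactly one of the two lists.
console.log(old.length + notOld.length === tasks.length); // true
```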

Now we’ve got that working, let’s add in our final two property tests:

    test(`GIVEN ANY valid task and date
        WHEN we run moveOldTasksToArchive()
        THEN there SHOULD NEVER be any tasks in the archive that weren't in the original state`, () => {
        const noNewTasksProperty = fc.property(genTaskState(), fc.date(), (s, dt) => {
            const {archive} = moveOldTasksToArchive(s, dt.getTime());
            expect(archive.every(task => s.archive.includes(task) || s.active.includes(task))).toBe(
                true
            );
        });
        fc.assert(noNewTasksProperty, {numRuns: 10000});
    });

    test(`GIVEN ANY valid task and date
        WHEN we run moveOldTasksToArchive()
        THEN all the tasks in .active SHOULD ALWAYS be either 
            incomplete, or, completed less than 60 seconds
            before the date`, () => {
        const allActiveRecentProperty = fc.property(genTaskState(), fc.date(), (s, dt) => {
            const newState = moveOldTasksToArchive(s, dt.getTime());
            expect(
                newState.active.some(
                    ({completed}) => completed !== undefined && dt - completed > ONE_MINUTE
                )
            ).toBe(false);
        });
        fc.assert(allActiveRecentProperty, {numRuns: 10000});
    });

When we run these tests, they pass. And once they’ve passed I like to tune down the numRuns parameter. Usually, I’ll set it back to the default 100 (sometimes, even lower). It’s OK to run tens of thousands of tests on my laptop. But once I commit my changes, there’s no reason our CI/CD system needs to run that many tests on every commit. Around 100 is usually enough to catch regressions.

We’ve seen how to write property-based tests. But the thought of throwing lots of random data at our code often makes people nervous. Having more tests doesn’t always equal better outcomes.

Is property testing bad practice?

Conventional wisdom in the front-end world has us moving away from running lots of unit tests. Guillermo Rauch’s tweet has become something of a mantra:

Write tests. Not too many. Mostly integration.

Kent C. Dodds picked this up and ran with it, developing it into the ‘testing trophy’ concept.

Kent C. Dodds identifies four kinds of tests: End to end, integration, unit, and static. End to end tests usually involve some kind of tool clicking around the app in a real browser. Integration tests verify that several units work together. Unit tests verify that individual, isolated parts work as expected. And static checkers catch type errors and typos—often in your code editor.

Now, at first glance, you might think property-based testing goes against conventional wisdom. Instead of a handful of unit tests, we’re suddenly running hundreds or thousands of tests. Won’t this make refactoring difficult? As a colleague of mine commented:

My worry is that introducing property-based testing brings us back to a world where we have very rigid tests, that stifle ongoing development on components.

This is a reasonable concern. But let’s be clear about why we want to avoid having lots of small tests. We want to avoid testing implementation details. That is, we don’t want to over-specify our tests. Doing so wastes time and CPU cycles checking things that don’t matter. Or worse, fixing broken tests that never tested anything useful in the first place.

Counter to what you might expect, property tests make it harder to over-specify tests.

How does that work? Well, what does it mean to avoid over-specifying tests? It means not testing things we don’t care about. Think back to our example for a moment. Let’s suppose that we don’t care about the order that tasks go into the archive. We may care about ordering in future, if we discover that users care about it. But for now, we don’t. So, if we change the order that items go into the archive, our tests should not fail.

Let’s try it out. We change our function so new tasks are added to the end of the archive:

const moveOldTasksToArchive = ({active, archive}, currentTime) => ({
    active: active.filter(not(isOldTask(currentTime))),
    archive: archive.concat(active.filter(isOldTask(currentTime))),
});

And when we run our tests… the Populated archive example test fails.

[Screenshot: failing-example-based-test.png]

The example implicitly specifies that the archived items must be in a particular order. Even though we don’t care, it’s still checking.

Now, to be fair, it’s possible to fix the example based tests. Instead of checking that the output matches an expected value, we could check that all the completed items in active are less than 60 seconds old. Except, that’s almost identical to the property test we’ve already written. And the property tests also make sure that we haven’t lost any tasks in the process. If we update the example tests, we end up writing a property test with manual data generation.
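As a rough illustration, we can run both orderings against the populated-archive example by hand. The names archiveFirst, archiveLast and propertiesHold are invented for this sketch. The two outputs differ in order, so an exact-match check can only accept one of them, yet all three properties hold for both:

```javascript
const ONE_MINUTE = 60000;
const ONE_HOUR = 60 * ONE_MINUTE;
const START = 1636521855000;

const not = f => x => !f(x);
const isOldTask = currentTime => task =>
    task.completed !== undefined &&
    currentTime - task.completed > ONE_MINUTE;

// The two orderings: newly archived tasks go first, or last.
const archiveFirst = ({active, archive}, t) => ({
    active: active.filter(not(isOldTask(t))),
    archive: active.filter(isOldTask(t)).concat(archive),
});
const archiveLast = ({active, archive}, t) => ({
    active: active.filter(not(isOldTask(t))),
    archive: archive.concat(active.filter(isOldTask(t))),
});

// Hypothetical helper: checks our three properties for one
// input/output pair.
const propertiesHold = (before, after, t) =>
    after.active.length + after.archive.length ===
        before.active.length + before.archive.length &&
    after.archive.every(
        task => before.archive.includes(task) || before.active.includes(task)
    ) &&
    !after.active.some(isOldTask(t));

const oldCompletedTask = {
    created: START - ONE_HOUR,
    completed: START - ONE_HOUR + ONE_MINUTE,
    title: 'should be archived',
};
const oldAbandonedTask = {created: START - ONE_HOUR, title: 'Abandoned, not completed'};
const populatedArchive = {active: [oldCompletedTask], archive: [oldAbandonedTask]};

const first = archiveFirst(populatedArchive, START);
const last = archiveLast(populatedArchive, START);

console.log(first.archive[0] === last.archive[0]);           // false
console.log(propertiesHold(populatedArchive, first, START)); // true
console.log(propertiesHold(populatedArchive, last, START));  // true
```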

The point here isn’t to say that unit tests are bad. Rather, property tests are different. They take more effort to write because we have to think harder. But that extra effort tends to result in better tests with more coverage of things that matter.

I’ll be the first to admit that property tests can be expensive. They take longer to write. They take longer to run. There are times when we shouldn’t use property tests (more on that, soon). But I find the return on investment is worth it.

The nice thing about property tests is that they combine the best bits of integration tests, end-to-end tests, and unit tests. Like integration/end-to-end tests, property tests encourage us to think through what’s really important. But like unit tests, they allow us to make those checks at lower levels of the code and cover lots of different cases quickly. (For all that they’re slow, property tests are still faster than an end-to-end test). And that gives us more confidence in our code.

If you’d like to learn more about property-based tests, I’ve listed a few good references below:

Finally, you can find the code I used to write this tutorial on GitHub.


  1. Yes, yes. I know. Eye-roll. To Do applications are over-used as examples. But there’s a reason for that. They have just enough complexity to look like a ‘real life’ application. But they do that without being so complex as to swamp us with irrelevant details. So we’ll put up with it for now.
