
Patching our patching with Property Based Testing
I recently had great results using property-based testing, or PBT, to verify an alternative implementation.
We had an existing implementation of something and needed to make it faster while keeping it correct. After considering our options, we realized this was a perfect situation for PBT.
I’ve gotten good results from PBT in other projects, and was happy to find a good reason to introduce it in my current one. Here’s what happened:
Background
We have a situation where, simplified down to the absolute minimum, a web application displays a dataset in an Excel-like grid, with live updates to the data sent from the backend.
Every time the backend sent us data changes (rows added, deleted or modified, or a new dataset to be displayed), our data layer API applied these changes, so we always had a correct view of the dataset contents with a minimum of data transfer.
As an example, the backend could send us a message that gives us an initial data source with three rows, then later send another message that changes the first row, and then a message that removes the second row and also adds another row, and so on.
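To make this concrete, here is a sketch of such a message sequence. The exact wire format is not important here, so the shape below is purely illustrative, loosely modeled on the PropertyChange objects that appear later in this post:

// an illustrative message sequence; the real backend format differs
const messages = [
  // message 1: an initial data source with three rows
  [{ Action: "SET", Target: "dataSource", Data: [{ id: "1", value: 0 }, { id: "2", value: 1 }, { id: "3", value: 2 }] }],
  // message 2: change the first row
  [{ Action: "CHANGE", Target: "dataSource", Data: { id: "1", value: 42 } }],
  // message 3: remove the second row and add another, in the same changeset
  [
    { Action: "REMOVE", Target: "dataSource", Data: "2" },
    { Action: "APPEND", Target: "dataSource", Data: { id: "4", value: 3 } },
  ],
];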
The existing implementation
The existing API for connecting the dataset to the grid rows looked like this:
bindRows(gridComponent: GridComponent): void
which ensured that the grid was kept up-to-date. It was implemented more or less like this:
// we update the gridComponent rows on each change
bindRows(gridComponent: GridComponent) {
  dataLayer.onRowsChanged(rows => gridComponent.rows = rows);
}
In effect, on each change received from the backend, the data layer first applied the changes, then gave us a new array with the current data, which we then sent to the grid for display. This worked just fine for a long while.
Then, we got into a situation where the backend data format wasn’t quite what we wanted to display in the grid. We needed to transform the rows before displaying them. Doing the naive change:
dataLayer.onRowsChanged(rows => gridComponent.rows = rowTransform(rows));
turned out to be quite costly for big data sources, locking up the UI for several seconds, even for the smallest change. Since our data layer was responsible for applying the changes from the backend, and didn’t expose any further information about which rows were changed or new, we always needed to transform all rows in the dataset on every change.
Side note: Even though the data layer already receives changes and applies them to a local copy of the dataset, our data layer follows the Redux pattern, which among other things means the data layer state is immutable. Managing a mutable version of the dataset thus needed to happen outside of the data layer.
The solution
To make this more performant, we wanted to add a mutable version of the same API, which from the outside would look like:
bindRowsMutable(gridComponent: GridComponent, rowTransform: (changedRows: InputRow[]) => GridRow[]): void
The important change to the implementation was that we gave the grid an initial dataset and then kept mutating it, which meant rowTransform only had to run over the new or changed rows. This made everything much more efficient when just a few rows were changed - a very common case.
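As a rough sketch of the idea - not our actual implementation - it could look something like the following, where getRows, onChangeset and the changeset fields are hypothetical stand-ins, and rows are assumed to carry an id:

// a rough sketch only; getRows, onChangeset and the changeset shape are
// hypothetical stand-ins for our actual data layer API
bindRowsMutable(gridComponent: GridComponent, rowTransform: (changedRows: InputRow[]) => GridRow[]) {
  // transform the full dataset once, and hand the grid a mutable array
  const gridRows = rowTransform(dataLayer.getRows());
  gridComponent.rows = gridRows;
  dataLayer.onChangeset(changeset => {
    // drop the rows that were removed from the dataset
    for (const removedId of changeset.removedIds) {
      const index = gridRows.findIndex(row => row.id === removedId);
      if (index >= 0) gridRows.splice(index, 1);
    }
    // transform only the added or changed rows, and mutate the array in place
    for (const transformed of rowTransform(changeset.addedOrChangedRows)) {
      const index = gridRows.findIndex(row => row.id === transformed.id);
      if (index >= 0) gridRows[index] = transformed;
      else gridRows.push(transformed);
    }
  });
}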
Efficiently keeping track of what had changed wasn’t that easy, which meant that the implementation was far from “obviously correct”, and the tests we had could only cover a fraction of the possible interactions, as the backend sends us a conceptually infinite stream of changesets, each with one or more individual changes.
Using Property Based Tests
We had an existing-but-slow implementation, and were working on an experimental-but-fast implementation that should “do the same thing” given a complex series of simple inputs. This is a problem well-suited for PBT.
In our case, the property we want to assert is that “a grid with rows bound using the existing method should always have the same contents as another grid with rows bound using the experimental method, given the same set of changes applied to both”.
We got very good results implementing this using fast-check and its capability to do model-based testing.
Model-based Testing
In model-based testing, a feature of some property-based testing libraries, the basic idea is that some aspect of a complicated thing (say, the contents of a table in a multithreaded SQL database server) can be represented by a simple model (say, just an in-memory dictionary). Given a set of commands (say, CRUD operations) that we perform against both the real system and the model, we can compare the two to find issues.
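In fast-check, each command is a class implementing the fc.Command interface: check decides whether a generated command is valid given the current model state, and run applies the command to the model and the real system, asserting whatever needs to hold. Slightly simplified, the interface looks like this:

// fast-check's Command interface (slightly simplified); Model is the simple
// model, Real is the real system under test
interface Command<Model, Real> {
  // should this command run, given the current model state?
  check(m: Readonly<Model>): boolean;
  // update the model and/or the real system; fail assertions on discrepancies
  run(m: Model, r: Real): void;
  // printed in failure reports and counterexamples
  toString(): string;
}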
We implemented the following commands, corresponding to the ways the data source could be changed:
| Command | Description |
| --- | --- |
| SetDatasourceCommand | add a pending change to set an entirely new array as the data source |
| AppendRowCommand | add a pending change to append a row to the data source |
| RemoveRowCommand | add a pending change to remove a row from the data source |
| ChangeRowCommand | add a pending change to modify a field on one of the data rows in the array |
| ApplyPatchCommand | collect all pending changes into a changeset, apply it to the data and compare the results |
The reason only the last command actually verifies the result is that otherwise every changeset would contain just a single change.
Side note: We could possibly have had a single ApplyRandomChangesetCommand, but then a lot of complexity would have moved into asking fast-check to generate a random but valid changeset of various sizes. Instead we have commands with quite simple inputs, where each (except the last) adds a single type of change to the set of pending changes, which are then applied as a whole in ApplyPatchCommand.
Our model and system under test
In our case, we use the model to keep track of the expected dataset content, but apply the commands to both implementations we have available, and verify two things: that the two implementations generate the same results, and that they match our model (otherwise the same bug might exist in both implementations).
type Model = {
  entities: Set<string>;
  // the expected dataset contents; unset until a data source has been set
  array?: Entity[];
  // changes staged by the commands, applied as one changeset by ApplyPatchCommand
  pendingChanges: PropertyChange[];
};
The System type represents the system under test (details not shown), with methods to apply the changes and to fetch the rows from either the existing or the experimental implementation for comparison.
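Roughly, and purely as an illustration (the Changeset type and the exact signatures are assumptions), its shape is:

// a simplified, illustrative shape of our system under test
type System = {
  // apply a changeset to both the existing and the experimental implementation
  apply(changeset: Changeset): void;
  // the rows as produced by the existing implementation
  getRows(): GridRow[];
  // the rows as produced by the experimental, mutable implementation
  getMutableRows(): GridRow[];
};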
The RemoveRowCommand
Looking at the RemoveRowCommand will give you a good idea of how the rest of the commands look.
There’s a run method where the command gets a chance to impact the simplified model and the system under test, and to detect discrepancies. Our RemoveRowCommand does not verify anything about the system just yet, as we are only adding pending changes, and it only modifies the model for command validation purposes.
As there’s randomness involved, not all commands generated by fast-check are valid given the current state of the system. We don’t want to remove a non-existing row, change a row that isn’t present, or append a row using an ID which is already in the dataset, because we don’t need to support things that won’t happen in production (let these be my famous last words!). This validation is done by the check method, and only commands whose check returns true will actually be performed.
class RemoveRowCommand implements fc.Command<Model, System> {
  // rows are removed by id
  constructor(readonly id: string) {}
  // only allow removal of row actually in the array
  check = (m: Readonly<Model>) => Boolean(m.array && m.array.some(elem => elem.id === this.id));
  run(m: Model, _s: System): void {
    // no need to impact the system just yet
    // impact the model, add the row removal to the pending changes
    m.pendingChanges.push({ Action: "REMOVE", Data: this.id, Target: "dataSource" });
    // remember that the array no longer contains this element
    m.array = m.array!.filter(elem => elem.id !== this.id);
  }
  toString = () => `array.remove(${this.id})`;
}
The ApplyPatchCommand
It is also interesting to look at the ApplyPatchCommand, as it is the one doing all the verification. It is not much more complicated:
class ApplyPatchCommand implements fc.Command<Model, System> {
  constructor() {}
  // allow any changeset, even empty ones
  check = (_m: Readonly<Model>) => true;
  run(m: Model, s: System): void {
    // impact the system, create the changeset from the pending changes
    const changeset = createChangeset(m.pendingChanges);
    // and apply the changeset
    s.apply(changeset);
    // now verify the effects
    const rows = s.getRows();
    const mutableRows = s.getMutableRows();
    // our two implementations should always be equal
    expect(mutableRows).toEqual(rows);
    // and should correspond to the model as well
    expect(rows).toEqual(m.array);
    // impact the model, we have no pending changes after an apply
    m.pendingChanges = [];
  }
  toString = () => `apply patch`;
}
Test setup
We set up fast-check to generate a random sequence of commands - an example could be:
- set datasource to [{id: 3, value: 0}, {id: 2, value: 1}],
- apply,
- change id: 3 value to 1,
- append {id: 4, value: 5},
- apply,
- remove id 2
While running, the apply commands verify that the existing and experimental results are the same (and match our model) for the sequence of changes that was generated, and we ask fast-check to repeat this 1000 times.
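The wiring itself is small. Here is roughly what it looks like using fast-check’s commands and modelRun; the arbitraries, the createSystem helper and the exact command constructor signatures are illustrative stand-ins, not our real code:

import * as fc from "fast-check";

// draw ids from a small pool, so that remove/change commands often hit rows
// that actually exist (check() filters out the ones that don't)
const arbId = fc.constantFrom("1", "2", "3", "4", "5");
const arbEntity = fc.record({ id: arbId, value: fc.integer() });

const allCommands = [
  fc.array(arbEntity).map(rows => new SetDatasourceCommand(rows)),
  arbEntity.map(row => new AppendRowCommand(row)),
  arbId.map(id => new RemoveRowCommand(id)),
  arbEntity.map(row => new ChangeRowCommand(row)),
  fc.constant(new ApplyPatchCommand()),
];

it("existing and experimental bindings always agree", () => {
  fc.assert(
    fc.property(fc.commands(allCommands), cmds => {
      // a fresh model and a fresh system for every generated command sequence
      const setup = () => ({
        model: { entities: new Set<string>(), pendingChanges: [] } as Model,
        real: createSystem(),
      });
      fc.modelRun(setup, cmds);
    }),
    { numRuns: 1000 }
  );
});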
Side note: In our case, any commands trailing the last apply won’t have any chance of failing the test. I don’t know of a way to always append an ApplyPatchCommand before execution.
When a model-based test discovers a discrepancy at any step, fast-check will give you the following three things - and your testing framework will display its usual description of the error:
- It tells us which particular sequence of commands led up to the failure, giving a counterexample for which your property does not hold.
- It ‘shrinks’ the failure, i.e. tries hard to remove any parts of the command sequence not needed to reproduce the failure, giving you the minimal counterexample (or close).
- It gives you enough information to re-generate the same command sequence, so it is only a matter of copy-pasting the seed and other information into the test setup to reproduce the same failure, without having to manually write a repro case. This is essential for debugging, as you don’t want to wade through hundreds of irrelevant command sequences, like you would if you just relied on randomness to find the bug again.
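With fast-check and commands, replaying is a matter of copying the reported replayPath into fc.commands, and the seed and path into fc.assert. A sketch, where all values are placeholders to be copied from the actual failure report:

// placeholder values; copy the real seed, path and replayPath from the report
fc.assert(
  fc.property(fc.commands(allCommands, { replayPath: "AAB:F" }), cmds => {
    const setup = () => ({
      model: { entities: new Set<string>(), pendingChanges: [] } as Model,
      real: createSystem(),
    });
    fc.modelRun(setup, cmds);
  }),
  { seed: -1828918526, path: "0:1:2", endOnFailure: true }
);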
If this test passes, however, it gives us confidence that the experimental implementation gets us the same end result as the existing implementation, as well as matching the model.
What we discovered
The first day we ran these tests, we found several bugs, and the most interesting ones were:
- When we got an initial set of rows, and then appended a row, the new row was duplicated in the experimental result - but only if both changes were in the same changeset.
- When we got an initial set of rows, then changed a row, then appended a row, the changed row unexpectedly reverted to its previous value in the existing result.
The duplicated element was due to sharing an array instance where we should have copied it, and we had no existing tests that did both set and append in the same changeset - as that scenario is unlikely to happen in real life (but not impossible), nobody had thought of it.
The reverted change was certainly unexpected, because it was an issue in the existing implementation, which had no known bugs. A change a few weeks back had introduced this bug without any existing tests noticing, as we (apparently) did not have any tests covering the offending series of changes (set + change + append), although we had set + change and set + append. Our new PBT tests cover these, and many other cases.
Conclusion
The beauty of introducing randomness into testing is that it has the ability to surprise you. Every time I use it, I find bugs in odd and unexpected cases that no one on the team had managed to write unit tests for, and which would have taken ages to track down even if you got an error report. It is not easy to come up with example-based tests that surprise yourself (and your code); let the machine do that for you.
Example-based testing is a good verification strategy, but it is not a very good search strategy, as it just verifies the same examples every time. PBT, on the other hand, is a search strategy based on randomizing the input, exploring different variations on each run. It’s good to know both.
I think many developers, perhaps even you, could improve their testing abilities by getting familiar with PBT using libraries such as:
- fast-check for JavaScript/TypeScript
- CsCheck for dotnet
- Hypothesis for Python
For more information, including a comparison of PBT features across frameworks in many languages, I recommend looking at this overview of property-based testing functionality, or watching my 2023 presentation “Surprise yourself with Property Based Testing” from the Øredev developer conference, where I talk more about PBT, fast-check and model-based testing, walk through a more detailed example, and give some useful advice on coming up with properties, which is often a hurdle when you are new to PBT.