
Patching our patching with Property Based Testing
I recently had great results using property-based testing, or PBT, to verify an alternative implementation.
We had an existing implementation of something and needed to make it faster while keeping it correct. After considering our options, we realized this was a perfect situation for PBT.
I’ve gotten good results from PBT in other projects, and was happy to find a good reason to introduce it in my current one. Here’s what happened:
Background
We have a situation where, simplified down to the absolute minimum, a web application displays a dataset in an Excel-like grid, with live updates to the data sent from the backend.
Every time the backend sent us data changes (rows added, deleted or modified, or a new dataset to be displayed), our data layer API applied these changes, so we always had a correct view of the dataset contents with a minimum of data transfer.
As an example, the backend could send us a message that gives us an initial data source with three rows, then later send another message that changes the first row, and then a message that removes the second row and also adds another row, and so on.
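To make this concrete, here is a sketch of such a message sequence. The exact wire format is not important here, so the shape below is purely illustrative, loosely modeled on the PropertyChange objects that appear later in this post:

// an illustrative message sequence; the real backend format differs
const messages = [
  // message 1: an initial data source with three rows
  [{ Action: "SET", Target: "dataSource", Data: [{ id: "1", value: 0 }, { id: "2", value: 1 }, { id: "3", value: 2 }] }],
  // message 2: change the first row
  [{ Action: "CHANGE", Target: "dataSource", Data: { id: "1", value: 42 } }],
  // message 3: remove the second row and add another, in the same changeset
  [
    { Action: "REMOVE", Target: "dataSource", Data: "2" },
    { Action: "APPEND", Target: "dataSource", Data: { id: "4", value: 3 } },
  ],
];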
The existing implementation
The existing API for connecting the dataset to the grid rows looked like this:
bindRows(gridComponent: GridComponent): void
which ensured that the grid was kept up-to-date. It was implemented more or less like this:
// we update the gridComponent rows on each change
bindRows(gridComponent: GridComponent) {
  dataLayer.onRowsChanged(rows => gridComponent.rows = rows);
}
In effect, on each change received from the backend, the data layer first applied the changes, then gave us a new array with the current data, which we then sent to the grid for display. This worked just fine for a long while.
Then, we got into a situation where the backend data format wasn’t quite what we wanted to display in the grid. We needed to transform the rows before displaying them. Doing the naive change:
dataLayer.onRowsChanged(rows => gridComponent.rows = rowTransform(rows));
turned out to be quite costly for big data sources, locking up the UI for several seconds, even for the smallest change. Since our data layer was responsible for applying the changes from the backend, and didn’t expose any further information about which rows were changed or new, we always needed to transform all rows in the dataset on every change.
Side note: Even though the data layer already receives changes and applies them to a local copy of the dataset, our data layer follows the Redux pattern, which among other things means the data layer state is immutable. Managing a mutable version of the dataset thus needed to happen outside of the data layer.
The solution
To make this more performant, we wanted to add a mutable version of the same API, which from the outside would look like:
bindRowsMutable(gridComponent: GridComponent, rowTransform: (changedRows: InputRow[]) => GridRow[]): void
The important change to the implementation was that we gave the grid an initial dataset and then kept mutating it, which meant rowTransform only had to run over the new or changed rows. This made everything much more efficient when just a few rows were changed - a very common case.
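As a rough sketch of the idea - not our actual implementation - it could look something like the following, where getRows, onChangeset and the changeset fields are hypothetical stand-ins, and rows are assumed to carry an id:

// a rough sketch only; getRows, onChangeset and the changeset shape are
// hypothetical stand-ins for our actual data layer API
bindRowsMutable(gridComponent: GridComponent, rowTransform: (changedRows: InputRow[]) => GridRow[]) {
  // transform the full dataset once, and hand the grid a mutable array
  const gridRows = rowTransform(dataLayer.getRows());
  gridComponent.rows = gridRows;
  dataLayer.onChangeset(changeset => {
    // drop the rows that were removed from the dataset
    for (const removedId of changeset.removedIds) {
      const index = gridRows.findIndex(row => row.id === removedId);
      if (index >= 0) gridRows.splice(index, 1);
    }
    // transform only the added or changed rows, and mutate the array in place
    for (const transformed of rowTransform(changeset.addedOrChangedRows)) {
      const index = gridRows.findIndex(row => row.id === transformed.id);
      if (index >= 0) gridRows[index] = transformed;
      else gridRows.push(transformed);
    }
  });
}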
Efficiently keeping track of what had changed wasn’t that easy, which meant that the implementation was far from “obviously correct”, and the tests we had could only cover a fraction of the possible interactions, as the backend sends us a conceptually infinite stream of changesets, each with one or more individual changes.
Using Property Based Tests
We had an existing-but-slow implementation, and were working on an experimental-but-fast implementation that should “do the same thing” given a complex series of simple inputs. This is a problem well-suited for PBT.
In our case, the property we want to assert is that “a grid with rows bound using the existing method should always have the same contents as another grid with rows bound using the experimental method, given the same set of changes applied to both”.
We got very good results implementing this using fast-check and its capability to do model-based testing.
Model-based Testing
In model-based testing, a feature of some property-based testing libraries, the basic idea is that some aspect of a complicated thing (say, the contents of a table in a multithreaded SQL database server) can be represented by a simple model (say, just an in-memory dictionary). Given a set of commands (say, CRUD operations) that we perform against both the real system and the model, we can compare the two to find issues.
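In fast-check, each command is a class implementing the fc.Command interface: check decides whether a generated command is valid given the current model state, and run applies the command to the model and the real system, asserting whatever needs to hold. Slightly simplified, the interface looks like this:

// fast-check's Command interface (slightly simplified); Model is the simple
// model, Real is the real system under test
interface Command<Model, Real> {
  // should this command run, given the current model state?
  check(m: Readonly<Model>): boolean;
  // update the model and/or the real system; fail assertions on discrepancies
  run(m: Model, r: Real): void;
  // printed in failure reports and counterexamples
  toString(): string;
}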
We implemented the following commands, corresponding to the ways the data source could be changed:
| Command | Description |
| --- | --- |
| SetDatasourceCommand | add a pending change to set an entirely new array as the data source |
| AppendRowCommand | add a pending change to append a row to the data source |
| RemoveRowCommand | add a pending change to remove a row from the data source |
| ChangeRowCommand | add a pending change to modify a field on one of the data rows in the array |
| ApplyPatchCommand | collect all pending changes into a changeset, apply it to the data and compare the results |
The reason only the last command actually verifies the result is that otherwise every changeset would contain just a single change.
Side note: We could possibly have had a single ApplyRandomChangesetCommand, but then a lot of complexity would have moved into asking fast-check to generate a random but valid changeset of various sizes. Instead we have commands with quite simple inputs, where each (except the last) adds a single type of change to the set of pending changes, which are then applied as a whole in ApplyPatchCommand.
Our model and system under test
In our case, we use the model to keep track of the expected dataset content, but apply the commands to both implementations we have available, and verify two things: that the two implementations generate the same results, and that they match our model (otherwise the same bug might exist in both implementations).
type Model = {
  entities: Set<string>;
  // the expected dataset contents; unset until a data source has been set
  array?: Entity[];
  // changes staged by the commands, applied as one changeset by ApplyPatchCommand
  pendingChanges: PropertyChange[];
};
The System type represents the system under test (details not shown), with methods to apply the changes and to fetch the rows from either the existing or the experimental implementation for comparison.
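Roughly, and purely as an illustration (the Changeset type and the exact signatures are assumptions), its shape is:

// a simplified, illustrative shape of our system under test
type System = {
  // apply a changeset to both the existing and the experimental implementation
  apply(changeset: Changeset): void;
  // the rows as produced by the existing implementation
  getRows(): GridRow[];
  // the rows as produced by the experimental, mutable implementation
  getMutableRows(): GridRow[];
};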
The RemoveRowCommand
Looking at the RemoveRowCommand will give you a good idea of how the rest of the commands look.
There’s a run method where the command gets a chance to impact the simplified model and the system under test, and to detect discrepancies. Our RemoveRowCommand does not verify anything about the system just yet, as we are only adding pending changes, and it only modifies the model for command validation purposes.
As there’s randomness involved, not all commands generated by fast-check are valid given the current state of the system. We don’t want to remove a non-existing row, change a row that isn’t present, or append a row using an ID which is already in the dataset, because we don’t need to support things that won’t happen in production (let these be my famous last words!). This validation is done by the check method, and only commands whose check returns true will actually be performed.
class RemoveRowCommand implements fc.Command<Model, System> {
  // rows are removed by id
  constructor(readonly id: string) {}
  // only allow removal of row actually in the array
  check = (m: Readonly<Model>) => Boolean(m.array && m.array.some(elem => elem.id === this.id));
  run(m: Model, _s: System): void {
    // no need to impact the system just yet
    // impact the model, add the row removal to the pending changes
    m.pendingChanges.push({ Action: "REMOVE", Data: this.id, Target: "dataSource" });
    // remember that the array no longer contains this element
    m.array = m.array!.filter(elem => elem.id !== this.id);
  }
  toString = () => `array.remove(${this.id})`;
}
The ApplyPatchCommand
It is also interesting to look at the ApplyPatchCommand, as it is the one doing all the verification. It is not much more complicated:
class ApplyPatchCommand implements fc.Command<Model, System> {
  constructor() {}
  // allow any changeset, even empty ones
  check = (_m: Readonly<Model>) => true;
  run(m: Model, s: System): void {
    // impact the system, create the changeset from the pending changes
    const changeset = createChangeset(m.pendingChanges);
    // and apply the changeset
    s.apply(changeset);
    // now verify the effects
    const rows = s.getRows();
    const mutableRows = s.getMutableRows();
    // our two implementations should always be equal
    expect(mutableRows).toEqual(rows);
    // and should correspond to the model as well
    expect(rows).toEqual(m.array);
    // impact the model, we have no pending changes after an apply
    m.pendingChanges = [];
  }
  toString = () => `apply patch`;
}
Test setup
We set up fast-check to generate a random sequence of commands - an example could be:
- set datasource to [{id: 3, value: 0}, {id: 2, value: 1}],
- apply,
- change id: 3 value to 1,
- append {id: 4, value: 5},
- apply,
- remove id 2
While running, the apply commands verify that the existing and experimental results are the same (and match our model) for the sequence of changes that was generated, and we ask fast-check to repeat this 1000 times.
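The wiring itself is small. Here is roughly what it looks like using fast-check’s commands and modelRun; the arbitraries, the createSystem helper and the exact command constructor signatures are illustrative stand-ins, not our real code:

import * as fc from "fast-check";

// draw ids from a small pool, so that remove/change commands often hit rows
// that actually exist (check() filters out the ones that don't)
const arbId = fc.constantFrom("1", "2", "3", "4", "5");
const arbEntity = fc.record({ id: arbId, value: fc.integer() });

const allCommands = [
  fc.array(arbEntity).map(rows => new SetDatasourceCommand(rows)),
  arbEntity.map(row => new AppendRowCommand(row)),
  arbId.map(id => new RemoveRowCommand(id)),
  arbEntity.map(row => new ChangeRowCommand(row)),
  fc.constant(new ApplyPatchCommand()),
];

it("existing and experimental bindings always agree", () => {
  fc.assert(
    fc.property(fc.commands(allCommands), cmds => {
      // a fresh model and a fresh system for every generated command sequence
      const setup = () => ({
        model: { entities: new Set<string>(), pendingChanges: [] } as Model,
        real: createSystem(),
      });
      fc.modelRun(setup, cmds);
    }),
    { numRuns: 1000 }
  );
});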
Side note: In our case, any commands trailing the last apply won’t have any chance of failing the test. I don’t know of a way to always append an ApplyPatchCommand before execution.
When a model-based test discovers a discrepancy at any step, fast-check will give you the following three things - and your testing framework will display its usual description of the error:
- It tells us which particular sequence of commands led up to the failure, giving a counterexample for which your property does not hold.
- It ‘shrinks’ the failure, i.e. tries hard to remove any parts of the command sequence not needed to reproduce the failure, giving you the minimal counterexample (or close).
- It gives you enough information to re-generate the same command sequence, so it is only a matter of copy-pasting the seed and other information into the test setup to reproduce the same failure, without having to manually write a repro case. This is essential for debugging, as you don’t want to wade through hundreds of irrelevant command sequences, like you would if you just relied on randomness to find the bug again.
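With fast-check and commands, replaying is a matter of copying the reported replayPath into fc.commands, and the seed and path into fc.assert. A sketch, where all values are placeholders to be copied from the actual failure report:

// placeholder values; copy the real seed, path and replayPath from the report
fc.assert(
  fc.property(fc.commands(allCommands, { replayPath: "AAB:F" }), cmds => {
    const setup = () => ({
      model: { entities: new Set<string>(), pendingChanges: [] } as Model,
      real: createSystem(),
    });
    fc.modelRun(setup, cmds);
  }),
  { seed: -1828918526, path: "0:1:2", endOnFailure: true }
);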
If this test passes, however, it gives us confidence that the experimental implementation gets us the same end result as the existing implementation, as well as matching the model.
What we discovered
The first day we ran these tests, we found several bugs, and the most interesting ones were:
- When we got an initial set of rows, and then appended a row, the new row was duplicated in the experimental result - but only if both changes were in the same changeset.
- When we got an initial set of rows, then changed a row, then appended a row, the changed row unexpectedly reverted to its previous value in the existing result.
The duplicated element was due to sharing an array instance where we should have copied it, and we had no existing tests that did both set and append in the same changeset - as that scenario is unlikely to happen in real life (but not impossible), nobody had thought of it.
The reverted change was certainly unexpected, because it was an issue in the existing implementation, which had no known bugs. A change a few weeks back had introduced this bug without any existing tests noticing, as we (apparently) did not have any tests covering the offending series of changes (set + change + append), although we had set + change and set + append. Our new PBT tests cover these, and many other cases.
Conclusion
The beauty of introducing randomness into testing is that it has the ability to surprise you. Every time I use it, I find bugs in odd and unexpected cases that no one on the team had managed to write unit tests for, and which would have taken ages to track down even if you got an error report. It is not easy to come up with example-based tests that surprise yourself (and your code); let the machine do that for you.
Example-based testing is a good verification strategy, but it is not a very good search strategy, as it just verifies the same examples every time. PBT, on the other hand, is a search strategy based on randomizing the input, exploring different variations on each run. It’s good to know both.
I think many developers, perhaps even you, could improve their testing abilities by getting familiar with PBT using libraries such as:
- fast-check for JavaScript/TypeScript
- CsCheck for dotnet
- Hypothesis for Python
For more information, including a comparison of PBT features across frameworks in many languages, I recommend looking at this overview of property-based testing functionality, or watching my 2023 presentation “Surprise yourself with Property Based Testing” from the Øredev developer conference, where I talk more about PBT, fast-check and model-based testing, walk through a more detailed example, and give some useful advice on coming up with properties, which is often a hurdle when you are new to PBT.