
Cross-Branch Testing

source link: https://buttondown.email/hillelwayne/archive/cross-branch-testing/

12/22/2020

This was inspired by a few conversations I had last week. There’s a certain class of problems that are hard to test:

  1. The output isn’t obviously inferrable from the input. The code isn’t just doing what a human could abstractly do, it’s doing something where we don’t know the answer without running the program.
  2. The output doesn’t have “mathematical” properties, like round-tripness or commutativity.
  3. Errors in the function can be “subtle”: a bug may affect only a small subset of possible inputs, so most individual test examples would still pass.

Some examples of this: computer vision code, simulations, lots of business logic, most “data pipelines”. I was struggling to find a good semantic term for this class of problems, then gave up and started mentally calling them “rho problems”. There’s no meaning behind the name, just the first word that popped into my head.

Most rho problems are complex enough that a realistic example would take too long to set up and explain. Instead, here’s a purely contrived rho problem.

def f(x, y, z):
  out = 0
  for i in range(10):
    out = out * x + abs(y*z - i**2)
    x, y, z = y+1, z, x
  return abs(out)%100 < 10

Not code you’d see in the real world, but it has all the properties of a rho problem. Imagine this is the code in our deploy branch, and we try to refactor this code on a new branch. The refactor introduces a subtle bug that affects only 1% of possible inputs, which we’ll simulate with this change:

def f(x, y, z):
  out = 0
  for i in range(10):
    out = out * x + abs(y*z - i**2)
    x, y, z = y+1, z, x
- return abs(out)%100 < 10
+ return abs(out)%100 < 9

What kind of test would detect this error? Standard options are:

  • Unit testing. The problem is that if the error is uniformly distributed, each unit test has only a 1%-ish chance of finding the bug. We need to test a lot more inputs.
  • Property testing. This doesn’t work because the function has no easily inferable mathematical properties.
  • Metamorphic testing. There’s no obvious relationship between inputs and outputs: knowing the value of f(1, 1, 1) tells us little about f(1, 1, 2).

(Yes, I know we could try decomposing the function and testing the parts individually. That’s not the point: we’re trying to find general principles here regardless of the problem.)

But we do have a “correct” reference point: the deploy branch! We can use property testing where the property is that our refactored code always gives the same output as the deployed code.[1] Step one, use git worktree:

git worktree add ref deploy

That creates a new ref folder in our directory that’s identical to our deploy branch. By “identical”, I mean that when you’re in that folder you’re effectively on the deploy branch, to the point of being able to commit. Worktree is pretty cool! Anyway, if our code lives in foo.py, the deploy version lives in ref/foo.py, so our function is foo.f and the deploy one is ref.f. We can write a property test (using Hypothesis and pytest):

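from hypothesis import given
from hypothesis.strategies import integers

import ref.foo as ref  # deploy-branch copy, checked out by git worktree
import foo             # working-branch copy with the refactor


@given(integers(), integers(), integers())
def test_f(a, b, c):
    # Property: the refactor agrees with the deploy version on every input.
    assert ref.f(a, b, c) == foo.f(a, b, c)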
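Hypothesis supplies the input generation and the shrinking here. The underlying differential check can also be sketched with only the standard library. This is an illustrative stand-in, not the author’s setup. The two versions are inlined as f_ref and f_new instead of being imported from the two checkouts. The helpers find_divergence and random_inputs are hypothetical. Unlike Hypothesis, this sketch does no shrinking, so it reports whichever disagreeing triple it happens to hit first.

```python
import itertools
import random


def f_ref(x, y, z):
    # Deploy-branch version of the contrived function.
    out = 0
    for i in range(10):
        out = out * x + abs(y * z - i**2)
        x, y, z = y + 1, z, x
    return abs(out) % 100 < 10


def f_new(x, y, z):
    # Refactored version carrying the injected bug.
    out = 0
    for i in range(10):
        out = out * x + abs(y * z - i**2)
        x, y, z = y + 1, z, x
    return abs(out) % 100 < 9


def find_divergence(ref, new, inputs):
    """Return the first input tuple on which ref and new disagree, or None."""
    for args in inputs:
        if ref(*args) != new(*args):
            return args
    return None


def random_inputs(n, lo=-1000, hi=1000, seed=0):
    """Yield n seeded random (x, y, z) triples, a crude input generator."""
    rng = random.Random(seed)
    for _ in range(n):
        yield tuple(rng.randint(lo, hi) for _ in range(3))


# Random search, as a property-testing library would do it.
print(find_divergence(f_ref, f_new, random_inputs(10_000)))
# Exhaustive search over a small grid, for comparison.
print(find_divergence(f_ref, f_new, itertools.product(range(10), repeat=3)))
```

Seeding the generator makes a run reproducible. Changing the seed explores different inputs and may surface different counterexamples.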

Running this Hypothesis test under pytest gives us a falsifying input:

Falsifying example: test_f(
    a=5, b=6, c=5,
)
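Recomputing the loop by hand for a = 5, b = 6, c = 5 shows why this triple fails: abs(out) ends in 09, which is exactly the one residue where the two thresholds disagree. Because the search is random, a counterexample found once may not turn up on the next run. One option is to keep it as a fixed regression case. In Hypothesis itself, stacking @example(5, 6, 5) on top of @given does this. The following is a dependency-free sketch of the same idea. The names trace_out, REGRESSION_CASES, and failing_regressions are hypothetical.

```python
def trace_out(x, y, z):
    """Recompute abs(out), the value both versions of f threshold."""
    out = 0
    for i in range(10):
        out = out * x + abs(y * z - i**2)
        x, y, z = y + 1, z, x
    return abs(out)


def f_ref(x, y, z):
    return trace_out(x, y, z) % 100 < 10  # deploy-branch threshold


def f_new(x, y, z):
    return trace_out(x, y, z) % 100 < 9  # refactor's (buggy) threshold


# Counterexample reported by the Hypothesis run above, kept as a fixed case
# so every run re-checks it even if the random search never finds it again.
REGRESSION_CASES = [(5, 6, 5)]


def failing_regressions(ref, new, cases=REGRESSION_CASES):
    """Return the pinned cases on which the two versions disagree."""
    return [args for args in cases if ref(*args) != new(*args)]


print(trace_out(5, 6, 5))  # → 1952189009
print(failing_regressions(f_ref, f_new))  # → [(5, 6, 5)]
```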

Hypothesis searches randomly, so this is nondeterministic: I’ve also gotten (0, 4, 0) and (-114, -114, -26354) as error cases. Regardless, I think it works as a proof of concept: we find an input where the original code and the refactor diverge. The main blocker I see is that version control systems aren’t designed for this kind of use case. Before someone told me about git worktree, I had pytest setup code that manually switched between branches and reloaded the module, which is… not exactly usable in production. But this seems close enough that a sufficiently motivated person could pull it off.
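One way to avoid switching branches and reloading is to load each checkout’s copy of the module under its own name. This is a sketch under assumptions, not the author’s setup. It relies on the standard importlib.util.spec_from_file_location and module_from_spec. The helper load_version and the module names are hypothetical. It would also work when the worktree lives outside the project directory, where import ref.foo would not resolve.

```python
import importlib.util
import sys
from pathlib import Path


def load_version(module_name, path):
    """Import the source file at `path` as a module named `module_name`.

    Loading each branch's copy under its own name (e.g. "foo" for the
    working tree and "deploy_foo" for the deploy checkout) lets both
    versions coexist in one process with no branch switching or reloads.
    """
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, mirroring what the normal import system does.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```

With the worktree from above, load_version("deploy_foo", Path("ref") / "foo.py").f would play the role of ref.f. One caveat applies: if ref/foo.py itself imports other project modules, those imports still resolve through sys.path. They will most likely pick up the working-branch copies rather than the deploy ones.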

You just read issue #99 of Computer Things.
Computer Things is brought to you by Buttondown, the best way to start and grow your newsletter.