source link: https://github.com/golang/go/issues/44551
proposal: testing: add fuzz test support #44551

katiehockman opened this issue 21 days ago · 41 comments

Comments


Member

katiehockman commented 21 days ago

This proposal is to add fuzz test support to Go. This will add a new testing.F type, support FuzzFoo functions in _test.go files, and add new go command behavior.

A design draft has already been published and iterated on based on feedback from the Go community. This is the next step to propose that this design draft become a language feature.

This feature will be considered experimental in Go 1.17, and the API will not be covered by the Go 1 compatibility promise yet. The functionality that goes into this release is expected to have bugs and be missing features, but should serve as a proof-of-concept for which Go developers can experiment and provide feedback. Since this will be an experimental feature to start, we also expect there will be room for growth for the mutator and fuzzing engine in future Go releases.

Below are the parts of the design draft which will not make it into 1.17, and will be considered later on:

  • support for fuzzing with -race and -msan
  • support for fuzzing with -keepfuzzing
  • deduplication of similar crashes caused by different mutations, which would be a prerequisite to implementing -keepfuzzing (to reduce noise)
  • allowing special options while fuzzing (e.g. maximum input size)
  • dictionary support
  • customizable coverage instrumentation while fuzzing (e.g. to only instrument certain packages or files)
  • custom generators for the mutator
  • structured fuzzing support for struct and non-primitive types
  • [Stretch goal for 1.17] structured fuzzing support for primitive types other than []byte (e.g. string, int, float64)


Contributor

robpike commented 21 days ago

How can we keep the fuzzing corpora small, and the fuzzing process efficient and on point? Although fuzzing is good at finding certain classes of bugs, it is very expensive in CPU and storage and cost/benefit ratio remains unclear. I worry about wasting energy and filling up git repos with testdata noise. Is there a way to make the testing focused and result-driven? Also, will we wake up one day and find fuzzing is a mandatory part of all builds? That would be terrifying.

Apologies if I'm late to the party; I did not comment on the original proposal.


Member

FiloSottile commented 21 days ago

edited

How can we keep the fuzzing corpora small, and the fuzzing process efficient and on point? Although fuzzing is good at finding certain classes of bugs, it is very expensive in CPU and storage and cost/benefit ratio remains unclear.

The track record of go-fuzz provides pretty amazing evidence that fuzzing is good at finding bugs that humans had not found. In my experience, just a few CPU minutes of fuzzing can be extremely effective at the margin. There aren't a lot of other ways to turn fixed amounts of CPUs (as opposed to dynamic, runtime checks) into bugs caught.

I worry about wasting energy and filling up git repos with testdata noise. Is there a way to make the testing focused and result-driven?

Yeah, filling repositories with random-looking stuff is definitely undesirable. To be clear, in this design the mutator-generated corpus is not in testdata, we agree that it would be way too noisy to check in.

Samples are only placed in testdata when they resulted in a test failure, and then it's up to the developer to check them in as a regression test, or turn them into a more full-fledged unit test.

Also, will we wake up one day and find fuzzing is a mandatory part of all builds? That would be terrifying.

I'm not sure what you mean with being a part of all builds. If you mean running as part of go build, definitely not.

The expensive, open-ended part of fuzzing won't even be a default part of go test: without the -fuzz flag only seed and testdata corpora will be tested, and the mutator won't run. (That would make tests non-deterministic, aside from resources concerns.)


Contributor

robpike commented 21 days ago

edited

Thanks, Filo. Your point about how we store stuff is important, and I missed that.

I understand the value of fuzzing, and my grumpy old man concerns are not an attempt to reduce that. (I will say that some of the bugs it finds are not worth fixing, and certainly not worth the cost of finding, although it's very hard to make that case.) But I have three general concerns:

  1. A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

  2. The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.

  3. Its adoption as part of the process. This is where the proposal to incorporate it needs attention. The technical process and the culture must make it clear that (1) and (2) mean that it cannot be part of the standard build process.

Go build will surely not run it always, but organizations will. Being automated, companies will require it, just as they require zero lint errors even as we protest that lint is not perfect.

I am just expressing caution, care, and an eye on the big picture here. I am not saying we shouldn't do this, just that we should do it with all considerations in mind.

everestmz commented

This is awesome - really excited to see native Go fuzzing in the proposal stage now.

Hopefully it's not too early to make a suggestion for the actual implementation of this, but it would be super useful if the testing.F type was an interface, rather than a struct like testing.T/B are.

I think fuzzing is in sort of a unique spot among the supported testing types in that there's no one correct way to fuzz test your code. Having the ability to build a type that implements the testing.F interface would make it simple to swap out fuzzing backends, enabling the community to take advantage of all the awesome open source work out there by building integrations to things like AFL++.

There's already been interest in something like this from the original Reddit thread mentioned on the proposal document (like this comment here), and I think it would be a missed opportunity to not allow developers to hook these functions into property-based testing tools (like Gopter) to get some minimal testing done at the go test stage, while allowing the same test to later be fuzzed to discover deeper bugs.

There are probably details I've missed here and I'd like to hear some other opinions on this, but given all the potential use cases I think it's worth some discussion.


Member

FiloSottile commented 21 days ago

  1. A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

Agreed, I like to say that fuzzing finds bugs at the margin. It's why we are interested in it as the Security team: bugs caught at the margin are ones that don't make it into production to become vulnerabilities. Just like it finds bugs that are often not found in other ways, it can't find bugs that other ways can.

We'll do our best to communicate to developers in the coming months about what fuzzing can (and can't) do for them, and we'd like to hear it if that content doesn't push in the right direction.

  2. The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.
  3. Its adoption as part of the process. This is where the proposal to incorporate it needs attention. The technical process and the culture must make it clear that (1) and (2) mean that it cannot be part of the standard build process.

Indeed, this is the hardest part: creating a culture in the ecosystem that makes the practice successful. We want fuzzing to be part of the development—not build or security—process: make a change to the relevant code, run go test -fuzz with the locally cached corpus that can already hit most code branches, find a bug quickly, debug-fix-verify with go test, turn the crasher into a unit test or check it in as a regression test.

There is probably also space for fuzzing in CI, like OSS-Fuzz demonstrated and so that the corpus can be centralized, but since it's non-deterministic it can't meaningfully block a build. Hopefully the CLI choices, like not having a fixed fuzzing time, nudge in that direction.

Likewise, I think writing will be the best tool we have to shape usage, and welcome feedback on future drafts!


Member

bcmills commented 20 days ago

@everestmz

it would be super useful if the testing.F type was an interface, rather than a struct like testing.T/B are.

Why would such an interface type need to be defined in the testing package itself? Keeping testing.F a concrete type allows for future methods to be added and documented, and packages that want to abstract over different F-like implementations can already define their own interface for the specific methods they need.
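A minimal sketch of that consumer-defined-interface approach, with simplified method signatures and invented names standing in for the real testing.F API:

```go
package main

import "fmt"

// Fuzzer is a narrow, consumer-defined interface covering only the
// methods a third-party package actually needs. *testing.F — or any
// alternative fuzzing engine with the same method set — would satisfy
// it implicitly; signatures here are illustrative assumptions.
type Fuzzer interface {
	Add(args ...interface{})
	Fuzz(ff interface{})
}

// fakeF stands in for a concrete engine such as *testing.F.
type fakeF struct{ seeds [][]interface{} }

func (f *fakeF) Add(args ...interface{}) { f.seeds = append(f.seeds, args) }
func (f *fakeF) Fuzz(ff interface{})     { /* engine-specific loop would go here */ }

// runWith accepts any Fuzzer implementation, not one concrete type.
func runWith(fz Fuzzer) {
	fz.Add("cat")
	fz.Add("!mouse")
	fz.Fuzz(nil)
}

func main() {
	f := &fakeF{}
	runWith(f)
	fmt.Println(len(f.seeds)) // 2
}
```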

JAicewizard commented

turn the crasher into a unit test or check it in as a regression test.

Absolutely wild dream, but... wouldn't it be amazing if there were an option to transform bugs found with fuzzing into unit test cases automatically? It could be as simple as copying the function, giving it a unique name, and replacing testData := {source of random data} with testData := {the data that makes the test fail}.

This would make it very clear that fuzzing generates data that makes your test fail, and is not in itself a way to test your code. It's a random edge-case generator, not a validator.


Member

bcmills commented 20 days ago

edited

The flow of control in the design doc seems a bit off to me. The Fuzz method in particular carries a lot of type-state invariants:

  • “Only one call to Fuzz is allowed per fuzz target, and any subsequent calls will panic.”
  • and, Add “cannot be invoked after or within the Fuzz function.”

Making Fuzz a method seems to invite user code to try to perform setup before the call to Fuzz and/or cleanup afterward. (Is that the intent?)

But then I wonder if there is a cleaner alternative, especially given the (*T).Cleanup methods added in #32111. The restriction to a single Fuzz invocation per test makes me think that it functions more like a constructor-function than a method call on an existing object. Would it make sense to make it a constructor method on a *T, returning an explicit object on which Add could then be called? That would make it clearer that the fuzz function should not call Add, since the user would have to go out of their way to even have the object with the Add method in scope at that point.

I'm imagining something like:

package testing

func (*T) NewFuzzer(ff interface{}) *F
func (*F) Add(args interface{})

That would make the example in the doc more like:

func TestFuzzMarshalFoo(t *testing.T) {
	f := t.NewFuzzer(func(t *testing.T, a string, num *big.Int) {
		t.Parallel() // seed corpus tests can run in parallel
		if num.Sign() <= 0 {
			t.Skip() // only test positive numbers
		}
		val, err := MarshalFoo(a, num)
		if err != nil {
			t.Skip()
		}
		if val == nil {
			t.Fatal("MarshalFoo: val == nil, err == nil")
		}
		a2, num2, err := UnmarshalFoo(val)
		if err != nil {
			t.Fatalf("failed to unmarshal valid Foo: %v", err)
		}
		if num2 == nil {
			t.Error("UnmarshalFoo: num2 == nil, err == nil")
		}
		if a2 != a || num2.Cmp(num) != 0 {
			t.Error("UnmarshalFoo does not match the provided input")
		}
	})

	// Seed the initial corpus
	f.Add("cat", big.NewInt(1341))
	f.Add("!mouse", big.NewInt(0))
}

If the -fuzz flag is set, the additional fuzzing would occur after TestFuzzMarshalFoo returns but before its Cleanup callbacks are executed.


Member

Author

katiehockman commented 20 days ago

edited

@JAicewizard

wouldn't it be amazing if there would be an option to transform bugs found with fuzzing into unit test cases automatically?

There were discussions about code generation in some of the first designs, but it was eventually decided that it was a complexity without much gain, especially since we can gain the same benefits in a simpler way. With this design, new crashes are put into testdata and will automatically run as a regression test on the next go test run (even when -fuzz isn't used).

And if someone would prefer to put this crash into the code instead of using the one in testdata, they have a couple options:

  1. They can use f.Add with the existing fuzz target (which works basically the same way as it would if it were in testdata, it's just in the code instead)
  2. They can copy-paste the Fuzz target and make slight alterations to the code to make it a unit test. We intentionally designed fuzz targets to look very similar to unit tests, which allows fuzz targets to be easily translated into unit tests, but also allows unit tests to be easily translated into fuzz targets.


Contributor

josharian commented 20 days ago

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions. It seems to me that crashes need to be reified in regular tests.

That said, a trained corpus (not just crashes) can make fuzzing much more effective at finding new issues. I understand not wanting to pollute a repo with a corpus, but I think we should design an intended way for people to keep and share corpora. I have used git submodules in the past reasonably happily, but that's git-specific (and submodules are generally frustrating to work with). I'm not sure what the answer is here, but I think at least trying for an answer is an important part of the fuzzing big picture.

A concentration of testing on fuzzing. It's great, but it's not the only way to find bugs, although some act as if it is.

I think part of the attraction is not just that it finds bugs, but also how it integrates with the developer workflow. The nice thing about fuzzing is that each bug comes with a witness input. That input, and the ability to debug with your own tools, makes fuzzing-as-a-process quite nice.

That being said, definitely no silver bullets here.

The massive computational resources it consumes. It's directed, but not that far off a million monkeys at a million keyboards.

This also relates to the proposal's statement that "This type of testing works best when able to run more mutations quickly, rather than fewer mutations intelligently."

AFL/libFuzzer really focus on iteration speed, but there are other approaches, like symbolic execution, which are slow but generate new coverage (more or less) by definition on each iteration. Less fast million monkeys; more like applied model checking using heavy-weight SMT/SAT solvers. Most research, and some products, pair the two together. I know for a fact that Mayhem and Mechanical Phish did this in the Cyber Grand Challenge, for instance.

The great thing about the proposal is that the interface and build process are standardized. I view this proposal as relevant beyond fuzzing to the broader set of dynamic analysis tools.

everestmz commented

@bcmills

Why would such an interface type need to be defined in the testing package itself?

Putting the interface directly into the testing package gives any codebase that uses the default Go fuzzing tooling significant flexibility when it comes to changing up the testing methodology.

In the world of C++ fuzzing, the fuzz interface has been de-facto standardized on the LLVMFuzzerTestOneInput function, originally created by libFuzzer. Because this is just a function that expects a value, every other C++ fuzzing tool works with the same interface, allowing users to take a codebase full of fuzz functions and use any of the available open-source tools with no code changes.

This ability has been put to great use by tools like OSS-fuzz/clusterfuzz, which allow users to write one fuzz test and have it attacked by 3 different fuzzing engines without any more developer effort.

We constantly see evidence (like this example) of new fuzzing techniques uncovering bugs in code that had previously been thoroughly fuzzed. Making testing.F a concrete type locks down any method that uses it to the standard Go fuzzing tooling, and trying out a different tool would require significant developer effort on any codebase with a healthy collection of fuzz tests.

It would feel like an oversight to hamstring Go fuzzing right out of the gate by limiting available testing methodologies to the default - the fuzzing community consistently develops more and more effective techniques and algorithms, and it would be a shame to not take advantage of that.

Keeping testing.F a concrete type allows for future methods to be added and documented

I might be misunderstanding you here, but how would turning testing.F into an interface preclude adding more functions in the future?


Contributor

rsc commented 20 days ago

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group


Member

Author

katiehockman commented 20 days ago

@everestmz

I might be misunderstanding you here, but how would turning testing.F into an interface preclude adding more functions in the future?

I think what @bcmills is saying here is that if we define testing.F as an interface, the interface is basically locked as soon as it's no longer experimental: if we added a new method, any implementations that don't update would no longer satisfy the interface, breaking backwards compatibility. If testing.F is a concrete type, that's not a concern.

I am almost positive that we will come up with more ways to extend the testing.F type in the future, e.g. for providing options while fuzzing (which is still a future consideration). Making it an interface greatly limits flexibility for growth.

It would feel like an oversight to hamstring Go fuzzing right out of the gate by limiting available testing methodologies to the default - the fuzzing community consistently develops more and more effective techniques and algorithms, and it would be a shame to not take advantage of that.

This is a fair point. What do you think about supporting different modes with go build to allow running with different fuzzing engines? For example, go build -libFuzzer could instrument a Go binary with what is needed for running with libFuzzer, and provide a binary that can be run directly by libFuzzer. I realize that is still somewhat limiting, as we would need to provide different modes for each supported instrumentation, but it's something that may prove useful.


Member

Author

katiehockman commented 20 days ago

@bcmills
re: func (*T) NewFuzzer(ff interface{}) *F

I can see your point that it has some type-state invariants. However, I'm not as concerned by that. A few things:

The initial designs looked like this: f.Fuzz(func(f *testing.F, b []byte) {...}). A big reason why we decided to change it to f.Fuzz(func(t *testing.T, b []byte) {...}) was an attempt to make the scope of the fuzz function clearer. Because the fuzz function takes a *testing.T, it should be clearer that it's basically a standalone unit test, and that you shouldn't be calling *testing.F methods within it (since you have a *testing.T).

I also think that if someone calls f.Add inside of f.Fuzz, then they are confused about what f.Add is doing. I believe that if we document these functions really well, it should be apparent that seeding the corpus while fuzzing doesn't really make sense, but is instead something you do beforehand.

I also think go vet should be able to help with this.

The main reason I like f.Fuzz instead of f.NewFuzzer is that f.Fuzz is clearly an action. f.Fuzz says "Now start fuzzing this function that I've provided". I find it a bit more confusing that with your proposal, a test could call t.NewFuzzer, creating a testing.F value, then do nothing else with it. It just has to be assumed that fuzzing will start at the end of the function, which isn't obvious to the reader. It also requires that you define the function being fuzzed before you can seed the corpus, even though the actual execution would first seed the corpus, then run the function. Basically the "pre-work" must be written after the actual fuzz function, which feels awkward.

Making Fuzz a method seems to invite user code to try to perform setup before the call to Fuzz and/or cleanup afterward. (Is that the intent?)

Yes and no. Pre-work should be done before calling f.Fuzz (ie. before fuzzing). But if there is cleanup work that needs to be done, then they should call f.Cleanup before f.Fuzz, or t.Cleanup inside the fuzz function, depending on what they want to happen. For example, someone could do this:

func FuzzFoo(f *testing.F) {
	f.Add([]byte{0})
	f.Cleanup(func() { f.Log("done!") })
	f.Fuzz(func(t *testing.T, b []byte) {
		t.Cleanup(func() { t.Logf("inner") })
	})
}


Contributor

zeebo commented 20 days ago

It doesn't seem necessary to me to have testing.F be an interface to support other implementations because the concrete type does not contain or expose any details about how the fuzzing is happening. For example, it seems to me like the testing package could grow some API that is expected to be called in testing.Main (or something) that can control exactly how all of the testing.F values will later behave.

In other words, it doesn't seem necessary to do the virtual dispatch on the Add and Fuzz methods directly as long as the concrete type internally is able to do the virtual dispatch instead.
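A minimal sketch of that internal-dispatch idea, with every name invented for illustration (none of this is the proposed API):

```go
package main

import "fmt"

// engine is the internal abstraction: the public F type stays concrete
// while delegating to a swappable fuzzing backend behind the scenes.
type engine interface {
	run(ff interface{}, seeds []interface{})
}

// F mimics a concrete testing.F: callers only ever see concrete
// methods, yet behavior is still replaceable via the internal engine.
type F struct {
	eng   engine
	seeds []interface{}
}

func (f *F) Add(args interface{}) { f.seeds = append(f.seeds, args) }
func (f *F) Fuzz(ff interface{})  { f.eng.run(ff, f.seeds) }

// countingEngine stands in for a real mutator loop.
type countingEngine struct{ ran int }

func (e *countingEngine) run(ff interface{}, seeds []interface{}) {
	e.ran = len(seeds) // a real engine would execute ff on each seed and mutate
}

func main() {
	e := &countingEngine{}
	f := &F{eng: e}
	f.Add("a")
	f.Add("b")
	f.Fuzz(nil)
	fmt.Println(e.ran) // 2
}
```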


Member

bcmills commented 20 days ago

This feature will be considered experimental in Go 1.17, and the API will not be covered by the Go 1 compatibility promise yet.

In that case, could we gate it behind a build tag or GOEXPERIMENT setting?
(See also the (lengthy!) discussion in #34409.)

@FiloSottile
"We'll do our best to communicate to developers in the coming months about what fuzzing can (and can't) do for them"

Well, I recommend the great online book The Fuzzing Book (Tools and Techniques for Generating Software Tests): https://www.fuzzingbook.org/


Member

Author

katiehockman commented 19 days ago

edited

@bcmills

In that case, could we gate it behind a build tag or GOEXPERIMENT setting?
(See also the (lengthy!) discussion in #34409.)

@jayconrod and I discussed this, and ended up deciding that there wasn't much to gain from gating it. Writing a fuzz target is already opt-in, since a developer would need to choose to write a target into their own source code.
And since these targets are in _test.go files, even if a module you depend on decides to add a fuzz target to one of their packages, that wouldn't be included in the build or impact your testing, unless you did something like go test ./....

It's not out of the question if we feel that not gating the feature could cause problems, but we couldn't come up with a compelling need for it. LMK if we are missing something here.


Contributor

jayconrod commented 19 days ago

About testing.F being an interface or not:

In our experimental implementation, testing.F doesn't have much logic on its own. It serves as an entry point into the internal/fuzz package, which contains most of the logic. We're trying to keep internal/fuzz only loosely coupled with testing. In the future, we may want to "promote" that package out of internal/, making its API available for uses outside of testing.


Member

bcmills commented 19 days ago

edited

@katiehockman, ideally a Go user ought to be able to run go test all and get back a meaningful result about the correctness and consistency of the transitive dependencies of their module. (That is especially important when moving to a new, untested platform, or when upgrading one or more dependencies in order to check for accidental regressions or accidental reliance on undocumented behavior.)

If the fuzzing API is stable as of Go 1.17, then users who opt in to fuzzing can guard some of their _test.go files with //go:build go1.17, and not break go test all for their users when they eventually upgrade to (say) Go 1.18 or Go 1.19. However, if the fuzzing API changes between Go 1.17 and Go 1.18, then go test as of Go 1.18 may be completely unable to even build the test for the package.

So I would prefer that we put the unstable parts of the API behind an explicit build tag — perhaps go117fuzz or unstable — so that early adopters won't break go test all for their downstream consumers in a future release.
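Concretely, a guarded fuzz file under that scheme might look like this sketch (the tag name, package, and target are all illustrative):

```go
//go:build go117fuzz
// +build go117fuzz

// This file is only built when the explicit go117fuzz tag is set, so
// `go test all` under a future Go release with a changed fuzzing API
// still builds cleanly for downstream consumers.
package foo

import "testing"

func FuzzFoo(f *testing.F) {
	f.Fuzz(func(t *testing.T, b []byte) {
		// property checks here
	})
}
```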


Member

Author

katiehockman commented 15 days ago

@bcmills That's a good point about go test all. Let's plan to add a build tag then. go117fuzz sounds good, since it's more specific.


Contributor

timothy-king commented 14 days ago

@katiehockman FWIW some fuzzing architectures run a small fixed set of inputs when in "unit test" mode: the byte arrays nil, {0}, ..., {7}. (This would need to be adjusted for different types.) For go test all, a unit test mode seems reasonable. This could be done for fuzz targets when users have not specified a list of inputs via f.Add(); if any inputs have been added using f.Add(), running those would probably be better for a unit test mode. Both would give a brief sanity check and predictable performance.
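A sketch of that suggested default seed set for []byte targets (illustrative only, and a default the proposal itself does not adopt):

```go
package main

import "fmt"

// defaultSeeds builds the fixed "unit test mode" inputs described
// above for []byte targets: nil plus the single bytes {0} through {7}.
func defaultSeeds() [][]byte {
	seeds := [][]byte{nil}
	for b := byte(0); b < 8; b++ {
		seeds = append(seeds, []byte{b})
	}
	return seeds
}

func main() {
	fmt.Println(len(defaultSeeds())) // nil + 8 single-byte inputs = 9
}
```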

@jayconrod With regard to testing.F being an interface or having virtual dispatch elsewhere, folks designing fuzzers will probably want quite a bit of control over the logic of the outer fuzzing loop. They will want their own coverage metrics, mutation strategies, pool of inputs, etc. I would also guess there would be a desire to control the input/output format a bit. (For example, I imagine controlling I/O, flags, etc. could be helpful if trying to add OSS-Fuzz support.)


everestmz commented 8 days ago

edited

Thank you @katiehockman - that explanation makes complete sense, and I think the method @jayconrod specified of being able to swap out the underlying implementation would have the same effect while keeping testing.F as a concrete type.

What do you think about supporting different modes with go build to allow running with different fuzzing engines. For example, go build -libFuzzer which could instrument a go binary with what is needed for running with libFuzzer, and provide a binary that can be run directly with libFuzzer

I like this idea, and I think the Go compiler is already moving in the right direction here with the -d=libfuzzer flag that was added in 1.14. One thing I would like to see is the ability to make that compiler instrumentation more generically pluggable - as of right now, it's difficult to provide the LLVM-style comparison/coverage callbacks that the Go compiler depends on from anything except a fully linked executable. I would like the option to do this dynamically at runtime, using shared objects. This also brings up similar issues I've had providing the comparison/coverage callbacks at runtime from Go code, due to what appear to be conflicts between runtimes.

If the work has been done to support those coverage callbacks, it would be great to give people the ability to provide them in a way that doesn't explicitly require libFuzzer's presence. A -libFuzzer flag is a good start, but I think a flag that (to some extent) replicates the behavior of clang's -fsanitize=trace-cmp would be even more useful.


Member

Author

katiehockman commented 7 days ago

@timothy-king

run a small fixed set of inputs when in "unit test" mode: the byte arrays nil, {0}, ..., {7}. (This would need to be adjusted for different types.) For go test all, a unit test mode seems reasonable. This could be done for Fuzzers when users have not specified a list of inputs given by f.Add(). If any inputs have been added using f.Add()

There are two modes in this proposal. One with -fuzz, which generates inputs to test. The other is much like the "unit test mode" you're describing, i.e. without -fuzz. If there is no seed corpus provided either with f.Add or in testdata, I think it's better not to generate any when we aren't fuzzing, and to simply print a warning (or perhaps even fail the test). The downside I see with running a set of general-purpose seed inputs when a corpus isn't provided is that it would behave much differently from normal unit tests. These inputs would be hard-coded into the fuzzing infrastructure somewhere, out of view of the developers. I think it's better for the "unit test mode" to just run the tests that are described in the code, which the developer is explicitly asking the go command to run. A fuzz target shouldn't behave in a special way when we aren't fuzzing just because it's a fuzz target.

@everestmz

A -libFuzzer flag is a good start, but I think a flag that (to some extent) replicates the behavior of clang's -fsanitize=trace-cmp would be even more useful.

Thanks for that feedback! We should definitely keep this in mind for future iterations of the design (even if it won't happen in the first experimental release).


Member

bcmills commented 7 days ago

I've been thinking a bit more about third-party fuzzer interfaces.

On the one hand, maybe that's actually a good justification for keeping the Fuzz call at the end explicit rather than implicit. An explicit method can appear in a third-party interface.

On the other hand, the fact that the Fuzz callback accepts an explicit *testing.T (rather than, say, expecting the user's callback to close over the *testing.F to report errors) probably makes the Fuzz method awkward to incorporate into a runner-agnostic interface.


Contributor

rsc commented 6 days ago

There are still some details to work through, but the general idea here seems to be overwhelmingly positively received.
It probably makes sense to move toward accepting this proposal and then have followup proposals for details, same as we did for generics.


Contributor

rsc commented 6 days ago

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group


beoran commented 5 days ago

I agree that fuzzing is great, but I also share @robpike's concerns. So I would like to propose three small but significant changes to address them:

Instead of go test -fuzz, make it go fuzz; instead of the testing.F type in the testing package, make it fuzzing.F in a separate fuzzing package; and require fuzz tests to be placed in files named xxxx_fuzz.go.

This way we create a separation between testing and fuzzing that is, IMO, very desirable. It helps avoid fuzzing becoming entangled with the normal, existing tests, and it also makes it clearer to the user whether fuzz tests are available.


Member

FiloSottile commented 5 days ago

Instead of go test -fuzz, make it go fuzz; instead of the testing.F type, make it fuzzing.F; and require fuzz tests to be placed in files named xxxx_fuzz.go.

The cost of doing this would be losing the extremely natural go test -fuzz → crash found → go test FAIL → write fix → go test PASS cycle, which is one of the highlights of this UX.

The -fuzz flag, which is not and will never be set by default, keeps the expensive, non-deterministic, and "different" part of fuzz tests firmly opt-in already.

Benchmarks are also expensive and not a replacement for unit tests, but we don't have a go bench command or _bench.go files.


Contributor

josharian commented 5 days ago

@FiloSottile I think a point I made earlier bears re-iterating:

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions.

So I think your cycle needs an additional step: crash found → write explicit test → go test FAIL.

Member

FiloSottile commented 5 days ago

@FiloSottile I think a point I made earlier bears re-iterating:

Fuzz functions will change over time, as all code does. If/when they change, the crashes in testdata may no longer be effectively testing against regressions.

So I think your cycle needs an additional step: crash found → write explicit test → go test FAIL.

The cycle I am talking about would be within the same debugging session. The fuzzer finds a crasher, you sprinkle some printf, run go test, write a fix, run go test. There is no interval of time for the code to change in there, and being able to reproduce the crasher with just go test instead of having to write a unit test before you understand the issue is critical.

About checked in testdata, I am trying to think through what makes fuzzer inputs more likely to rot silently than, say, table driven tests. The testdata entry will have type information, it won't be just a binary blob that can silently get out of sync with the parser, and the plan is to make the test fail if types don't match. However, fuzz targets are written to ignore and skip nonsensical inputs, while unit tests are not, so maybe we should make Skip() fail the test if it happens with a testdata input?

Contributor

josharian commented 5 days ago

I am trying to think through what makes fuzzer inputs more likely to rot silently than, say, table driven tests.

Visibility. If you're editing a table-driven test, the data is right there, easy to see and easy to edit.

Member

bcmills commented 5 days ago

But doesn't that same argument apply to benchmarks? If you edit the benchmark itself, then it may no longer be valid to compare against a benchmark from a previous commit, and that benchmarking data is not explicit in the code either.

Contributor

josharian commented 5 days ago

Sort of? Except people typically run before and after benchmarks side by side for other reasons (same machine, same toolchain, same conditions).

And if you change a benchmark, you're explicitly saying "I disavow all previous benchmark results". If you change a table-driven test, you modify the table as needed. If you change a fuzz function you...what? You don't want to give up on the existing tests (a la benchmarks). It's not clear how to modify the previous inputs (a la table-driven tests). And without being able to see and play with the existing data, I don't even know how you'd form an accurate mental model of what changes to the fuzz function are safe to make without losing test coverage.

Contributor

jayconrod commented 5 days ago

Our existing model is that if you change the type signature of a fuzz function, errors will be reported for seed corpus inputs whose types no longer match. go test would report that (without the need for -fuzz). That prevents accidental loss of coverage: if you want to drop those inputs, you can delete the corresponding files in testdata.

We plan to ignore cached inputs with the wrong type though. These are valuable, but it's not clear to me they should cause errors after changing a type. Maybe they should, and you have to run go clean -fuzzcache to proceed (or something more precise)?

About table-driven tests: at some point, we were talking about building a tool that could convert a file in the testdata seed corpus into a call to F.Add. I think my preference is to keep those inputs in testdata, though; they're not easy to read or edit.

thepudds commented 5 days ago

The testdata entry will have type information, it won't be just a binary blob that can silently get out of sync with the parser, and the plan is to make the test fail if types don't match.

One approach would be to semi-gracefully handle, for example, adding or removing arguments to a function under test by mapping the on-disk corpus item to whatever types do match (in order, skipping over mismatches, using zero values / zero-length arguments if needed), or some other simple matching logic that accepts partial matches. In some common cases, this would allow the value of an on-disk example to only partially decay in the face of a signature change.

(Obviously, not needed for a first cut experimental MVP).

Contributor

timothy-king commented 5 days ago

@jayconrod In the libFuzzer interface, the assumption is that any []byte input is valid (so the precondition is effectively just true). Normally, a well-behaved fuzzer in that world parses the input and stops processing malformed inputs as non-crashing. This is basically the distinction between:

// Variant 1: malformed inputs are rejected without crashing.
func fuzzLenient(b []byte) {
	x, y, z, err := parse(b)
	if err != nil {
		return // stop, but don't crash
	}
	doSomething(x, y, z)
}

// Variant 2: malformed inputs are treated as crashes.
func fuzzStrict(b []byte) {
	x, y, z, err := parse(b)
	if err != nil {
		panic("crash here")
	}
	doSomething(x, y, z)
}

My understanding is that you are proposing the latter. OSS-Fuzz and similar fuzzing infrastructure keep both the current coverage corpus and previous crashes around, and transfer these between compilations. The infrastructure will want to reuse them. You will not want the coverage corpus to become crashes. You also do not want to discard the corpus in this situation. (It may have a few CPU-decades behind it.)

Member

bcmills commented 5 days ago

@thepudds, @josharian: I feel like that use-case might be better served by “just writing another fuzz test”, and leaving the existing fuzz test well enough alone.

That does bring up an interesting use-case, though: if I've replaced a previous fuzz test with a new fuzz test, then I probably want to continue testing the old one against existing inputs to check for regressions, but I want to run only the new one during go test -fuzz going forward. Is there a straightforward way to do that with this API?

beoran commented 3 days ago

@FiloSottile, actually, go bench with _bench.go files seems like a great idea to me. I can never remember the combination of flags I need for benchmarking.

And go test -fuzz is not as easy to remember or explain as simply go fuzz. With go fuzz, all sorts of sub-commands or options that are relevant only to fuzzing could also be added more easily.

Member

bcmills commented 21 hours ago

Thinking about go fuzz vs. go test -fuzz, the analogy to go test -bench does not quite hold.

If I run go test -bench=., then the go command runs all of the tests, plus the named benchmarks, and then emits the output for all of those tests and benchmarks together.

In contrast, AFAICT go test -fuzz=. by default will not return at all: it will not have an exit code, and it's not clear to me whether it will produce any output on success. That makes it different enough from go test that I think I could justify making it a separate command, even though fuzzing is deeply related to testing.
