In which we make two fuzz tests, and then do some more.
Fuzz Testing #
I’ve long been a fan of fuzz testing, at least in a network context. A fuzzing engine may not write better tests than I can, but it can churn through variations much more quickly. So let’s try out the fuzz testing built into Go (available since version 1.18).
One Way #
We’ll try fuzzing the map header -> text string -> map header round-trip first. If we get the same header back for any reasonable input, we’ll be confident about our conversions. In order to let the fuzzing engine just do variations on a string, we’ll do a cruder conversion of our struct:
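The real struct lives in the project; as a sketch, with hypothetical field names (only ZName and ForestryLevel come up later in this post), the crude conversion pair might look like:

```go
package main

import "fmt"

// MapHeader is a hypothetical stand-in; the real struct has more fields.
type MapHeader struct {
	ZName         string
	Width, Height int
	ForestryLevel int
}

// headerToString flattens the header into one space-separated string, so
// the fuzzing engine only has to mutate a single string argument.
func headerToString(h MapHeader) string {
	return fmt.Sprintf("%s %d %d %d", h.ZName, h.Width, h.Height, h.ForestryLevel)
}

// stringToHeader is the inverse; fmt.Sscanf errors out on strings that
// don't match the format, which the fuzz target uses to skip junk input.
func stringToHeader(s string) (MapHeader, error) {
	var h MapHeader
	_, err := fmt.Sscanf(s, "%s %d %d %d", &h.ZName, &h.Width, &h.Height, &h.ForestryLevel)
	return h, err
}

func main() {
	h, err := stringToHeader(headerToString(MapHeader{ZName: "cave", Width: 3, Height: 4, ForestryLevel: 2}))
	fmt.Println(h, err)
}
```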
Of the strings the fuzzing engine will feed us, only some will yield reasonable values via that Sscanf() call, and even fewer are suitable for the xml library. So we’ll do a lot of skipping in our fuzz target:
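A sketch of such a target, assuming a hypothetical MapHeader with Sscanf-based conversions; the utf8 and xml-unfriendly-character checks stand in for whatever the real code needs to reject:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
	"unicode/utf8"
)

// Hypothetical header type and conversions; the project's real ones differ.
type MapHeader struct {
	ZName         string
	Width, Height int
	ForestryLevel int
}

func headerToString(h MapHeader) string {
	return fmt.Sprintf("%s %d %d %d", h.ZName, h.Width, h.Height, h.ForestryLevel)
}

func stringToHeader(s string) (MapHeader, error) {
	var h MapHeader
	_, err := fmt.Sscanf(s, "%s %d %d %d", &h.ZName, &h.Width, &h.Height, &h.ForestryLevel)
	return h, err
}

// headerRoundTrips reports whether s survives header -> string -> header;
// ok == false means "skip this input", not "fail the test".
func headerRoundTrips(s string) (match, ok bool) {
	h, err := stringToHeader(s)
	if err != nil {
		return false, false // Sscanf couldn't parse it
	}
	if !utf8.ValidString(h.ZName) || strings.ContainsAny(h.ZName, "<>&") {
		return false, false // would upset the xml library
	}
	h2, err := stringToHeader(headerToString(h))
	return err == nil && h2 == h, true
}

func Fuzz_MapHeaderRoundTrip2(f *testing.F) {
	f.Fuzz(func(t *testing.T, s string) {
		match, ok := headerRoundTrips(s)
		if !ok {
			t.Skip("not a plausible header string")
		}
		if !match {
			t.Errorf("header round trip changed %q", s)
		}
	})
}

func main() {
	fmt.Println(headerRoundTrips("cave 3 4 2"))
}
```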
The chances of the engine giving us reasonable values is greatly increased by giving it a good seed corpus to mutate, so that’s the last thing to do:
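The seed values here are hypothetical; anything known to parse cleanly will do:

```go
package main

import "fmt"

// seedHeaders returns a few known-good header strings for the engine to
// mutate; inside the fuzz function each one is registered with f.Add:
//
//	for _, s := range seedHeaders() { f.Add(s) }
//
// The values are hypothetical; the real corpus uses real map headers.
func seedHeaders() []string {
	return []string{
		"cave 3 4 2",
		"plains 80 24 0",
		"forest 10 10 5",
	}
}

func main() {
	fmt.Println(len(seedHeaders()), "seed inputs")
}
```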
Now the question is: do we get anything useful?
$ go test -fuzz=Fuzz_MapHeaderRoundTrip2 -parallel 5 -test.fuzzcachedir ./testdata/fuzz -v -fuzztime 5m
...
=== RUN   Fuzz_MapHeaderRoundTrip2
fuzz: elapsed: 0s, gathering baseline coverage: 0/5 completed
fuzz: elapsed: 0s, gathering baseline coverage: 5/5 completed, now fuzzing with 5 workers
fuzz: elapsed: 3s, execs: 125869 (41921/sec), new interesting: 108 (total: 113)
fuzz: elapsed: 6s, execs: 125869 (0/sec), new interesting: 108 (total: 113)
...
fuzz: elapsed: 4m57s, execs: 21898244 (108216/sec), new interesting: 365 (total: 370)
fuzz: elapsed: 5m0s, execs: 22186800 (96181/sec), new interesting: 366 (total: 371)
fuzz: elapsed: 5m0s, execs: 22186800 (0/sec), new interesting: 366 (total: 371)
Tests skipped: 0
--- PASS: Fuzz_MapHeaderRoundTrip2 (300.08s)
=== NAME
PASS
ok glitch-aura-djinn 300.102s
Hmm, 22 million permutations in five minutes, 366 of which exercised different code paths in one way or another. The output says 0 skipped; it seems to be a quirk of the fuzzing process that my counting of skipped tests didn’t work as I expected. Now that the corpus has been expanded, what does a normal testing cycle show?
$ go test -v | grep "skip"
Tests skipped: 296
296 skipped, about 80% of even those that were deemed interesting. But at least it didn’t take long. I didn’t expect to find bugs in my simple code this way, yet.
The Other Way #
Now let’s try the text string -> map header -> text string round-trip test.
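A sketch of that target; the regex and the exact format (one name token plus three integers) are assumptions, and anything that doesn’t match gets skipped:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"testing"
)

// headerPattern is a hypothetical stand-in for the real format check:
// one non-space token followed by three integers, single-spaced.
var headerPattern = regexp.MustCompile(`^\S+ \d+ \d+ \d+$`)

// stringWorthTesting reports whether a fuzzed string is even in the
// header format; everything else gets t.Skip()ed.
func stringWorthTesting(s string) bool {
	if strings.ContainsAny(s, "\r&") {
		return false // carriage returns and ampersands break the round trip
	}
	return headerPattern.MatchString(s)
}

func Fuzz_MapHeaderRoundTrip1(f *testing.F) {
	f.Add("cave 3 4 2")
	f.Fuzz(func(t *testing.T, s string) {
		if !stringWorthTesting(s) {
			t.Skip("not in header format")
		}
		var zname string
		var w, h, fl int
		if _, err := fmt.Sscanf(s, "%s %d %d %d", &zname, &w, &h, &fl); err != nil {
			t.Skip("unparseable header")
		}
		if got := fmt.Sprintf("%s %d %d %d", zname, w, h, fl); got != s {
			t.Errorf("string round trip: %q became %q", s, got)
		}
	})
}

func main() {
	fmt.Println(stringWorthTesting("cave 3 4 2"), stringWorthTesting("not a header"))
}
```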
A little experimentation showed that I needed to do some work to verify that the string was in the format required, hence the regex above. Let’s skip if there are carriage returns or ampersands, too. What does that leave us with?
$ go test -fuzz=Fuzz_MapHeaderRoundTrip1 -parallel 5 -test.fuzzcachedir ./testdata/fuzz -v -fuzztime 5m
...
=== RUN Fuzz_MapHeaderRoundTrip1
fuzz: elapsed: 0s, gathering baseline coverage: 0/4 completed
fuzz: elapsed: 0s, gathering baseline coverage: 4/4 completed, now fuzzing with 5 workers
fuzz: elapsed: 3s, execs: 122494 (40747/sec), new interesting: 34 (total: 38)
fuzz: elapsed: 6s, execs: 122494 (0/sec), new interesting: 34 (total: 38)
fuzz: elapsed: 9s, execs: 122494 (0/sec), new interesting: 34 (total: 38)
...
fuzz: elapsed: 4m57s, execs: 553267 (0/sec), new interesting: 71 (total: 75)
fuzz: elapsed: 5m0s, execs: 553267 (0/sec), new interesting: 71 (total: 75)
fuzz: elapsed: 5m1s, execs: 553267 (0/sec), new interesting: 71 (total: 75)
Tests skipped: 0
--- PASS: Fuzz_MapHeaderRoundTrip1 (301.04s)
=== NAME
PASS
ok glitch-aura-djinn 301.093s
A lot slower than the first one, probably because of my regex. Let’s give it some more time.
...
fuzz: elapsed: 15m1s, execs: 46494098 (0/sec), new interesting: 61 (total: 136)
$ go test -v | grep "skip"
Tests skipped: 126
Even more skippage, which I guess is understandable.
What if I remove the skipped tests from the corpus? Does that help the fuzzing engine concentrate on more fruitful paths the next time? Time for a quick bash script.
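Something along these lines; the corpus location and the parsing of go test’s SKIP lines are assumptions about how the real script works:

```shell
#!/usr/bin/env bash
# fuzz_and_prune.sh -- fuzz for a while, then drop corpus entries that
# only get skipped. A sketch: paths and output parsing are assumptions.
set -euo pipefail

# prune_skipped reads `go test -v` output on stdin; skipped corpus
# entries appear as "--- SKIP: Fuzz_Target/entryname", and we remove
# the matching file from the corpus directory given as $1.
prune_skipped() {
  local corpus="$1"
  sed -n 's|.*--- SKIP: [^/]*/\([^ ]*\).*|\1|p' |
  while read -r entry; do
    rm -f "$corpus/$entry"
  done
}

main() {
  local target="$1" fuzztime="$2"
  local corpus="testdata/fuzz/$target"
  for _ in 1 2 3 4 5; do
    # Grow the corpus, then re-run it and prune the skip-only entries.
    go test -fuzz="$target" -test.fuzzcachedir ./testdata/fuzz -fuzztime "$fuzztime"
    go test -run "$target" -v | tee /dev/stderr | prune_skipped "$corpus"
  done
}

if [ "$#" -ge 2 ]; then
  main "$@"
fi
```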
$ ./fuzz_and_prune.sh Fuzz_MapHeaderRoundTrip1 5m > fuzz_and_prune.out
...
$ grep "now fuzzing with" fuzz_and_prune.out
fuzz: elapsed: 0s, gathering baseline coverage: 4/4 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 4/4 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 5/5 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 6/6 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 7/7 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 11/11 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 17/17 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 24/24 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 29/29 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 32/32 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 38/38 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 39/39 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 42/42 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 45/45 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 49/49 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 51/51 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 53/53 completed, now fuzzing with 12 workers
fuzz: elapsed: 0s, gathering baseline coverage: 55/55 completed, now fuzzing with 12 workers
Slow progress; I’m not sure it’s any better, and it might fence in the fuzzing too much to reach some interesting cases. Might be worth some experiments in the future.
Oh, So That’s How It’s Done #
Here’s where I admit that I had been working off of a few examples and my memory and autocomplete, and had missed something about the fuzz function. Trying to understand the treatment of skipped tests, I took a look at the actual Go testing docs and watched GopherCon 2022: Katie Hockman - Fuzz Testing Made Easy.
From the docs: “ff must be a function with no return value whose first argument is *T and whose remaining arguments are the types to be fuzzed” (emphasis mine). So now I know I can specify multiple types, and not rely on collapsing everything into a single string.
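A sketch of the multi-argument version, with hypothetical field names; the engine now fuzzes each field directly instead of one collapsed string:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
	"unicode/utf8"
)

// MapHeader is a hypothetical stand-in for the real struct.
type MapHeader struct {
	ZName         string
	Width, Height int
	ForestryLevel int
}

// checkRoundTrip pushes a header through string form and back.
func checkRoundTrip(h MapHeader) bool {
	s := fmt.Sprintf("%s %d %d %d", h.ZName, h.Width, h.Height, h.ForestryLevel)
	var h2 MapHeader
	_, err := fmt.Sscanf(s, "%s %d %d %d", &h2.ZName, &h2.Width, &h2.Height, &h2.ForestryLevel)
	return err == nil && h2 == h
}

// Fuzz_MapHeaderRoundTrip2b takes one fuzzed argument per field, so only
// genuinely invalid field values need to be skipped.
func Fuzz_MapHeaderRoundTrip2b(f *testing.F) {
	f.Add("cave", 3, 4, 2)
	f.Fuzz(func(t *testing.T, zname string, w, ht, forestry int) {
		if zname == "" || !utf8.ValidString(zname) ||
			strings.ContainsAny(zname, " \t\n") || forestry < 0 {
			t.Skip("invalid ZName or ForestryLevel")
		}
		if !checkRoundTrip(MapHeader{zname, w, ht, forestry}) {
			t.Errorf("round trip failed for %q %d %d %d", zname, w, ht, forestry)
		}
	})
}

func main() {
	fmt.Println(checkRoundTrip(MapHeader{"cave", 3, 4, 2}))
}
```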
Now we can do a round-trip header -> string -> header, with only an invalid ZName or an invalid ForestryLevel to cause us to skip. Does this work much better?
$ go test -fuzz=Fuzz_MapHeaderRoundTrip2b -parallel 5 -test.fuzzcachedir ./testdata/fuzz -v -fuzztime 15m
...
fuzz: elapsed: 14m57s, execs: 74661339 (91206/sec), new interesting: 163 (total: 168)
fuzz: elapsed: 15m0s, execs: 74936024 (91553/sec), new interesting: 163 (total: 168)
fuzz: elapsed: 15m0s, execs: 74936024 (0/sec), new interesting: 163 (total: 168)
Tests skipped: 0
--- PASS: Fuzz_MapHeaderRoundTrip2b (900.08s)
$ go test -v | grep "skip"
Tests skipped: 103
Seems a bit better. Certainly makes me feel there’s less time being wasted on string conversions.
And Now the Whole Map #
It’s relatively simple to add the rest of the functionality we need to read and recreate the ascii map files. We add a splitMapLevels() to split up the file into the individual levels, then a parseMapLevel() to parse a level into its header and cells, then a mapToString() to form those back into a map level string, and finally a readAndWriteMap() to round-trip a map file string through the whole process. This enables us to add the (cleaned-up) ascii map files to the testdata and write a really fun fuzz test:
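A sketch of that test (the name and helper bodies are hypothetical stand-ins for the real pipeline); the filename-swapping trick in the seed loop is the interesting part:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// readAndWriteMap stands in for the real split/parse/stringify pipeline;
// here it just normalizes trailing spaces so the sketch is self-contained.
func readAndWriteMap(s string) (string, error) {
	if s == "" {
		return "", fmt.Errorf("empty map")
	}
	lines := strings.Split(s, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " ")
	}
	return strings.Join(lines, "\n"), nil
}

// shouldSkip collects the reasons a fuzzed string isn't a valid map file;
// the real function grew with every invalid-input "failure".
func shouldSkip(s string) bool {
	return s == "" || strings.ContainsAny(s, "\r&")
}

func Fuzz_MapRoundTrip(f *testing.F) {
	// Seeds mix literal map strings with map filenames; any seed that
	// names a file under testdata/ is replaced by that file's contents.
	for _, seed := range []string{"#####\n#.@.#\n#####", "map01.txt"} {
		if data, err := os.ReadFile(filepath.Join("testdata", seed)); err == nil {
			seed = string(data)
		}
		f.Add(seed)
	}
	f.Fuzz(func(t *testing.T, s string) {
		if shouldSkip(s) {
			t.Skip("not a plausible map file")
		}
		out, err := readAndWriteMap(s)
		if err != nil {
			t.Skip("unparseable map")
		}
		if out != s {
			t.Errorf("map changed after round trip")
		}
	})
}

func main() {
	out, err := readAndWriteMap("#####\n#.@.#\n#####")
	fmt.Println(out == "#####\n#.@.#\n#####", err)
}
```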
The first few test cases look normal: simple variations on map file strings that are expected to come back the same after the round trip. Then we see all of the map filenames listed; while looping through the seed test cases, we read those files and swap in the contents for the strings. Instead of writing a bunch more test cases to try to cover everything, we can be pretty comfortable with the code that returns all of those map files correctly. And it should offer a much more useful corpus for the fuzzing engine to start with.
Of course, there are a bunch of reasons that a fuzzed string coming into the test case is invalid and can be skipped; those have been pushed into a separate shouldSkip() function. Every time the fuzzing created a failure that was an invalid map file string, I added to that function, so it’s quite large now.
Further fuzzing didn’t reveal any interesting bugs, but now we can feel comfortable adding more functionality. But first:
High-Level Tests in Bash #
With the help of a main() function that simply reads a map file specified on the command line, pushes it through readAndWriteMap() (after replacing the line endings), and sends it to stdout:
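Sketched like this, with readAndWriteMap() reduced to an identity stand-in for the real pipeline:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// readAndWriteMap stands in for the real round-trip pipeline.
func readAndWriteMap(s string) (string, error) {
	return s, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: roundtrip <mapfile>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Normalize line endings before the round trip, then write to stdout.
	out, err := readAndWriteMap(strings.ReplaceAll(string(data), "\r\n", "\n"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```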
…we can write a quick bash script to churn through all of the map files and verify that they come out the same as they went in:
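Roughly like this; the command invoked (here go run .) and the testdata layout are assumptions:

```shell
#!/usr/bin/env bash
# Round-trip every map file and compare against the original. A sketch.
set -euo pipefail

: "${ROUNDTRIP:=go run .}" # the round-trip command; overridable for testing

# compare_map round-trips one file and compares it (with line endings
# normalized) against what came back.
compare_map() {
  local f want got
  f="$1"
  want=$(tr -d '\r' < "$f")
  got=$($ROUNDTRIP "$f")
  [ "$want" = "$got" ]
}

status=0
for f in testdata/*.txt; do
  [ -e "$f" ] || continue # unmatched glob
  if compare_map "$f"; then
    echo "OK   $f"
  else
    echo "FAIL $f"
    status=1
  fi
done
[ "$status" -eq 0 ]
```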
…and add that to the GitLab CI definition:
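The CI job might look roughly like this; the image, stage, and script name (roundtrip_maps.sh) are all assumptions about this project’s setup:

```yaml
# .gitlab-ci.yml sketch; names and image are hypothetical.
stages:
  - test

unit-and-fuzz-corpus:
  stage: test
  image: golang:1.21
  script:
    - go test ./...

map-round-trip:
  stage: test
  image: golang:1.21
  script:
    - ./roundtrip_maps.sh
```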
Whew. So now we have multiple levels of testing to lean on as we evolve this code. Tomorrow.
One More Thing #
While working on this project, I’ve started utilizing the Codeium LLM for autocomplete and doc/cleanup suggestions via their VSCode plugin. So far it’s been a bit more useful than some of the other LLMs I’ve tried; it comes up with a lot fewer, shall we say, fiction-based recommendations. Where I need a lot of boilerplate with minor variations, it does speed things up a bit; and when it’s time to refactor, it has enough context to be helpful there too.
Worth a try, I’d say. Although, as always, be careful with any closed-source code you feed to someone else’s services. And some people have concerns about the opaque language server that it runs locally.