Generating safer Go code
source link: https://commaok.xyz/post/safer-generated-code/
October 28, 2020
It's easy to forget to call go generate when you need to. Failure to regenerate can mean nasty bugs.
Venerable gopher Rog Peppe found an excellent technique for guarding against this class of bugs. Like many good ideas, it is obvious in retrospect.
Generate code that will not compile if it needs to be regenerated.
I'll illustrate this with two examples.
stringer
The first example comes directly from Rog, in the stringer command. stringer generates a String() string method for integer types that have defined constants.
type T int
const (
	One T = 1
	Two T = 2
)
stringer will generate a method that returns "One" for 1, "Two" for 2, and "T(3)" for 3.
What if you now change the value of One to be 3 and forget to re-generate?
Well, stringer also generated this function:
func _() {
	var x [1]struct{}
	_ = x[One-1]
	_ = x[Two-2]
}
The function is named _, which means it is impossible to call it. The compiler won't even bother generating code for it. It will, however, typecheck it. And typechecking is where the magic happens.
When the value of One is 1, x[One-1] evaluates to x[0]. Since x has length 1, that's OK.
When the value of One is 3, x[One-1] evaluates to x[2]. But x only has length 1! Attempts to compile this generate a compiler error: invalid array index One - 1 (out of bounds for 1-element array).
The function recorded the values of the constants when stringer was run and fails to compile if those values change.
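Putting the pieces together, here is a minimal hand-written sketch of the pattern. The String method below is a simplified stand-in, not the table-driven code the real tool emits; like the real output, though, it hard-codes the numeric values it saw at generation time, which is exactly why it can go stale.

```go
package main

import (
	"fmt"
	"strconv"
)

type T int

const (
	One T = 1
	Two T = 2
)

// Guard in the style described above. If One or Two changes without this
// function being regenerated, one of the indices stops being 0 and the
// constant index is rejected during typechecking.
func _() {
	var x [1]struct{}
	_ = x[One-1]
	_ = x[Two-2]
}

// String is a simplified stand-in for generated code. It deliberately uses
// the literal values 1 and 2 rather than the names One and Two, mirroring
// how generated code bakes in the values that existed when it was produced.
func (t T) String() string {
	switch t {
	case 1:
		return "One"
	case 2:
		return "Two"
	}
	return "T(" + strconv.FormatInt(int64(t), 10) + ")"
}

func main() {
	fmt.Println(One, Two, T(3)) // prints "One Two T(3)"
}
```

Without the guard, changing One to 3 would still compile, and One would silently print as "T(3)". With the guard, the build breaks until the code is regenerated.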
cloner
Now that we know the trick, we can apply it elsewhere.
Tailscale has a little bespoke tool to generate Clone methods for structs.
The output of cloner depends on the input struct fields. How can we trigger a compilation failure if we forget to re-run the tool after changing an input struct?
The trick is to duplicate the original struct in the generated code and then attempt to convert from the original struct to the current struct.
We start with this input code:
type T struct {
	X int
}
After generating a Clone method for T, cloner also generates:
var _ = T(struct {
	X int
}{})
Here we've written out the exact form of T when we generated the code, and assigned it to _, which the compiler can discard. However, it still must be typechecked.
Suppose we now change the type T. Let's add a new field.
type T struct {
	X int
	Y string
}
The conversion now fails: it's not possible to convert a struct { X int } to a struct { X int; Y string }.
Similar to stringer, cloner recorded the types when cloner was run and now fails to compile if those types change.
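As a complete program, the idea might look like the sketch below. The Clone method here is hypothetical and written by hand for illustration; it is not the actual output of Tailscale's tool. Only the guard declaration follows the form shown above.

```go
package main

import "fmt"

type T struct {
	X int
}

// Clone is a hypothetical stand-in for a generated method. A shallow copy
// is enough for this struct because it holds no pointers, slices, or maps.
func (t *T) Clone() *T {
	if t == nil {
		return nil
	}
	dst := new(T)
	*dst = *t
	return dst
}

// Guard: a snapshot of T's fields as they were when the code was generated.
// Adding, removing, renaming, reordering, or retyping a field of T makes
// this conversion invalid, so the package stops compiling.
var _ = T(struct {
	X int
}{})

func main() {
	orig := &T{X: 1}
	c := orig.Clone()
	c.X = 2
	fmt.Println(orig.X, c.X) // prints "1 2"
}
```

One caveat with this form of guard: Go ignores struct tags when converting between struct types, so a change that touches only a tag will not trip it.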
Compile-time assertion taxonomy
We've seen two forms of assertions that can trigger during typechecking: x == y and a struct's fields are unchanged.
There are others. For example, you can use conversions to assert that a type implements an interface. You can use a conversion to uint to assert that one untyped constant is greater than or equal to another. (You can't convert a negative constant to uint.)
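Both of these assertions fit in a single line each. In the sketch below, zeroReader, bufSize, and minSize are made-up names used only for illustration.

```go
package main

import (
	"fmt"
	"io"
)

// zeroReader is a hypothetical type that fills the buffer with zero bytes.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// Interface assertion: this conversion fails to compile if zeroReader
// ever stops implementing io.Reader.
var _ = io.Reader(zeroReader{})

const (
	bufSize = 64
	minSize = 16
)

// Ordering assertion: if bufSize < minSize, the difference is a negative
// constant, which cannot be converted to uint.
const _ = uint(bufSize - minSize)

func main() {
	var r io.Reader = zeroReader{}
	n, err := r.Read(make([]byte, bufSize))
	fmt.Println(n, err) // prints "64 <nil>"
}
```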
There are some obscure ones, of questionable utility. For example, you could assert that two concrete types are distinct by putting them both as cases in a type switch, which disallows duplicate types.
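A sketch of that type switch trick, with Celsius and Fahrenheit as made-up types for illustration:

```go
package main

import "fmt"

type Celsius float64
type Fahrenheit float64

// Guard: if the two types ever became identical, for example by turning one
// declaration into an alias (type Fahrenheit = Celsius), the compiler would
// reject the switch because of the duplicate case.
func _() {
	var v interface{}
	switch v.(type) {
	case Celsius:
	case Fahrenheit:
	}
}

// kind is a small helper showing that the two types are told apart at run time.
func kind(v interface{}) string {
	switch v.(type) {
	case Celsius:
		return "Celsius"
	case Fahrenheit:
		return "Fahrenheit"
	}
	return "other"
}

func main() {
	fmt.Println(kind(Celsius(1)), kind(Fahrenheit(1)), kind(1.0)) // prints "Celsius Fahrenheit other"
}
```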
I don't know of any attempt to exhaustively list compile-time assertions (aside from the spec) and how they can be used, with examples. Someone please make one!
Matthew Dempsky has proposed that Go add explicit compile time assertions for boolean expressions. (That doesn't cover relationships between types, although maybe generics would break some new ground here.) And I've written about a quirky way that you can write link-time assertions in Go.
Call to action
If you maintain a code generator, please check whether you can use this technique to protect your users from bugs. One obvious category is generated serialization/deserialization routines. There are almost certainly others.