

A nasty bit of undefined timezone behavior in Golang
source link: https://www.dolthub.com/blog/2021-09-03-golang-time-bugs/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

A nasty bit of undefined timezone behavior in Golang
Go is a great language. Really, it is! We complain about the rough edges, but on the whole it's been a great choice for us, and we're not sorry we picked it.
But just for fun, let's talk about some of those rough edges at length.
That was a pot shot. There are some actual foot-guns hidden in the runtime:
- Reads from a
nil
map are fine, writes panic - Reads from a closed channel block, sends on a closed channel panic
nil
sometimes does not== nil
- Any panic in any goroutine kills the entire process
- Can't safely store references to loop variables
Honestly, you get used to all these things. No language is without its quirks, and terrible design choices are no barrier to massive popularity. Just ask Rasmus Lerdorf, creator of PHP, by any measure one of the world's most successful languages.
But we found a new bug in golang, one that stumped an entire room of seasoned industry veterans for a solid hour. I actually hesitate to even call it a bug, because it's so obscure, and the trigger mechanism so specific to our unit test that it broke, that most people will probably never run into it. So, maybe surprising undefined behavior is a better description?
Discovering the bug
A new customer came to us with a problem: his R program writing rows to Dolt wasn't actually writing any rows. This was pretty susprising, since at this point we've integrated customers on a variety of platforms and languages, and they all manage to get basic read / write functionality working. That's the benefit of being MySQL compliant: if it works for one language or connector, that's pretty decent evidence it will work for others. But he sent us a repro case that demonstrated the problem. Something wasn't working.
statement <- enc2utf8(new("SQL", .Data = "INSERT INTO `mtcars`\n (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`)\nVALUES\n (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"))
params <- unname(as.list(mtcars))
rs <- new("MariaDBResult",
sql = statement,
ptr = RMariaDB:::result_create(conn@ptr, statement, is_statement = TRUE),
bigint = conn@bigint,
conn = conn)
# No rows affected both before and after binding data to the the result
rs
#> <MariaDBResult>
#> SQL INSERT INTO `mtcars`
#> (`mpg`, `cyl`, `disp`, `hp`, `drat`, `wt`, `qsec`, `vs`, `am`, `gear`, `carb`)
#> VALUES
#> (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
#> ROWS Fetched: 0 [incomplete]
#> Changed: NA
dbGetRowsAffected(rs)
#> [1] NA
if (!is.null(params)) {
dbBind(rs, params)
}
After a little digging into the intricacies of the R MySQL client, Max
figured out what the problem was: the result schema from a
COM_STMT_PREPARE
should return a response indicating 0 columns for
INSERT
, UPDATE
, and DELETE
statements, since these statements
don't have a return schema, just status metadata like rows edited. We
were returning a column with our internal OKResult
schema. Most
clients don't care about this difference, but R's does.
The fix was easy: just don't return a schema.
But that's not the bug. That's just the bug fix that made us discover the real bug, the one that had us scratching our heads. Because when we went to release it to unblock our customer, we also picked up another change, an innocuous change to how our SQL server's log messages were formatted.
Broken unit test
When we went to release the above fix, our CI runs started failing with a suspicious timezone error.
--- FAIL: TestSelect/select_*_from_branches_system_table (0.03s)
sqlselect_test.go:1558:
Error Trace: sqlselect_test.go:1558
sqlselect_test.go:1423
Error: Not equal:
expected: []sql.Row{sql.Row{"master", "so275enkvulb96mkckbun1kjo9seg7c9", "billy bob", "[email protected]", time.Time{wall:0x0, ext:62135596800, loc:(*time.Location)(0xc0008fa310)}, "Initialize data repository"}}
actual : []sql.Row{sql.Row{"master", "so275enkvulb96mkckbun1kjo9seg7c9", "billy bob", "[email protected]", time.Time{wall:0x0, ext:62135596800, loc:(*time.Location)(0x3245200)}, "Initialize data repository"}}
I knew that our newest hire had recently made some timezone related changes in the SQL server, so I demonstrated outstanding leadership and blamed him without any further evidence.
But this didn't hold up under scrutiny once we actually looked at the nature of the breakage. Something else was going on.
The test that was breaking was asserting deep equality with a
time.Time
struct constructed like this:
{
Name: "select * from branches system table",
Query: "select * from dolt_branches",
ExpectedRows: []sql.Row{
{
"master",
"so275enkvulb96mkckbun1kjo9seg7c9",
"billy bob", "[email protected]",
time.Date(1970, 1, 1, 0, 0, 0, 0, &time.Location{}),
"Initialize data repository",
},
},
Deep equality was failing because the time
struct returned by the
query had a different timezone than the expected value. Specifically,
we weren't giving an initialized time.Location
for the expected time
object, but the server was returning one.
Diff:
--- Expected
+++ Actual
@@ -10,9 +10,26 @@
loc: (*time.Location)({
- name: (string) "",
- zone: ([]time.zone) <nil>,
- tx: ([]time.zoneTrans) <nil>,
- extend: (string) "",
- cacheStart: (int64) 0,
- cacheEnd: (int64) 0,
- cacheZone: (*time.zone)(<nil>)
+ name: (string) (len=5) "Local",
+ zone: ([]time.zone) (len=1) {
+ (time.zone) {
+ name: (string) (len=3) "UTC",
+ offset: (int) 0,
+ isDST: (bool) false
+ }
+ },
+ tx: ([]time.zoneTrans) (len=1) {
+ (time.zoneTrans) {
+ when: (int64) -9223372036854775808,
+ index: (uint8) 0,
+ isstd: (bool) false,
+ isutc: (bool) false
+ }
+ },
+ extend: (string) (len=4) "UTC0",
+ cacheStart: (int64) 9223372036854775807,
+ cacheEnd: (int64) 9223372036854775807,
+ cacheZone: (*time.zone)({
+ name: (string) (len=3) "UTC",
+ offset: (int) 0,
+ isDST: (bool) false
+ })
After a lot of excited and confused shouting as we argued about how to ship our crucial bug fix when CI was failing for unrelated reasons, somebody figured out that the test started passing again when you commented out another test case, one that failed and wrote a log message. And it wrote a log message using a new format I had just merged, one that did this:
msg := fmt.Sprintf("%s %s [conn %d] %s {%s}\n", entry.Time.Format(time.RFC3339), lvl, connectionId, entry.Message, dataFormat.String())
But that's crazy. Surely formatting another, unrelated time.Time
struct in another part of the code base doesn't change the default
timezone value in unrelated code.
How is this possible?
As it turns out, our SQL server was using the time.Local
location,
which has some interesting behavior. golang doesn't try to determine
what your timezone info is without asking you. That would be rude!
Instead, it waits for you to tell it to load this info explicitly,
after which further calls to fetch the Local
location will include
this cached, detailed information. But calling Format
on a time
object, for some format strings (such as time.RFC3339
), needs to
load this information, so will do so implicitly as a side
effect. After such a call to time.Format
, all time.Time
structs
with the time.Local
location will include this info, when they
didn't before.
This is the kind of surprising, spooky action-at-a-distance nonsense that well-designed libraries tend to avoid. Or they at least try!
If Rob Pike happens to be reading this, know that I criticize because I care.
The fix was pretty easy: load the info ourselves, making explicit what was once implicit.
// We are doing structural equality tests on time.Time values in some of these
// tests. The SQL value layer works with times in time.Local location, but
// go standard library will return different values (which will have the same
// behavior) depending on whether detailed timezone information has been loaded
// for time.Local already. Here, we always load the detailed information so
// that our structural equality tests will be reliable.
var loadedLocalLocation *time.Location
func LoadedLocalLocation() *time.Location {
var err error
loadedLocalLocation, err = time.LoadLocation(time.Local.String())
if err != nil {
panic(err)
}
if loadedLocalLocation == nil {
panic("nil LoadedLocalLocation " + time.Local.String())
}
return loadedLocalLocation
}
...
time.Date(1970, 1, 1, 0, 0, 0, 0, time.UTC).In(LoadedLocalLocation())
Conclusion
All of us with a few years of experience under our belts have seen worse bugs than this. This one was kind of fun, because it only set us back a few hours, never hit production or impacted customers, and was just so wacky in its root cause that it crossed over into entertaining. Thanks for the memes, Rob. But still: even an entertaining foot-gun is still a foot-gun. Learn from our mistake, kids. Never trust a datetime library will do what you want, especially any defaults.
Like the article? Interested in Dolt? Come join us on Discord to say hi and let us know what you think!
Recommend
-
61
A lot has been written on C undefined behavior, some of it by myself and a lot more by people who know a lot more about compilers than I do. However, I now believe that a seemingly innocuous but far-reaching change to the...
-
56
In several places, the C and C++ language specifications use a curious, and fairly controversial, phrase: undefined behavior . For certain program constructs, the specification prescribes no specific behavior...
-
51
Undefined behavior contributes to many serious problems, including security vulnerabilities. It’s also, I believe, poorly understood,...
-
42
Also seePart 1 andPart 2. A C or C++ implementation must perform side effecting operations in program order. For example, if a program executes these lines of code: <strong>printf ("Hello\n")&...
-
24
I recently came across an interesting FFI situation, which I wanted to understand a bit better. Now I’d like to share what I found in the hopes of making everyone a better programmer. This post is about undefined b...
-
7
Exploring undefined behavior using constexpr We hear a lot about undefined behavior, most probably know we should avoid it. Maybe you have heard about specific types of undefined behavior, overflow, out of bounds memory access, stri...
-
13
Behavior considered undefined Rust code is incorrect if it exhibits any of the behaviors in the following list. This i...
-
10
Undefined Behavior and Unsafe Code Guidelines Last year, the Rust unsafe code guidelines strike team was founded, and I am on it. :-) So, fina...
-
12
How to Specify Program (Undefined) Behavior? This summer, I am given the awesome opportunity of spending three months in the Mozilla offices in Portland, working on Rust. I am extremely grateful that Mozilla is providing this opportu...
-
4
Undefined behavior in C is a reading error. – keeping simpleSkip to content Considering how important
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK