33

The top 10 most common mistakes ive seen in Go projects

 4 years ago
source link: https://www.tuicool.com/articles/mEvQZbe
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

The Top 10 Most Common Mistakes I’ve Seen in Go Projects

Jul 17 ·10min read

INby6ni.jpg!web

Credit: teachertoolkit.co.uk

This post is my top list of the most common mistakes I’ve seen in Go projects. The order does not matter.

Unknown Enum Value

Let’s take a look at a simple example:

Here, we created an enum using iota leading to the following state:

StatusOpen = 0
StatusClosed = 1
StatusUnknown = 2

Now, let’s imagine this Status type is part of a JSON request and will be marshalled/unmarshalled. We can design the following structure:

Then, receive requests like this:

Here, nothing really special, the status will be unmarshalled to StatusOpen , right?

Yet, let’s take another request where we omit the status value (for whatever reasons):

In this case, the Status field of the Request structure will be initialized to its zeroed value (for an uint32 type: 0). Therefore, StatusOpen instead of StatusUnknown .

The best practice then is to set the unknown value of an enum to 0 this way:

Here, if the status is not part of the JSON request, it will be initialized to StatusUnknown as we would expect.

Benchmarking

Benchmarking correctly is hard. There are so many factors that can impact a given result.

One of the common mistakes is to be fooled by some compiler optimizations. Let’s take a concrete example from the teivah/bitvector library:

This function clears the bits within a given range. To bench it, we may want to simply do this:

In this benchmark, the compiler will notice that clear is a leaf function (not calling any other function) so it will inline it. Once it is inlined, it will also notice that there are no side-effects . So the clear call will simply be removed leading to highly inaccurate results.

One option can be to set the result to a global variable like this:

Here, the compiler will not know whether the call has a side-effect. So, it will keep the clear call leading to accurate results.

Further reading

Pointers! Pointers Everywhere!

Passing a variable by value will create a copy of this variable. Whereas passing it by pointer will just copy the memory address.

Hence, passing a pointer will always be faster, isn’t it?

If you believe this, please take a look at this example . This is a benchmark on a 0.3 KB data structure that we pass and receive by pointer and then by value. 0.3 KB is not huge but that should not be far from the type of data structures we see every day (for most of us).

When I execute these benchmarks on my local environment, passing by value is more than 4 times faster than passing by pointer. This might a bit counterintuitive, right?

The explanation of this result is related to how the memory is managed in Go. I couldn’t explain it as brilliantly as William Kennedy but let’s try to summarize it.

A variable can be allocated on the heap or the stack . As a rough draft:

  • The stack contains the ongoing variables for a given goroutine . Once a function returned, the variables are popped from the stack.
  • The heap contains the shared variables (global variables, etc.).

Let’s check at a simple example where we return a value:

Here, a result variable is created by the current goroutine. This variable is pushed into the current stack. Once the function returns, the client will receive a copy of this variable. The variable itself is popped from the stack. It still exists in memory until it is erased by another variable but it cannot be accessed anymore .

Now, the same example with a pointer:

The result variable is still created by the current goroutine but now the client receives a copy of the variable address. If the result variable was popped from the stack, the client of this function could not access it anymore.

In this scenario, the Go compiler will escape the result variable to a place where the variables can be shared: the heap .

Passing pointers, though, is another scenario. For example:

Because we are calling f within the same goroutine, the p variable does not need to be escaped. It is simply pushed to the stack and the sub-function can access it.

Why is the stack so fast then? There are two main reasons:

  • There is no need to have a garbage collector for the stack. As we said, a variable is simply pushed once it is created then popped from the stack once the function returns. There is no need to get a complex process reclaiming unused variables.
  • The stack belongs to one goroutine so storing a variable does not need to be synchronized compared to storing it on the heap. This also results in a performance gain.

Two points to add before concluding:

  • It is possible to know when the compiler will escape a variable to the heap by using the following command: go build -gcflags "-m -m"
  • Did you ever wonder why the Write method of io.Writer was getting a slice as an input instead of returning one? This is also counterintuitive at first. One of the reason is related to what we have just seen. If we had returned a slice (which is a pointer), it would have been escaped to the heap (leading in a performance penalty).

As a conclusion, when we create a function, our default behavior should be to use values instead of pointers . A pointer should only be used if we want to share a variable.

Then, if we suffer from performance issues, one possible optimization could be to check whether pointers would help or not in some specific situations. But again, for most of our day-to-day use cases, values are the best fit.

Further reading

Breaking a for/switch or a for/select

What happens in the following example if f returns true?

We are going to call the break statement. Yet, this will break the switch statement, not the for loop .

Same problem with:

The break is related to the select statement, not the for loop.

One possible solution to break a for/switch or a for/select is to use a labeled break like this:

Errors Management

Go is still a bit lightweight in the way to deal with errors. It’s not a coincidence if this is one of the most expected improvements of the v2.

The current standard library (before Go 1.13) only offers functions to construct errors so you probably want to take a look at pkg/errors (if this is not already done).

This library is a good way to respect the following rule of thumb:

An error should be handled only once . Logging an error is handling an error. So an error should be either logged or propagated.

With the current standard library, it is almost impossible to respect this as we may want to add some context to an error and have some form of hierarchy. As an example of what we would expect with a REST call leading to a DB issue:

unable to server HTTP POST request for customer 1234
 |_ unable to insert customer contract abcd
     |_ unable to commit transaction

If we use pkg/errors , we could do it this way:

The initial error (if not returned by an external library) could be created with errors.New . The middle layer, insert , wraps this error by adding more context to it. Then, the parent handles the error by logging it. Each level either return or handle the error.

We may also want to check at the error cause itself:

Here, we can imagine having a db package from an external library dealing with the database accesses. The error cause must be checked using errors.Cause .

One common mistake I’ve seen is using pkg/errors partially. Checking an error was for example done this way:

If an error is wrapped, we will never go to line 6 even though the root is a db.DBError .

Slice Initialization

Sometimes, we know what will be the final length of a slice. For example, let’s say we want to convert a slice of Foo to a slice of Bar . I often see slices initialized this way:

A slice is not a magic structure. Under the hood, it implements a growth strategy if there is no more space available. In this case, a new array is automatically created (with a bigger capacity ) and all the items are copied over.

Now, let’s imagine we need to repeat this operation multiple times as our []Foo contains thousand of elements? The amortized time complexity (the average) of an insert will remain O(1) but in practice, it will have a performance impact.

Therefore, if we know the final length, we can either:

  • Initialize it with a predefined length:
  • Or initialize it with a 0-length and predefined capacity:

What is the better option? There are no real changes. Yet, my personal preference goes to the second one because it makes things more consistent . Regardless of whether we know the initial size, adding an element at the end of a slice is always done using append .

Context Management

context.Context is quite often misunderstood by the developers. According to the official documentation:

A Context carries a deadline, a cancelation signal, and other values across API boundaries.

This description is generic enough to make some people puzzled about why and how it should be used.

Let’s try to describe it briefly. A context can carry:

  • A deadline . It means either a duration (e.g. 250 ms) or a date-time (e.g. 2019-01-08 01:00:00 ) by which we consider that if it is reached, we must cancel an ongoing activity (an I/O request, awaiting a channel input, etc.).
  • A cancelation signal (basically a <-chan struct{} ). Here the behavior is similar. Once we receive a signal, we must stop an ongoing activity. For example, let’s imagine we receive two sequential requests. One to insert some data and another one to cancel the first request (because it’s not relevant anymore for example). This could be achieved by using a cancelable context in the first call that would be then canceled once we get the second request.
  • A list of key/value (both based on an interface{} type).

Two things to add. First, a context is composable . So, we can have a context that carries a deadline and a list of key/value for example. Moreover, multiple goroutines can share the same context so a cancelation signal can potentially stop multiple activities.

Coming back to our topic, here is a concrete mistake I’ve seen.

A Go application was based on urfave/cli (if you don’t know it, that’s a nice library to create command-line applications in Go). Once started, the developer inherits from a sort of application context. It means when the application is stopped, the library will use this context to send a cancellation signal.

What I experienced is that this very context was directly passed while calling a gRPC endpoint for example. This is not what we want to do.

Instead, we want to indicate to the gRPC library: Please cancel the request either when the application is being stopped or after 100 ms for example.

To achieve this, we have to rely on the fact that contexts are composable. If parent is the name of the application context (created by urfave/cli ), then we can simply do this:

Contexts are not that complex to understand and they are one of the best features of the language in my opinion.

Further reading

Not Using the -race Option

A mistake I do see very often is testing a Go application without the -race option.

As described in this report , although Go was “designed to make concurrent programming easier and less error-prone ”, we still suffer a lot from concurrency problems.

Obviously, the Go race detector will not detect and help for every single concurrency problems. Nevertheless, it is valuable tooling and we should always enable it while testing our applications.

Further reading

Using a Filename as an Input

Another common mistake is to pass a filename to a function.

Let’s say we have to implement a function to count the number of empty lines in a file. The most natural implementation would be something like this:

filename is given as an input, so we open it and then we implement our logic, right?

Now, let’s say we want to implement unit tests on top of this function to test with a normal file, an empty file, a file with a different encoding type, etc. It could easily become very hard to manage.

Also, if we want to implement the same logic but for an HTTP body, for example, we will have to create another function for that.

Go comes with two great abstractions: io.Reader and io.Writer . Instead of passing a filename, we can simply pass an io.Reader that will abstract the data source.

Is it a file? An HTTP body? A byte buffer? It’s not important as we are still going to use reader.Read() .

In our case, we can buffer the input to only read line by line so we can even use bufio.Reader and reader.ReadLine() :

The responsibility of opening the file itself is delegated to the count client:

With the second implementation, the function can be called regardless of the actual data source. Meanwhile, and it will facilitate our unit tests where we can simply create a bufio.Reader from a string :

Goroutines and Loop Variables

The last common mistake I’ve seen was made using goroutines with loop variables.

What is the output of the following example?

1 2 3 in whatever order? Nope.

In this example, each goroutine shares the same variable instance so it will produce 3 3 3 (most likely).

There are two solutions to deal with this problem. The first one is to pass the value of the i variable to the closure (the inner function):

And the second one is to create another variable with the for-loop scope:

It might look a bit odd to call i := i but it’s perfectly valid. Being in a loop means being in another scope. So i := i creates another variable instance called i (we may want to call it with a different name for readability purpose).

Further reading

iIbYnma.png!web

Any other common mistakes you would like to mention? Feel free to share them to continue the discussion :wink:


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK