Lessons learned porting 50k loc from Java to Go

 5 years ago
source link: https://www.tuicool.com/articles/hit/Ivaq2im

I was contracted to port a large Java code base to Go.

The code in question is a Java client for RavenDB, a NoSQL JSON document database. The code, including tests, was around 50 thousand lines.

The result of the port is a Go client.

This article describes what I've learned in the process.

Testing, code coverage

Large projects benefit greatly from automated testing and tracking code coverage.

I used TravisCI and AppVeyor for testing and Codecov.io for code coverage. There are many other services.

I used both AppVeyor and TravisCI because a year ago Travis didn't have Windows support and AppVeyor didn't have Linux support.

Today, if I were setting this up from scratch, I would stick with just AppVeyor, as it can now do both Linux and Windows testing and the future of TravisCI is murky after it was acquired by a private equity firm and reportedly fired the original dev team.

Codecov is barely adequate. The biggest problem is that for Go, they count empty lines as not executed so it's impossible to get 100% code coverage as reported by the tool. Coveralls seems to have the same problem.

It's better than nothing but there's an opportunity to do things better, especially for Go programs.

Go's race detector is great

Parts of the code use concurrency and it's really easy to get concurrency wrong.

Go provides a race detector that can be enabled with the -race flag during compilation.

It slows down the program, but the additional checks detect when two goroutines access the same memory location concurrently and at least one of them writes to it.

I always run tests with -race enabled and it alerted me to numerous races, which allowed me to fix them promptly.
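
To illustrate the kind of bug the detector reports, here is a minimal sketch. The counter type and function names are hypothetical and not taken from the RavenDB client. As written the code is race-free. Removing the Lock/Unlock pair turns the increment into a data race that "go run -race" or "go test -race" reports.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical shared value, not code from the RavenDB client.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc is safe to call from many goroutines because the mutex guards n.
// Without the Lock/Unlock pair, n++ would be a data race that the
// race detector reports.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countConcurrently starts the given number of goroutines, each calling inc once.
func countConcurrently(goroutines int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait() // all increments are finished and visible after Wait returns
	return c.n
}

func main() {
	fmt.Println(countConcurrently(100)) // prints 100
}
```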

Building custom tools for testing

In a project that big it's impossible to verify correctness by inspection. Too much code to hold in your head at once.

When a test fails, it can be a challenge to figure out why just from the information in the test failure.

The database client driver talks to the RavenDB server over HTTP, using JSON to encode commands and results.

When porting Java tests to Go, it was very useful to be able to capture the HTTP traffic between the Java client and the server and compare it with the HTTP traffic generated by the Go port.

I built custom tools to help me do that.

For capturing HTTP traffic in the Java client, I built a logging HTTP proxy in Go and directed the Java client to use that proxy.

For the Go client, I built a hook in the library that allows intercepting HTTP requests. I used it to log the traffic to a file.

I was then able to compare the HTTP traffic generated by the Java client to the traffic generated by my Go port and spot the differences.
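
One way to build such a hook on the Go side is to wrap the HTTP transport. The sketch below is an assumption about how it could look, not the hook from the actual library. The loggingTransport type and the captureDemo helper are hypothetical names, and the test server stands in for RavenDB.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// loggingTransport is a hypothetical hook: it wraps another RoundTripper
// and writes each request and response body to log.
type loggingTransport struct {
	next http.RoundTripper
	log  io.Writer
}

func (t *loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	var reqBody []byte
	if req.Body != nil {
		var err error
		if reqBody, err = io.ReadAll(req.Body); err != nil {
			return nil, err
		}
		req.Body.Close()
		// RoundTrip must not modify req, so send a clone with a fresh body.
		req = req.Clone(req.Context())
		req.Body = io.NopCloser(bytes.NewReader(reqBody))
	}
	resp, err := t.next.RoundTrip(req)
	if err != nil {
		return nil, err
	}
	respBody, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return nil, err
	}
	// Restore the body so the caller can still read it.
	resp.Body = io.NopCloser(bytes.NewReader(respBody))
	fmt.Fprintf(t.log, "%s %s\n> %s\n< %d %s\n",
		req.Method, req.URL.Path, reqBody, resp.StatusCode, respBody)
	return resp, nil
}

// captureDemo sends one request through the logging transport to a local
// test server and returns what was logged.
func captureDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, `{"ok":true}`)
	}))
	defer srv.Close()

	var log bytes.Buffer
	client := &http.Client{Transport: &loggingTransport{next: http.DefaultTransport, log: &log}}
	resp, err := client.Post(srv.URL+"/docs", "application/json", strings.NewReader(`{"id":"users/1"}`))
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return log.String(), nil
}

func main() {
	out, err := captureDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Writing the log to a file instead of a buffer gives output that can be diffed against the log produced by a proxy sitting in front of the Java client.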

Porting process

You can't just start porting 50 thousand lines of code in random order. Without testing and validating after every little step, I'm sure I would have been defeated by complexity.

I was new to RavenDB and the Java code base. My first step was to get a high-level understanding of how the Java code works.

At the core, the client talks to the server via HTTP. I captured the traffic, looked at it and wrote the simplest Go code to talk to the server.

When that was working it gave me confidence I'll be able to replicate the functionality.

My first milestone was to port enough code to be able to port the simplest Java test.

I used a combination of bottom-up and top-down approaches.

The bottom-up part is where I identified the code at the bottom of the call chain, responsible for sending commands to the server and parsing responses, and ported it.

The top-down part is where I stepped through the test I was porting to identify which parts of the code needed to be ported to make that test work.

After successfully completing that first step, the rest of the work was porting one test at a time, along with all the code needed to make the test work.

After the tests were ported and passing, I made improvements to make the code more Go-ish.

I believe that this step-by-step approach was crucial to completing the work.

Psychologically, when faced with a year-long project, it's important to have smaller, intermediate milestones. Hitting those kept me motivated.

Keeping the code compiling, running and passing tests at all times is also good. Allowing bugs to accumulate can make it very hard to fix them when you finally get to it.

Challenges of porting Java to Go

The objective of the port was to keep it as close as possible to Java code base, as it needs to be kept in sync in the future with Java changes.

I'm somewhat surprised how much code I ported in a line-by-line fashion. The most time-consuming part of the port was reversing the order of variable declarations, from Java's "type name" to Go's "name type". I wish there was a tool that would do that part for me.

String vs. string

In Java, String is an object that really is a reference (a pointer). As a result, a string can be null.

In Go, string is a value type. It can't be nil, only empty.

It wasn't a big deal and most of the time I could mechanically replace null with "".
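
A small sketch of that mechanical replacement. Both function names are hypothetical. The second function shows one common Go option for the cases where null and empty really must stay distinct. It is an assumption for illustration, not something the port is known to have used.

```go
package main

import "fmt"

// describe treats "" the way the Java code treated null: as "no value".
func describe(name string) string {
	if name == "" {
		return "(none)"
	}
	return name
}

// describePtr keeps the null/empty distinction by taking *string,
// where nil plays the role of Java's null.
func describePtr(name *string) string {
	if name == nil {
		return "(none)"
	}
	return fmt.Sprintf("%q", *name)
}

func main() {
	empty := ""
	fmt.Println(describe(""))        // (none)
	fmt.Println(describePtr(nil))    // (none)
	fmt.Println(describePtr(&empty)) // ""
}
```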

Errors vs. exceptions

Java uses exceptions to communicate errors.

Go returns values of the error interface type.

Porting wasn't difficult but it did require changing lots of function signatures to return error values and propagate them along the call chain.
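
A minimal sketch of what that signature change looks like. The fetch and load functions and the in-memory store are hypothetical stand-ins. In Java, fetch would throw and load would need no changes. In Go, both return the error explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound plays the role of a Java exception class in this sketch.
var errNotFound = errors.New("document not found")

// fetch stands in for the lowest-level call. In Java it would throw;
// in Go it returns the error as a second value.
func fetch(store map[string]string, id string) (string, error) {
	doc, ok := store[id]
	if !ok {
		return "", errNotFound
	}
	return doc, nil
}

// load shows the propagation: every caller in the chain has to return
// the error as well, here adding context with %w.
func load(store map[string]string, id string) (string, error) {
	doc, err := fetch(store, id)
	if err != nil {
		return "", fmt.Errorf("load %q: %w", id, err)
	}
	return doc, nil
}

func main() {
	store := map[string]string{"users/1": `{"name":"Frank"}`}
	_, err := load(store, "users/2")
	fmt.Println(err)                         // load "users/2": document not found
	fmt.Println(errors.Is(err, errNotFound)) // true
}
```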

Generics

Go doesn't have them (yet).

Porting generic APIs was the biggest challenge.

Here's an example of a generic method in Java:

public <T> T load(Class<T> clazz, String id) {

And the caller:

Foo foo = load(Foo.class, "id");

In Go, I used two strategies.

One is to use interface{}, which combines a value and its type, similar to Object in Java. This is not the preferred approach. While it works, operating on interface{} is clumsy for the user of the library.

The other is reflection. In some cases I was able to use it, and the above code was ported as:

func Load(result interface{}, id string) error

I could use reflection to query type of result and create values of that type from JSON document.

And the caller side:

var result *Foo
err := Load(&result, "id")
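
A rough sketch of how such a Load could be implemented. This is not the code from the actual client. The documents map is a hypothetical in-memory stand-in for the server, and the Foo struct is made up for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// Foo is an example document type.
type Foo struct {
	Name string
}

// documents is a hypothetical in-memory stand-in for the server.
var documents = map[string]string{"foos/1": `{"Name":"Frank"}`}

// Load expects result to be a pointer to a pointer to a struct (e.g. **Foo).
func Load(result interface{}, id string) error {
	rv := reflect.ValueOf(result)
	if rv.Kind() != reflect.Ptr || rv.IsNil() || rv.Elem().Kind() != reflect.Ptr {
		return errors.New("result must be a non-nil pointer to a pointer")
	}
	doc, ok := documents[id]
	if !ok {
		return fmt.Errorf("no document with id %q", id)
	}
	// rv.Elem().Type() is *Foo, so its Elem() is Foo.
	// reflect.New(Foo) allocates a zero Foo and returns a *Foo value.
	val := reflect.New(rv.Elem().Type().Elem())
	if err := json.Unmarshal([]byte(doc), val.Interface()); err != nil {
		return err
	}
	rv.Elem().Set(val) // equivalent to *result = val
	return nil
}

func main() {
	var result *Foo
	if err := Load(&result, "foos/1"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(result.Name) // Frank
}
```

The type information that Java passes explicitly as Foo.class is recovered here from the type of the result argument.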

Function overloading

Go doesn't have it (and most likely will never have it).

I can't say I found a good solution to port those.

In some cases overloading was used to create shorter helpers:

void foo(int a, String b) {}
void foo(int a) { foo(a, null); }

Sometimes I would just drop the shorter helper.

Sometimes I would write 2 functions:

func foo(a int) {}
func fooWithB(a int, b string) {}

When the number of potential arguments was large, I would sometimes do:

type FooArgs struct {
	A int
	B string
}
func foo(args *FooArgs) { }

Inheritance

Go is not especially object-oriented and doesn't have inheritance.

Simple cases of inheritance can be ported with embedding.

class B extends A { }

Can sometimes be ported as:

type A struct { }
type B struct {
	A
}

We've embedded A inside B, so B gets all of A's methods.

It doesn't work for virtual functions. A method defined on A that calls another method of A always calls A's version, even if B defines a method with the same name.

There is no good way to directly port code that uses virtual functions.

One option to emulate it is to use embedding and structs with function pointers. This essentially re-implements the virtual table that Java gives you for free as part of the object implementation.

Another option is to write a stand-alone function that dispatches to the right implementation based on the value's type.
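
A minimal sketch of the first option, with hypothetical Base and Derived types. The describe field is a one-slot, hand-written virtual table.

```go
package main

import "fmt"

// Base plays the role of a Java base class. The describe field is a
// hand-written "virtual table" with a single slot.
type Base struct {
	Name     string
	describe func() string
}

// Describe dispatches through the function field, so code written
// against Base sees whatever override a "subclass" installed.
func (b *Base) Describe() string {
	return b.describe()
}

// Greeting is a "base class" method that calls the virtual method.
func (b *Base) Greeting() string {
	return "I am " + b.Describe()
}

// NewBase installs the default implementation of describe.
func NewBase(name string) *Base {
	b := &Base{Name: name}
	b.describe = func() string { return "base " + b.Name }
	return b
}

// Derived embeds Base and overrides the describe slot.
type Derived struct {
	*Base
}

// NewDerived replaces the default describe with its own version.
func NewDerived(name string) *Derived {
	d := &Derived{Base: NewBase(name)}
	d.describe = func() string { return "derived " + d.Name }
	return d
}

func main() {
	fmt.Println(NewBase("a").Greeting())    // I am base a
	fmt.Println(NewDerived("b").Greeting()) // I am derived b
}
```

Greeting is defined only on Base, yet it picks up the override, which plain embedding alone would not do.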

Interfaces

Both Java and Go have interfaces but they are different things, like apples and salami.

A few times I did create a Go interface type that replicated Java interface.

In more cases I dropped interfaces and instead exposed concrete structs in the API.

Private, public, protected

Go's designers are under-appreciated. Their ability to simplify concepts is unmatched and access control is one example of that.

Other languages gravitate to fine-grained access control: public, private, protected specified with the smallest possible granularity (per class field and method).

As a result a library implementing some functionality has the same access to other classes in the same library as external code using that library.

Go simplified that by only having public vs. private and scoping access to package level.

That makes more sense. When I write a library to, say, parse markdown, I don't want to expose internals of the implementation to users of the library. But hiding those internals from myself is counter-productive.

Java programmers noticed that issue and sometimes use an interface as a hack to fix over-exposed classes. By returning an interface instead of a concrete class, they hide some of the class's public API from its direct users.

Concurrency

Go's concurrency is simply the best and a built-in race detector is of great help in repelling concurrency bugs.

That being said, in my first porting pass I went with emulating Java APIs. For example, I implemented a facsimile of Java's CompletableFuture class.

Only after the code was working did I restructure it to be more idiomatic Go.
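
For illustration, a facsimile of that kind can be quite small. The sketch below is not the implementation from the port. The Future type and RunAsync function are hypothetical names that cover only the run-then-get part of CompletableFuture.

```go
package main

import "fmt"

// Future is a hypothetical, minimal stand-in for Java's CompletableFuture.
type Future struct {
	done   chan struct{}
	result interface{}
	err    error
}

// RunAsync starts fn in a goroutine and returns immediately.
func RunAsync(fn func() (interface{}, error)) *Future {
	f := &Future{done: make(chan struct{})}
	go func() {
		f.result, f.err = fn()
		close(f.done) // closing the channel publishes result and err to all waiters
	}()
	return f
}

// Get blocks until fn has finished, like CompletableFuture.get().
// It can be called any number of times, from any goroutine.
func (f *Future) Get() (interface{}, error) {
	<-f.done
	return f.result, f.err
}

func main() {
	f := RunAsync(func() (interface{}, error) { return 6 * 7, nil })
	v, err := f.Get()
	fmt.Println(v, err) // 42 <nil>
}
```

In more idiomatic Go the wrapper type often disappears and callers receive from a channel or use a sync.WaitGroup directly.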

Fluent function chaining

RavenDB has very sophisticated querying capabilities. The Java client uses method chaining for building queries:

List<ReduceResult> results = session.query(User.class)
                        .groupBy("name")
                        .selectKey()
                        .selectCount()
                        .orderByDescending("count")
                        .ofType(ReduceResult.class)
                        .toList();

This only works in languages that communicate errors via exceptions. When a function additionally returns an error, it's no longer possible to chain it like that.

To replicate chaining in Go I used a "stateful error" approach:

type Query struct {
	err error
}

func (q *Query) WhereEquals(field string, val interface{}) *Query {
	if q.err != nil {
		return q
	}
	// logic that might set q.err
	return q
}

func (q *Query) GroupBy(field string) *Query {
	if q.err != nil {
		return q
	}
	// logic that might set q.err
	return q
}

func (q *Query) Execute(result interface{}) error {
	if q.err != nil {
		return q.err
	}
	// do the actual work and return its error, if any
	return nil
}

This can be chained:

var result *Foo
err := NewQuery().WhereEquals("Name", "Frank").GroupBy("City").Execute(&result)

JSON marshaling

Java doesn't have built-in JSON marshaling, so the client uses the Jackson JSON library.

Go has JSON support in the standard library but it doesn't provide as many hooks for tweaking the marshaling process.

That being said, I didn't try to match all of Java's functionality, as what is provided by Go's built-in JSON support seems to be flexible enough.
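
For reference, the two main hooks that encoding/json does provide are struct tags and the json.Marshaler interface. The sketch below uses hypothetical User and Day types, not types from the RavenDB client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Day is a hypothetical type that overrides its JSON form by
// implementing json.Marshaler.
type Day time.Time

// MarshalJSON encodes the day as a "YYYY-MM-DD" JSON string.
func (d Day) MarshalJSON() ([]byte, error) {
	return json.Marshal(time.Time(d).Format("2006-01-02"))
}

// User shows the struct-tag hooks: renaming a field, omitting it
// when empty, and skipping it entirely.
type User struct {
	Name     string `json:"name"`
	Email    string `json:"email,omitempty"`
	Password string `json:"-"`
	Joined   Day    `json:"joined"`
}

// sampleUser returns a fixed value so the output is reproducible.
func sampleUser() User {
	return User{
		Name:     "Frank",
		Password: "secret",
		Joined:   Day(time.Date(2019, 3, 1, 0, 0, 0, 0, time.UTC)),
	}
}

// encodeUser marshals u and returns the JSON as a string.
func encodeUser(u User) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}

func main() {
	s, err := encodeUser(sampleUser())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // {"name":"Frank","joined":"2019-03-01"}
}
```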

Go code is shorter

This is not so much a property of Java as of the culture, which dictates what is considered idiomatic code.

In Java, setter and getter methods are common. As a result, Java code:

class Foo {
	private int bar;

	public void setBar(int bar) {
		this.bar = bar;
	}

	public int getBar() {
		return this.bar;
	}
}

ends up in Go as:

type Foo struct {
	Bar int
}

3 lines vs. 11 lines. It does add up when you have a lot of classes with lots of members.

Most other code ends up being of equivalent length.

Notion for organizing the work

I'm a heavy user of Notion.so. Simplifying a lot, Notion is a hierarchical note-taking application. Think of a cross between Evernote and a wiki, exquisitely designed and implemented by top-notch software designers.

Here's how I used Notion to organize my work on the Go port:

[screenshot]

Here's what's there:

  • not shown above, I have a page that is a calendar view where I took short notes about what I worked on each day and how much time I spent. This is important information since it was an hourly contract. Thanks to those notes I know that I spent 601 hours over 11 months
  • clients like to know the progress. I had a page for each month where I summarized the work done, like this: [screenshot]

    Those pages were shared with the client.

  • A short-term todo list helps when starting work each day: [screenshot]
  • I even managed invoices as Notion pages and used the "Export to PDF" function to generate a PDF version of each invoice

Go programmer for hire

Does your company need extra Go programming help? You can hire me.

Additional resources

Other material:

  • if you need a NoSQL, JSON document database, give RavenDB a try. It's chock full of advanced features
  • if you're programming in Go, try a free Essential Go programming book
  • if you're interested in Notion, I'm the world's most advanced user of Notion:
    • I reverse engineered the Notion API
    • I wrote an unofficial Go library for the Notion API
    • all content on this website is written in Notion and published with my custom toolchain
