How to work with Postgres in Go

When an application that uses a database starts to misbehave, a holy war breaks out between DBAs and developers. DBAs scream, “Your application crashes the database!”, while developers shout back, “But everything worked just fine before that!”. Worst of all, neither side can really help the other: DBAs don’t know the nuances of the application or the quirks of the driver, and developers don’t know all the dark corners of the infrastructure. It would be nice to avoid this kind of mess.

As you might have guessed, merely scrolling through go-database-sql.org is often not enough. It’s better to arm yourself with other people’s experience, and better still if that experience was paid for with pain and lost money.

Tools

You can find the essentials on working with pretty much any SQL database in Go at go-database-sql.org. If you haven’t read that yet — please do.

From my point of view, the main strength of Go is its simplicity. One place that simplicity shows up is the common practice of writing queries in raw SQL (ORMs are not welcome). That turns out to be both a boon and a source of extra hardship.

So once you start using the database/sql package from the standard library, you will soon want to extend its interfaces. When that happens, take a look at github.com/jmoiron/sqlx. Here are a few examples of how this extension can make your life easier.

StructScan lets you avoid mapping columns to struct fields by hand.

NamedQuery lets you use struct fields as placeholders in a query.

Get and Select spare you the tedious loops that fetch rows from the database one by one.
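To make the column-to-field mapping concrete, here is a minimal sketch of the idea behind StructScan, written with the standard library only. It is not sqlx's implementation: the user struct, its db tags, and the scanTargets helper are hypothetical names introduced for illustration. The helper takes the column names of a result set and returns pointers to the matching struct fields in column order, which is the shape that rows.Scan expects.

```go
package main

import (
	"fmt"
	"reflect"
)

// user is a hypothetical row type; the `db` tags name the columns.
type user struct {
	ID   int    `db:"id"`
	Name string `db:"name"`
}

// scanTargets returns one pointer per column, each pointing at the
// exported struct field whose `db` tag equals the column name.
// The result could be passed to rows.Scan(targets...).
func scanTargets(dest any, columns []string) ([]any, error) {
	v := reflect.ValueOf(dest)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("dest must be a pointer to a struct, got %T", dest)
	}
	v = v.Elem()
	t := v.Type()

	// Index the struct fields by their `db` tag.
	byTag := make(map[string]int, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		if tag, ok := t.Field(i).Tag.Lookup("db"); ok {
			byTag[tag] = i
		}
	}

	targets := make([]any, len(columns))
	for i, col := range columns {
		idx, ok := byTag[col]
		if !ok {
			return nil, fmt.Errorf("no field tagged db:%q", col)
		}
		targets[i] = v.Field(idx).Addr().Interface()
	}
	return targets, nil
}

func main() {
	var u user
	// Column order differs from field order on purpose.
	targets, err := scanTargets(&u, []string{"name", "id"})
	if err != nil {
		panic(err)
	}
	// Stand-in for rows.Scan(targets...): write through the pointers.
	*targets[0].(*string) = "Ann"
	*targets[1].(*int) = 7
	fmt.Printf("%+v\n", u)
}
```

With a helper like this, the order of columns in the query no longer has to match the order of fields in the struct, which is the main source of bugs in hand-written Scan calls.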

Drivers

database/sql is a set of database access interfaces, and sqlx is their extension. The interfaces need an implementation to work. Drivers are responsible for the implementation.

Most popular drivers:

github.com/lib/pq: a pure Go Postgres driver for database/sql. For a long time this driver was the default choice. It has since lost its relevance and is no longer developed by its author.
github.com/jackc/pgx: a PostgreSQL driver and toolkit for Go. Today it is the better choice.

github.com/jackc/pgx is the driver you really want to use. Why?

It is actively developed and supported.
It can be more performant if used without database/sql interfaces.
It supports more than 60 Postgres-specific types (the extra ones that Postgres has in addition to the standard SQL types).
It provides an option to log whatever happens within the driver.
pgx has human-readable errors, while lib/pq throws panics. If you don’t catch a panic, the program will crash. (As a side note, don’t use panics in Go as you would use exceptions in other languages; they are quite different.)
It lets you configure every connection independently.
It supports the PostgreSQL logical replication protocol.

Typically one would write the following loop to fetch data from the database:
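A minimal sketch of such a loop with database/sql might look like this. The users table, the name column, and the helper names are assumptions made for illustration. The loop itself is written against a small interface that *sql.Rows satisfies, and fakeRows is an in-memory stand-in that exists only so the sketch runs without a database.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// rowIterator is the subset of *sql.Rows that the loop needs.
type rowIterator interface {
	Next() bool
	Scan(dest ...any) error
	Err() error
	Close() error
}

// queryNames runs the query and hands the rows to the fetching loop.
// The table and column names are hypothetical.
func queryNames(ctx context.Context, db *sql.DB) ([]string, error) {
	rows, err := db.QueryContext(ctx, "SELECT name FROM users")
	if err != nil {
		return nil, err
	}
	return collectNames(rows)
}

// collectNames is the typical loop: Next, Scan, then check Err.
func collectNames(rows rowIterator) ([]string, error) {
	defer rows.Close()
	var names []string
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			return nil, err
		}
		names = append(names, name)
	}
	// Err reports a failure that ended the iteration early.
	return names, rows.Err()
}

// fakeRows is an in-memory rowIterator used only for the demo.
type fakeRows struct {
	data []string
	pos  int
}

func (f *fakeRows) Next() bool { f.pos++; return f.pos <= len(f.data) }
func (f *fakeRows) Err() error { return nil }
func (f *fakeRows) Close() error { return nil }
func (f *fakeRows) Scan(dest ...any) error {
	p, ok := dest[0].(*string)
	if !ok {
		return fmt.Errorf("unsupported scan target %T", dest[0])
	}
	*p = f.data[f.pos-1]
	return nil
}

func main() {
	names, err := collectNames(&fakeRows{data: []string{"ann", "bob"}})
	fmt.Println(names, err)
}
```

In real code, queryNames would be called with a *sql.DB opened through a registered driver; the loop in collectNames stays the same.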

Internally, the driver receives data and accumulates it in a 4 KB buffer. A network round trip and a buffer refill happen when rows.Next() is called. If 4 KB is not enough to hold the whole result, the next batch of data is fetched from the network. The more round trips there are, the slower the processing. On the other hand, since the buffer is limited to 4 KB, we won’t hog all the available memory.

But, of course, we would like to maximize the buffer capacity to minimize the number of network calls, and lower the latency of our service. So let’s add such an option in the driver and try to gauge the expected speed boost with synthetic tests:

Evidently there is no significant processing speed difference. But why?

As it turns out, we are limited by the size of the send buffer within Postgres itself. That buffer has a hardcoded size of 8 KB. Using strace we can see that the OS returns 8192 bytes from the read system call, and tcpdump confirms this with the packet sizes.
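The effect can be reproduced with a toy model. This is a simplification under stated assumptions, not a measurement of Postgres or of any driver: chunkedSource is a hypothetical reader that hands over at most a fixed number of bytes per Read call, playing the role of a peer with a fixed send buffer, and countReads counts how many reads a client with a given buffer size needs to drain it.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// chunkedSource simulates a peer that delivers at most chunk bytes
// per Read call, no matter how large the caller's buffer is.
type chunkedSource struct {
	remaining int
	chunk     int
	reads     int
}

func (s *chunkedSource) Read(p []byte) (int, error) {
	if s.remaining == 0 {
		return 0, io.EOF
	}
	s.reads++
	n := len(p)
	if n > s.chunk {
		n = s.chunk
	}
	if n > s.remaining {
		n = s.remaining
	}
	s.remaining -= n
	return n, nil
}

// countReads drains total bytes through a client-side buffer of
// clientBuf bytes and returns the number of reads on the source.
func countReads(total, serverChunk, clientBuf int) int {
	src := &chunkedSource{remaining: total, chunk: serverChunk}
	r := bufio.NewReaderSize(src, clientBuf)
	msg := make([]byte, 128) // consume in small, row-sized pieces
	for {
		if _, err := io.ReadFull(r, msg); err != nil {
			break
		}
	}
	return src.reads
}

func main() {
	const total = 1 << 20 // 1 MiB of result data
	for _, clientBuf := range []int{4096, 8192, 65536} {
		fmt.Printf("client buffer %6d B: %d reads\n",
			clientBuf, countReads(total, 8192, clientBuf))
	}
}
```

In this model, going from a 4 KB to an 8 KB client buffer halves the number of reads (256 to 128 for 1 MiB), but growing it further to 64 KB changes nothing, because each read is capped by the 8 KB chunk on the sending side.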

Tom Lane (one of the main developers of the Postgres core) comments on that as follows:

Traditionally, at least, that was the size of pipe buffers in Unix machines, so in principle this is the most optimal chunk size for sending data across a Unix socket.

Andres Freund (a Postgres developer from EnterpriseDB) thinks that the 8 KB buffer is not the best choice today, and that other buffer sizes and socket configurations should be benchmarked.

Apart from that, we should remember that PgBouncer also has a buffer and its size can be configured with the pkt_buf parameter.

Another dangerous feature of the pgx (v3) driver: for every established connection, it sends queries to the database to fetch information about Object IDs (OIDs).

These identifiers were added to Postgres to uniquely identify internal objects: rows, tables, functions, etc.

The driver uses its knowledge of OIDs to figure out how to map database column types to primitive Go types. For this purpose, pgx internally keeps a map in which the key is the type name and the value is the Object ID.
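The shape of such a map can be sketched as follows. This is not pgx's source code: typeOIDs and oidFor are hypothetical names, and the map holds only a handful of built-in types, whose OIDs are fixed in the pg_type catalog. Types created later (enums, or types added by extensions) get their OIDs assigned at creation time, so a driver cannot hardcode them and has to ask the database.

```go
package main

import "fmt"

// typeOIDs maps a Postgres type name to its Object ID.
// Only a few built-in types with fixed OIDs are listed here.
var typeOIDs = map[string]uint32{
	"bool":        16,
	"bytea":       17,
	"int8":        20,
	"int2":        21,
	"int4":        23,
	"text":        25,
	"json":        114,
	"float4":      700,
	"float8":      701,
	"varchar":     1043,
	"date":        1082,
	"timestamp":   1114,
	"timestamptz": 1184,
	"uuid":        2950,
	"jsonb":       3802,
}

// oidFor looks up the OID for a type name; ok is false for types
// that are missing from the table, such as user-defined ones.
func oidFor(name string) (oid uint32, ok bool) {
	oid, ok = typeOIDs[name]
	return oid, ok
}

func main() {
	for _, name := range []string{"int4", "jsonb", "my_enum"} {
		if oid, ok := oidFor(name); ok {
			fmt.Printf("%s -> %d\n", name, oid)
		} else {
			fmt.Printf("%s -> unknown, must be queried from the database\n", name)
		}
	}
}
```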

As a consequence of this implementation, the driver sends 3 queries on every new connection to the database in order to populate its table of Object IDs.

As long as the database and the application work normally, the Go connection pool avoids opening new connections to the database. However, at the slightest database degradation, the connection pool gets exhausted and the rate of new connections grows exponentially. The queries that fetch OIDs are pretty heavy, and as a result the driver can push the database into a critical state.

Here is the moment when such OID queries flooded one of our databases:

How to work with Postgres in Go - AvitoTech - Medium
