40

How to work with Postgres in Go - AvitoTech - Medium

 4 years ago
source link: https://medium.com/avitotech/how-to-work-with-postgres-in-go-bad2dabd13e4?source=friends_link&%3Bsk=ff07cec4bdc4f0a3178a2b37c4b38ec0
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

How to work with Postgres in Go

Image for post
Image for post

When an application which uses a database exhibits some kind of unexpected behavior, that sparks a holy war between DBAs and developers: DBAs scream: “Your application crashes the database!”, while developers shout back: “But everything worked just fine before that!”. Worst of all, the DBAs and developers can’t really help each other: the former don’t know the nuances of the application and features of the driver, the latter don’t know all the dark corners of the infrastructure. It would be nice to avoid this kind of a mess.

As you might have guessed: quite often, merely scrolling through go-database-sql.org is not enough. It’s better to arm yourself with other people’s experience. Even better if the experience is obtained through pain and lost money.

Tools

You can find the essentials on working with pretty much any SQL database in Go at go-database-sql.org. If you haven’t read that yet — please do.

From my point of view, the main strength of Go is its simplicity. For instance, that simplicity manifests in the common practice of writing queries in raw SQL (ORM is not welcome). It turns out to be both a boon and a source of extra hardships.

So when you take database/sql package from the standard library, you want to extend its interfaces. As soon as that happens, take a look at github.com/jmoiron/sqlx. I’ll show you a few examples of how this extension can make your life easier.

StructScan usage allows you to avoid manual column <-> field mapping.

Image for post
Image for post

NamedQuery usage lets you use structure fields as placeholders in a query.

Image for post
Image for post

Using Get and Select enables you to eliminate stupid database row fetching loops.

Image for post
Image for post

Drivers

database/sql is a set of database access interfaces, and sqlx is their extension. The interfaces need an implementation to work. Drivers are responsible for the implementation.

Most popular drivers:

  • github.com/lib/pq — pure Go Postgres driver for database/sql. For a long time, this driver was the standard by default. Currently, it has lost its relevance and is not developed by its author.
  • github.com/jackc/pgx — PostgreSQL driver and toolkit for Go. Today this tool is a better choice.

github.com/jackc/pgx — you really want to use this driver. Why? It is actively developed and supported.

  • It can be more performant if used without database/sql interfaces.
  • It supports more than 60 of Postgres-specific types (extra ones that Postgres has in addition to standard SQL ones).
  • It provides an option to log whatever happens within the driver.
  • pgx has human-readable errors, while lib/pq throws panics. If you don’t catch a panic, the program will crash. (As a side note, don’t use panics in Go as you would use exceptions in other languages; they are quite different.)
  • With pgx we have an option to configure every connection independently.
  • It supports the PostgreSQL logical replication protocol.

Typically one would write the following loop to fetch data from the database:

Image for post
Image for post

Internally, the driver receives data and accumulates it in a 4 KB buffer. Network roundtrip and buffer filling occurs on rows.Next() call. If the 4KB buffer is not enough to complete fetching, the next batch of data is fetched from the network. The more network trips are there, the slower the processing speed becomes. On the other hand, as the buffer limit is 4 KB, we won’t hog the entire available memory.

But, of course, we would like to maximize the buffer capacity to minimize the number of network calls, and lower the latency of our service. So let’s add such an option in the driver and try to gauge the expected speed boost with synthetic tests:

Image for post
Image for post

Evidently there is no significant processing speed difference. But why?

As it turns out, we are limited by the size of the data sending buffer within Postgres itself. That buffer has a hardcoded size of 8KB. Using strace we can see that the OS returns 8192 bytes in the read system call. tcpdump confirms this with the packet sizes.

Tom Lane (one of the main developers of the Postgres core) comments on that as follows:

Traditionally, at least, that was the size of pipe buffers in Unix machines, so in principle this is the most optimal chunk size for sending data across a Unix socket.

Andres Freund (Postgres developer from EnterpriseDB) thinks that the 8KB buffer is not the best implementation currently and there should be performance tests with other sizes and other socket configurations.

Apart from that, we should remember that PgBouncer also has a buffer and its size can be configured with the pkt_buf parameter.

Another dangerous feature of the pgx (v3) driver: for every established connection it sends requests into the database to get the information about the Object ID (OID).

These identificators were added to Postgres to uniquely identify internal objects: strings, tables, functions, etc.

The driver uses the knowledge about OIDs to figure out how to map data from database column types into primitive Go types. For this purpose, pgx internally uses the following map (key — type name, value — Object ID)

Image for post
Image for post

This implementation causes the driver to send 3 queries for every connection to the database to fill the table with Object IDs.

If the database and the application work normally, the Go connection pool makes it possible to avoid spawning new connections to the database. However, in the event of tiniest database degradation, the connection pool gets exhausted and the connections rate increases exponentially. OIDs fetching requests are pretty heavy, and as a result, the driver can bring the database into a critical state.

Here is a moment when such OIDs requests were poured onto one of our databases:


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK