
This Rails cache is not your friend!

 1 year ago
source link: https://sourcediving.com/this-rails-cache-is-not-your-friend-512871c138aa

I recently learned that some of the assumptions I had made about one of Rails’ features were completely wrong, and that this would sometimes lead me to write sub-optimal code. This was a delightfully frustrating discovery to make, having worked with Rails for so long.

I had always been aware of this feature, but hadn’t spent much time thinking or caring about how it works and the implications of its use. And so I was very impressed to learn that it is not as it seemed…

The ActiveRecord::QueryCache is actually quite expensive!

What is the ActiveRecord::QueryCache?

As the name suggests, the ActiveRecord::QueryCache (I'll just write ARQC from here on) caches the results of database queries made by ActiveRecord. Within a particular request (or job or Rake task), the result set of each query is cached in memory, keyed by its SQL statement. Subsequent, identical queries fetch their results from this in-memory store rather than from the database.

This has the beneficial effect of reducing the number of database queries that are made during a request. Since database reads are a major bottleneck for most web applications, the ARQC helps to reduce the burden on limited resources and boost response speed.

The ARQC is enabled by default in ActiveRecord.

A code example

Consider the following code, which provides helpers to a view that loads the currently authenticated user:

class ApplicationController < ActionController::Base
  helper_method :current_user, :signed_in?

  private

  def current_user
    User.find_by(id: session[:user_id])
  end

  def signed_in?
    !!current_user
  end
end

<% if signed_in? %>
  Welcome <%= current_user.name %>! <%= link_to("My profile", user_path(current_user)) %>.
<% end %>

This seemingly ordinary Rails code calls the current_user method three separate times. Each call issues a query to load a user record from the database, using the ID stored in the session cookie.

The Rails logs for this request might contain something like this:

User Load (0.5ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]
CACHE User Load (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]
CACHE User Load (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ? [["id", 1], ["LIMIT", 1]]

Notice the “CACHE” prefix at the start of the second and third lines. It indicates that the results of these two queries were served by the ARQC rather than by querying the database directly.

A slightly more complicated code example

The above example is very simplistic. But consider this slightly more complicated case. What if the authenticated user were to click the “My profile” link and view their own profile page?

The users controller to load a profile might look something like this:

class UsersController < ApplicationController
  def show
    @user = User.find(params[:id])
  end
end

Here we are using a different method (find vs find_by) and a different source of data (params[:id] vs session[:user_id]) for the user ID, but if a user views their own profile then the resulting database query might be the same in both cases:

SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1;

It may not always be clear from reading the code alone that duplicate queries are being executed, but these should always be reported in the development logs.

Issues with ActiveRecord::QueryCache

Typically, when we hear the word “cache”, we think of some means of storage that is optimised for performance. I believe this leads a lot of engineers (including me) to assume that the ARQC is fine to use often. This assumption is reinforced by the Rails log output, which always reports a load time of 0.0ms when returning records via the ARQC (see the log output example above).

However, this reported load time of 0.0ms may be misleading. While it is indeed correct to report that 0.0ms of time was spent loading these results from the database, it does not follow that 0.0ms was spent loading the records from the query cache.

I’ve demonstrated this in a demo application which you can view here: Query Cache Demo.

In this application, two versions of the same page are provided. One fetches records from the ARQC and one memoizes records — storing them in memory within the code.

The benchmarks demonstrate a striking difference between these two implementations!

Running on my own MacBook, the cached endpoint took 19.7s and the memoized endpoint took 7.8s. In other words: the endpoint that relied on the ARQC took more than 2.5x as long as the optimised endpoint to serve the same number of requests.

I encourage you to run these benchmarks for yourself, and explore the code to better understand the differences.

Why so slow?

While the ARQC offers an improvement over making multiple calls to the database, it still adds overhead to the cost of serving a response. This is because the cache only stores the raw result set returned by the database, not the ActiveRecord objects built from it. Each time a cached query is called again, Rails still has to go through the effort of converting that result set into ActiveRecord objects. This process includes instantiating a new instance or collection of instances, casting and setting their attributes, and also running the after_initialize and after_find callbacks.
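To make the re-instantiation cost concrete, here is a minimal plain-Ruby sketch of the idea. It is not ActiveRecord's implementation: ToyRecord and ToyQueryCache are hypothetical names, and the sketch assumes only what is described above, namely that raw rows are cached per SQL statement and that records are rebuilt from those rows on every call.

```ruby
# Toy model of a per-request query cache. This is NOT ActiveRecord's
# implementation; every name below is a hypothetical stand-in.
class ToyRecord
  attr_reader :id, :name

  def initialize(row)
    # Stand-in for attribute casting and the after_initialize/after_find callbacks.
    @id = Integer(row["id"])
    @name = row["name"].to_s
  end
end

class ToyQueryCache
  attr_reader :database_hits

  def initialize(rows_by_sql)
    @rows_by_sql = rows_by_sql # pretend database: SQL string => raw rows
    @cache = {}
    @database_hits = 0
  end

  # Builds fresh records on every call, even when the rows come from the cache.
  def select_all(sql)
    rows = @cache.fetch(sql) do
      @database_hits += 1
      @cache[sql] = @rows_by_sql.fetch(sql)
    end
    rows.map { |row| ToyRecord.new(row) }
  end
end

USER_SQL = 'SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1'
db = ToyQueryCache.new({ USER_SQL => [{ "id" => "1", "name" => "Ada" }] })

first  = db.select_all(USER_SQL).first
second = db.select_all(USER_SQL).first

puts db.database_hits           # 1: the pretend database was queried once
puts first.name == second.name  # true: both records carry the same data
puts first.equal?(second)       # false: they are two separately built objects
```

The second call never touches the pretend database, yet it still pays for building a new object, which is the kind of work the 0.0ms log line does not account for.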

When we compare the number of objects allocated in each experiment branch of the demo application, we can see a stark difference in the number of allocations. The following excerpt is from the development logs for each endpoint:

Started GET "/memoized/users/1/recipes" for ::1 at 2022-08-16 20:15:55 +0100
...
Rendered collection of recipes/_recipe.html.erb [15 times] (Duration: 12.2ms | Allocations: 833)
Rendered recipes/index.html.erb within layouts/application (Duration: 23.1ms | Allocations: 2045)
...
Rendered layout layouts/application.html.erb (Duration: 25.5ms | Allocations: 3271)
Completed 200 OK in 34ms (Views: 29.9ms | ActiveRecord: 0.8ms | Allocations: 4183)


Started GET "/cached/users/1/recipes" for ::1 at 2022-08-16 20:16:05 +0100
...
Rendered collection of recipes/_recipe.html.erb [15 times] (Duration: 5.9ms | Allocations: 3539)
Rendered recipes/index.html.erb within layouts/application (Duration: 7.6ms | Allocations: 4470)
...
Rendered layout layouts/application.html.erb (Duration: 9.3ms | Allocations: 5680)
Completed 200 OK in 11ms (Views: 9.5ms | ActiveRecord: 0.6ms | Allocations: 6347)

Note: Ignore the render times in these examples. They can vary widely in a local development environment.

These numbers demonstrate the hidden costs of ARQC, and are something we should be mindful of when optimising our code.
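If you want to see the same effect outside of Rails, the following plain-Ruby sketch counts allocations with GC.stat. ToyRecipe, ROWS and the allocations helper are hypothetical stand-ins rather than code from the demo application, and the absolute numbers will differ between Ruby versions, so treat it as an illustration of the pattern only.

```ruby
# Hypothetical stand-in for a model; this is not ActiveRecord.
class ToyRecipe
  def initialize(row)
    @title = row["title"].dup
    @servings = Integer(row["servings"])
  end
end

# Pretend result set of 15 raw rows, as a cache might hold them.
ROWS = Array.new(15) { |i| { "title" => "Recipe #{i}", "servings" => "4" } }.freeze

# Returns the number of objects allocated while the block runs.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

# "Cached" style: the rows are reused, but records are rebuilt on each access.
rebuilt = allocations do
  3.times { ROWS.map { |row| ToyRecipe.new(row) } }
end

# Memoized style: records are built once and the same objects are reused.
memoized = allocations do
  records = nil
  3.times { records ||= ROWS.map { |row| ToyRecipe.new(row) } }
end

puts "rebuilt on each access: #{rebuilt} allocations"
puts "memoized:               #{memoized} allocations"
```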

What’s the alternative?

There are likely to be a few ways you can rewrite your code to avoid the additional delays caused by re-initializing records multiple times per request. The simplest and most obvious one deserves mentioning though: memoization. By memoizing methods that trigger SQL queries, we can store the resulting record in memory, rather than having to rebuild it from the results held in the query cache. The following code snippet refactors the above code, removing the reliance on the ARQC:

class ApplicationController < ActionController::Base
  helper_method :current_user, :signed_in?

  private

  def current_user
    @_current_user ||= User.find_by(id: session[:user_id])
  end

  def signed_in?
    !!current_user
  end
end

<% if signed_in? %>
  Welcome <%= current_user.name %>! <%= link_to("My profile", user_path(current_user)) %>.
<% end %>

Note that it was not necessary to memoize signed_in?. Only current_user actually calls the database.

The Rails logs for this request should now look something like:

User Load (0.5ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ? LIMIT ?  [["id", 1], ["LIMIT", 1]]

Now only one query is logged. The two CACHE entries from the previous example are no longer present.
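One caveat worth knowing about the ||= idiom: it only memoizes truthy values. When nobody is signed in, find_by returns nil, so a ||= memoized method would repeat the lookup on every call. The plain-Ruby sketch below illustrates this with hypothetical stand-ins (FakeController and lookup_user take the place of a controller and User.find_by), along with one common alternative that checks whether the instance variable has been defined.

```ruby
# Plain-Ruby sketch; FakeController and lookup_user are hypothetical
# stand-ins for a Rails controller and User.find_by.
class FakeController
  attr_reader :lookups

  def initialize(user)
    @user = user
    @lookups = 0
  end

  # ||= re-runs the lookup whenever the stored value is nil or false.
  def current_user_with_or_equals
    @_or_equals_user ||= lookup_user
  end

  # Checking whether the variable is defined memoizes nil as well.
  def current_user_with_defined_check
    return @_checked_user if defined?(@_checked_user)

    @_checked_user = lookup_user
  end

  private

  def lookup_user
    @lookups += 1
    @user
  end
end

signed_out = FakeController.new(nil)
3.times { signed_out.current_user_with_or_equals }
puts signed_out.lookups # 3: nil was never memoized

signed_out = FakeController.new(nil)
3.times { signed_out.current_user_with_defined_check }
puts signed_out.lookups # 1: nil was memoized after the first lookup
```

Whether this matters depends on how often the method is called for signed-out visitors; for a signed-in user both variants perform a single lookup.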

Conclusion

The ARQC is a useful addition to the Rails suite. It can make hacking and prototyping faster, without us having to worry about making many redundant queries to the database.

However, as the described benchmarks demonstrate, this cache is not free, and adds a degree of technical debt to an application (much like n+1 queries do).

Developers who are seeking to optimise a particular part of their code should be mindful of the overheads that ActiveRecord::QueryCache adds, and look for alternative ways of expressing their solution that do not rely on it.

