Dynamic Cache Lifetime With Immutable Cache-Control Headers

 3 years ago
source link: https://medium.baqend.com/dynamic-cache-lifetime-with-immutable-cache-control-headers-db066cf60fea

or: What the Heck We Are Doing With Age & Max-Age Headers

Our main business revolves around Speed Kit, a performance plug-in to accelerate websites. We have written and talked a lot about how Speed Kit works, but have not focused much on the nitty-gritty details lately. That is because, even though a lot of advanced caching logic happens behind the curtains on our end, deployment for our customers is pretty simple. In consequence, most questions we get are scoped to integration and configuration.

However, we recently got a request via Twitter that stood out:

This tweet was special for two reasons: First, it was technical way beyond the requests we typically receive. And second, the one who was asking is not just some guy on the Internet, but a co-author of several pivotal web standards and Senior Principal Engineer at Fastly! We tried to get the basic idea across in two tweets, but decided that Twitter is simply not the right venue to address the matter.

We wrote this blog post to explain what exactly happens behind the curtains when Speed Kit caches your data, and what exactly we do with Cache-Control.

How Speed Kit’s caching works in concept

Speed Kit is a performance plug-in to accelerate the page load, running in your browser as a Service Worker (SW) process. Speed Kit essentially puts the HTML, assets, and other parts of the website into the SW cache to serve them from there instead of the slower network. Resources that are not present in the cache are still loaded from the network, though. They always come with Cache-Control headers that tell the browser (Speed Kit) how long it may keep a resource in its cache.

This time to live (TTL) is determined by two numbers: (1) the initial expiration date assigned by our backend and (2) the actual age, i.e. the time this resource has already been lying around in the content delivery network (CDN) before it was requested by the browser.

Figure 1: Baqend assigns a TTL, and encodes it in the Max-Age and Age headers.

As an illustration of the basic principle, consider the example given in Figure 1. A Speed Kit user requests example.css from the CDN, which does not know this file and therefore requests it from our backend. The Baqend server assigns a TTL of 7 days and returns the CSS file to the CDN, which in turn returns it to the requesting client. When receiving the file with the caching headers attached, the CDN knows that this file will expire in 7 days (SW-Max-Age=7days) and it literally counts the milliseconds until then (starting at Age=0). The user gets the same file with the same information, so the browser will also keep it in its cache for the next 7 days. But when another Speed Kit user requests the same file two days later, the CDN can return it without contacting the Baqend servers. And to make sure that this requesting client does not use the file beyond its expiration time, the CDN provides caching headers with the initial TTL (SW-Max-Age=7days) and the time that has already passed (Age=2days). The client uses these numbers to compute the remaining TTL: 7 days minus 2 days, i.e. 5 days.
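The arithmetic of this example can be written down as a minimal sketch. The function name remaining_ttl and the use of seconds as the unit are assumptions made for this illustration; this is not Speed Kit code.

```python
DAY = 24 * 60 * 60  # seconds per day


def remaining_ttl(max_age: int, age: int) -> int:
    """Seconds a cache may still serve a response, given its max-age and current Age."""
    # A response that is older than its max-age is expired, never "negatively fresh".
    return max(0, max_age - age)


# Second client in Figure 1: TTL of 7 days, served by the CDN after 2 days.
print(remaining_ttl(7 * DAY, 2 * DAY) // DAY)  # 5
```

The first client sees Age=0 and therefore gets the full 7 days.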

This is pretty much how you would expect our caching headers to look. (Note: We leave out Surrogate-Control and other CDN-related headers for simplicity, as we configure them by the same principle anyway.)

Why it doesn’t work like that in reality

But the Baqend servers are pretty smart and do not just assign static cache lifetimes: Our backend learns from observed access patterns all the time and continuously decreases or increases assigned TTLs to maximize caching performance.

To propagate these TTL updates, we would ideally just modify the SW-Max-Age headers of the corresponding resources in the CDN. Unfortunately, though, our CDN provider Fastly does not support updating caching headers in revalidation requests, with one single exception: the Age header. To implement dynamic TTL updates, we therefore encode all changes in this particular header. The Cache-Control headers that we use for Speed Kit consequently look a bit different from the caching headers you encounter elsewhere on the web.

Figure 2: To enable dynamic TTL updates for resources cached in the CDN, we only modify the Age header and set the Max-Age header to 1 year (fixed).

Figure 2 shows how we actually encode cache lifetimes. Assume the setting from above where a CDN node requests example.css from the Baqend server, which once again assigns a TTL of 7 days. To encode the TTL with caching headers in a flexible way, the Baqend server specifies (1) that the file will be valid for an entire year (SW-Max-Age=365days) and (2) that it is already almost 1 year old (Age=358days). This is not the actual truth, but it does not change anything about the effective cache lifetime: The stylesheet will also expire after 7 days in the CDN, and when a Speed Kit client requests it after 2 days, this client will compute the same remaining TTL of 5 days as the client in the example above. But here is the trick: The Baqend server in Figure 2 can change the effective TTL for the browser by updating the Age header. For example, it can increase the TTL from 7 to 14 days by configuring Age=351days instead of Age=358days. This is not possible for the Baqend server in Figure 1, because (1) the SW-Max-Age=7days header is immutable and represents the upper limit for the TTL and because (2) the initial Age=0 cannot be decreased any further.
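The encoding from Figure 2 boils down to one subtraction in each direction. The following is a minimal sketch under stated assumptions: the names age_for_ttl, remaining_ttl and FIXED_MAX_AGE are made up for this illustration, and seconds are used as the unit. It is not the Baqend implementation.

```python
DAY = 24 * 60 * 60         # seconds per day
FIXED_MAX_AGE = 365 * DAY  # the fixed one-year max-age from Figure 2


def age_for_ttl(ttl: int, max_age: int = FIXED_MAX_AGE) -> int:
    """Age value to announce so that exactly `ttl` seconds of freshness remain."""
    if not 0 <= ttl <= max_age:
        raise ValueError("TTL must lie between 0 and the fixed max-age")
    return max_age - ttl


def remaining_ttl(age: int, max_age: int = FIXED_MAX_AGE) -> int:
    """Freshness a client computes from the headers it receives."""
    return max(0, max_age - age)


initial_age = age_for_ttl(7 * DAY)
print(initial_age // DAY)                           # 358
print(remaining_ttl(initial_age + 2 * DAY) // DAY)  # 5, two days later
print(age_for_ttl(14 * DAY) // DAY)                 # 351, TTL raised to 14 days
```

Because the max-age never changes, any TTL between 0 and one year can be expressed by picking a different Age, which is the only header value that can still be updated.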

The thing with Browser-TTL …

The picture we painted above was already close to reality, but ignored the fact that there are different caches in the browser that we have to consider.

Figure 3: Browser caching is complex and we use different TTLs to gain better control over the individual caches.

As illustrated in Figure 3, most browser requests go through the Service Worker, so that Speed Kit can serve them from the SW cache. Speed Kit’s caching is different from normal browser caching, since Speed Kit has special logic in place to identify and avoid stale cache entries. But some browser requests do not go through the Service Worker, and for these requests, Speed Kit obviously cannot guarantee freshness.

For example, some browsers simply take cached HTML files straight from the memory cache without even asking the Service Worker (looking at you, Safari). To make sure that this does not lead to stale content, we store HTML files with Browser-TTL=0 in all local caches apart from the SW cache: This guarantees that cached HTML files expire immediately and will therefore never be accessed by the browser unless they are managed by Speed Kit. Avoiding the browser cache altogether is not an option here, because the browser (i.e. Speed Kit) cannot execute a revalidation request (and not even use caching headers) for an HTML file that is not present in the browser cache.

Similarly, serving already-compiled JavaScript code from the memory cache is clearly more efficient than loading a JavaScript file from the SW cache and parsing it. But Speed Kit cannot control the memory cache and therefore cannot update JavaScript code once it is compiled and cached. To get the efficiency boost of the memory cache, we hence deliver all assets with Browser-TTL=30min. The compiled JavaScript can thus be reused in the current user session, but Speed Kit will check for staleness after 30 minutes or when the user starts a new session (e.g. on hard reload or when leaving and returning to the page).
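The two Browser-TTL values mentioned above can be summarized in a small sketch. The function name browser_ttl and the idea of deciding by the Content-Type value are assumptions made for this illustration; the text only states that HTML gets 0 and assets get 30 minutes, not how the distinction is implemented.

```python
MINUTE = 60  # seconds per minute


def browser_ttl(content_type: str) -> int:
    """Lifetime in seconds for the browser caches outside the SW cache."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return 0            # HTML expires immediately outside the SW cache
    return 30 * MINUTE      # assets may be reused within the current session


print(browser_ttl("text/html; charset=utf-8"))  # 0
print(browser_ttl("application/javascript"))    # 1800
```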

We hope this clarifies Speed Kit’s interaction with the different caches in the browser. For details, take a look at our list of publications & videos or just drop us a line =)

