Adding Response Metadata to Cache API Explainer by Aaron Gustafson and Jungkee S...

 4 years ago
source link: https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/CacheAPIResponseMetadata/explainer.md
Adding Response Metadata to Cache API Explainer

Authors: Aaron Gustafson, Jungkee Song

Introduction

Currently, many sites use Service Workers and the Cache API to store absolutely everything they can, often relying on the browser to evict content when there is storage pressure. Sites that want to be more proactive in managing their caches are limited to pruning responses solely based on when they were cached, typically iterating over the keys of a given cache and deleting the first item until the number of cached responses drops below a predefined threshold. That might be a good strategy if all responses were equal, but they aren’t. Some payloads are large, others are small. Some resources are computationally intensive to produce, others are static files served by a CDN. Some are only accessed once, others are accessed with great frequency, often depending on both the site itself and the particular user interacting with it.

Having a little more clarity on the shape and form of cached responses would enable developers to be more intentional about what they do with cached content.

Caching response metadata

It would be quite useful to have access to metadata about each cached item. Much of the useful metadata is managed by the user agent, for example:

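  • When the Response was added to the cache
  • When the Response was last retrieved from the cache
  • How many times the Response has been retrieved from the cache
  • How much space the cached Response occupies on disk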

It’s worth noting that the size on disk could be inferred from the Response’s "Content-Length" header (or "Transfer-Encoding"), but these methods are unreliable. A better approach is to have the browser actually fill in this data so it is available even if these headers are omitted.
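To illustrate why header-based inference is fragile, here is a minimal sketch of what a site has to fall back on today. The helper name `sizeFromHeaders` is hypothetical; it accepts any object with a `get()` method, such as a Fetch `Headers` instance. It returns `null` whenever no usable "Content-Length" value is present, which is the case the proposed `size` attribute is meant to cover.

```javascript
// Hypothetical helper (not part of any API): estimate a response's size
// from its headers alone. Returns a byte count, or null when the
// "Content-Length" header is missing or not a non-negative integer.
function sizeFromHeaders(headers) {
  const raw = headers.get("content-length");
  if (raw === null || raw === undefined) {
    return null; // header omitted, e.g. a chunked response
  }
  const bytes = Number.parseInt(raw, 10);
  return Number.isFinite(bytes) && bytes >= 0 ? bytes : null;
}
```

Even when the header is present, it describes the body as transferred rather than what the browser stores on disk, so the result is at best an approximation.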

Taken together, this Explainer proposes adding the following read-only attributes to the Response class, as defined by the Fetch API:

  • cachedAt: timestamp added when the Response is cached
  • lastAccessedAt: timestamp updated each time a cached Response is returned by the Service Worker (not when the cached item is inspected)
  • retrievalCount: a number that increments each time a cached Response is returned by the Service Worker
  • size: computed disk size of the Request/Response object pair

If adding these directly to the Response is not feasible, they could be added as properties of an object assigned to a single key, such as Response.cacheData. Ideally, however, they would be added to a subclass of the Response interface used specifically in the Cache API context:

interface CachedResponse : Response {
  readonly attribute DOMTimeStamp cachedAt;
  readonly attribute DOMTimeStamp lastAccessedAt;
  readonly attribute unsigned long long retrievalCount;
  readonly attribute unsigned long long size;
};

These attributes would only be populated for same-origin CachedResponses obtained via a GET Request. All other CachedResponses would return zero (0) for these properties, so developers do not need additional code to check values or provide alternate code paths.
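As a rough illustration of how a page might consume these attributes, the sketch below reads them from a response-like object. Both helper names are hypothetical, and the attributes themselves are only proposed in this Explainer, so the code also falls back to zero when they are absent altogether. Treating a zero `cachedAt` as "no metadata" is an assumption that follows from the zero-value rule above.

```javascript
// Hypothetical helper: collect the proposed metadata into a plain object.
// cachedAt, lastAccessedAt, retrievalCount and size are the attributes
// proposed in this Explainer; missing attributes are reported as 0.
function readCacheMetadata(response) {
  const numberOrZero = (value) => (typeof value === "number" ? value : 0);
  return {
    cachedAt: numberOrZero(response.cachedAt),
    lastAccessedAt: numberOrZero(response.lastAccessedAt),
    retrievalCount: numberOrZero(response.retrievalCount),
    size: numberOrZero(response.size),
  };
}

// Assumption: a zero cachedAt means the metadata was withheld
// (for example a cross-origin or non-GET entry) or is unsupported.
function hasCacheMetadata(metadata) {
  return metadata.cachedAt !== 0;
}
```

Because withheld values are zeros rather than `undefined`, comparisons such as `metadata.size > limit` remain safe without extra guards.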

Goal

Enable developers to access a richer set of information about their cached Requests in order to help them make better decisions with respect to purging content.

Non-goals

Provide developers with more convenient access to other metadata that could be obtained by reading the Response headers (e.g., "Server-Timing").

Use Cases

Websites that want to limit their own cache often purge items based on when they were cached (inferred from the order in which they were added to the cache, as reflected in cache.keys):

function trimCache(cacheName, maxItems) {
  // open the cache; return the chain so callers can wait for the trim to finish
  return caches.open(cacheName)
    .then( cache => {
      // get the keys and count them
      return cache.keys()
        .then(keys => {
          // Do we have more than we should?
          if (keys.length > maxItems) {
            // delete the oldest item and run trim again
            return cache.delete(keys[0])
              .then( () => trimCache(cacheName, maxItems) );
          }
        });
    });
}

This assumes all cached Responses are equal in terms of both usefulness to the end user and how much disk space they occupy. Being able to retrieve information such as when a Response was last accessed and how much space it occupies on disk would enable developers to make smarter decisions around cache eviction.

Examples

Example 1: Remove content that is very large (> 5 MB) and has not been accessed in more than 30 days:

async function trimCache( cacheName ) {
  const large = 5000000; // 5 MB
  const old = Date.now() - 2592000000; // 30 days ago

  const cache = await caches.open( cacheName );

  // Walk the cached Request objects and inspect each Response
  for (const request of await cache.keys()) {
    const response = await cache.match( request );
    if ( response &&
         response.size > large &&
         response.lastAccessedAt < old )
    {
      // Await the deletion so the function resolves once trimming is done
      await cache.delete( request );
    }
  }
}

Example 2: Remove content accessed fewer than 5 times, with the last time being more than 90 days ago:

async function trimCache( cacheName ) {

  const old = Date.now() - 7776000000; // 90 days ago

  const cache = await caches.open( cacheName );
  for (const request of await cache.keys()) {
    const response = await cache.match( request );
    if ( response &&
         response.retrievalCount < 5 &&
         response.lastAccessedAt < old )
    {
      // Await the deletion so the function resolves once trimming is done
      await cache.delete( request );
    }
  }
}

Example 3: Remove content that only got used when it was cached, over 180 days ago:

async function trimCache( cacheName ) {

  const old = Date.now() - 15552000000; // 180 days ago

  const cache = await caches.open( cacheName );
  for (const request of await cache.keys()) {
    const response = await cache.match( request );
    if ( !response ) continue;

    const cached_at = new Date( response.cachedAt ),
          accessed_at = new Date( response.lastAccessedAt );
    // If year, month, and day match + old
    if ( cached_at.getFullYear() === accessed_at.getFullYear() &&
         cached_at.getMonth() === accessed_at.getMonth() &&
         cached_at.getDate() === accessed_at.getDate() &&
         response.cachedAt < old )
    {
      // Await the deletion so the function resolves once trimming is done
      await cache.delete( request );
    }
  }
}
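The three examples share the same loop and differ only in the eviction rule, so the rule can be factored out into a predicate. The following is a minimal sketch under that assumption: `trimCacheWhere`, `DAY_MS` and `isLargeAndStale` are hypothetical names, and `cache` is any object exposing the Cache API's `keys()`, `match()` and `delete()` methods. The `size` and `lastAccessedAt` attributes are the ones proposed above.

```javascript
// Hypothetical helper: delete every cached entry for which
// shouldEvict(response) returns true, and report what was deleted.
async function trimCacheWhere(cache, shouldEvict) {
  const deleted = [];
  for (const request of await cache.keys()) {
    const response = await cache.match(request);
    if (response && shouldEvict(response)) {
      await cache.delete(request);
      deleted.push(request);
    }
  }
  return deleted;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// The rule from Example 1 as a predicate: larger than 5 MB and
// not accessed in the last 30 days.
const isLargeAndStale = (response) =>
  response.size > 5000000 &&
  response.lastAccessedAt < Date.now() - 30 * DAY_MS;
```

In a Service Worker this might be invoked as `trimCacheWhere(await caches.open(cacheName), isLargeAndStale)`, with the other examples supplied as different predicates.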

Privacy Considerations

It is possible that the timestamps stored in the cachedAt and lastAccessedAt properties could be used for fingerprinting, especially when used in conjunction with retrievalCount. For this reason, user agents should never return an exact millisecond match of the timestamp. Developers do not need millisecond-level accuracy in these values and, in most cases, will only really care about the year, month, and day. As such, a user agent should prevent the use of these fields as a fingerprinting vector by always returning timestamp values that report the date accurately, but always report the time as one second past midnight. If the user agent chooses to expose the true value of these properties in Developer Tooling, which could be useful, it should store the correct value, but alter the value provided via the getter.
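The coarsening described above can be sketched as a small function. This is an illustration only, not how any user agent implements it: the name `coarsenTimestamp` is hypothetical, and interpreting "midnight" in UTC is an assumption, since the text does not say which time zone applies.

```javascript
// Hypothetical sketch: keep the calendar date of a millisecond timestamp
// but report the time of day as 00:00:01. UTC is an assumption here.
function coarsenTimestamp(timestampMs) {
  const date = new Date(timestampMs);
  return Date.UTC(
    date.getUTCFullYear(),
    date.getUTCMonth(),
    date.getUTCDate(),
    0, 0, 1 // one second past midnight
  );
}
```

Under this scheme, any two events on the same UTC day yield the same value, so the getter reveals the day of an access but nothing finer.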

It is possible that retrievalCount could be used for fingerprinting (leveraging the User Behavior fingerprinting vector), but the retrievalCount property does not introduce any fingerprinting surface not already exposed via the Cache API.

Limiting these new properties to same-origin Responses eliminates the possibility that the first-party website could use this new functionality to snoop on a user’s engagement with third-party services. User Agents would be instructed to report zero values for all of these properties on any opaque Responses.

Open Questions

Response
