Source: https://medium.com/dev-genius/a-principled-approach-to-graphql-query-cost-analysis-8c7243de42c1

A Principled Approach to GraphQL Query Cost Analysis

Why you should measure the cost of your GraphQL queries, and how you should do it.

Jun 25 · 11 min read


Proposed applications of our query analysis. The client’s malicious query requests an exponentially large result from GitHub’s GraphQL API. At the time of our study, GitHub permitted the shown query, but halted its execution after it exceeded a time limit. Using our techniques, client-side query inspection can provide feedback during composition (see “Complexities” inset). Server-side query enforcement can reject queries and update rate limits based on provider-defined policies.

This is a brief for the research paper A Principled Approach to GraphQL Query Cost Analysis, published at ESEC/FSE 2020. Alan Cha led the work, with help from Erik Wittern, Guillaume Baudart, me, Louis Mandel, and Jim Laredo. Most of these authors are affiliated with IBM Research or IBM’s product teams, as part of IBM’s ongoing involvement with GraphQL.

This project is a follow-up to our previous work studying GraphQL schemas.

Summary

The state of practice: The landscape of Web APIs is evolving to meet new client requirements and to facilitate how providers fulfill them. The latest web API model is GraphQL, through which client queries express the data they want to retrieve or mutate, and servers respond with exactly those data or changes. GraphQL reduces network roundtrips and eliminates the transmission of unneeded data.

What’s the problem? GraphQL’s expressiveness is risky for service providers. GraphQL clients can succinctly request stupendous amounts of data, and responding to these queries can be costly or disrupt service availability. Recent empirical work has shown that many service providers are at risk. Practitioners lack a principled means of estimating and measuring the cost of the GraphQL queries they receive.

What did we do? Our static GraphQL query analysis measures a query’s complexity without executing it. Our analysis is provably correct based on our formal specification of GraphQL semantics. In contrast to existing static approaches, our analysis supports common GraphQL conventions that affect query complexity. Because our approach is static, it can be applied in a separate API management layer and used with arbitrary GraphQL backends.

Does our analysis work? We demonstrate our approach using a novel GraphQL query-response corpus for two commercial GraphQL APIs. Our query analysis consistently obtains upper complexity bounds, tight enough relative to the true response sizes to be actionable for service providers. In contrast, existing static GraphQL query analyses exhibit over-estimates and under-estimates because they fail to support GraphQL conventions.

What would this look like in practice? The first figure (above) shows our vision. Our cost analysis can be applied at the client side and at the server side. It can guide clients away from accidentally-costly queries. For middleware/servers, it can be used to detect and respond to potentially costly queries.

Background

If you are unfamiliar with GraphQL, take a look at my earlier post and the references therein. There are plenty of other tutorials on the web, too.

Motivation

Many GraphQL services support super-linear GraphQL queries

Two previous empirical studies have described analyses on GraphQL schemas to understand whether they are at risk of “GraphQL Query Denial of Service” [1,2]. They wondered whether a user could send a small query and get back a really big response. They reported that this is possible for most GraphQL schemas; over 80% of the commercial or large-scale open-source schemas permitted exponentially-sized responses in the length of the input, and polynomially-sized responses were also common.

Few GraphQL services document their defenses

We manually studied the documentation for the 30 public GraphQL APIs listed by APIs.guru. Disturbingly, 25 of these 30 providers describe neither static nor dynamic query analysis to manage access and prevent misuse. A few GraphQL APIs have incorporated customized query and/or response analysis into their management approach (GitHub, Shopify, Yelp, Contentful, and TravelGateX), but these approaches have shortcomings. See the paper (§2.2 and §6.4.3) for details.

Existing GraphQL cost analyses are inadequate

It’s easy to measure the cost of a query — just execute it! But if the query is expensive, that can backfire. A cost analysis measures the cost of a query without fully executing it. Like most cost analyses, those for GraphQL queries come in two flavors: dynamic and static.

Researchers have proposed a dynamic analysis [1,3], which considers a query in the context of the data graph on which it will be executed. Through lightweight query resolution, it steps through a query to determine the number of objects involved without resolving them. This cost measure is accurate but expensive to obtain. It incurs additional runtime load, and potentially entails engineering costs to support cheap queries for object counting [3].

Practitioners have proposed static analyses [4,5,6], which calculate the worst-case response size for a query supposing a pathological data graph (i.e. a graph that will yield the largest possible response). Because a static analysis assumes the worst, it can efficiently provide an upper bound on a query’s complexity without interacting with the backend. The speed and generality of static query analysis make it an attractive approach for commercial GraphQL API providers.

Our static approach follows a paradigm similar to existing static analyses, but differs in several ways. Most notably:

  1. We build our analysis on formal GraphQL semantics and prove the correctness of our query complexity estimates.
  2. Our analysis can be configured to handle common schema conventions to produce better estimates.

Approach

Because our aim is to prove the correctness of our analysis, this section gets a bit mathematical. I’ve tried to capture the high-level ideas and leave the math to the paper.

GraphQL formalization

GraphQL is analogous to SQL, but in my opinion it makes the relationships between data a bit clearer. I will note the SQL analogies inline, in parentheses.

  • A GraphQL data graph (database instance) represents the underlying data and relationships known by the service provider.
  • A GraphQL schema (database schema) defines the data types that clients can query, as well as possible operations on that data.
  • A GraphQL query (database query) requests a subset of the data that meets certain criteria, organized in terms of the relationships present in the data graph.
  • The evaluation of a GraphQL query can be thought of as applying successive filters to a virtual data object that initially corresponds to the complete data graph. These filters are defined by the query and are applied recursively following the query’s structure (see Figure 3 in the paper). The data graph is mapped onto backend storage using resolver functions, which resolve entities for each field of each type in the schema (see the sketch after this list).
  • A GraphQL response consists of the entities matching the query, typed and organized according to the query’s filters.
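To make the resolver idea concrete, here is a minimal sketch of a resolver map in the style of graphql-js / Apollo Server. The field names and backend handle below (articles, comments, ctx.db) are hypothetical, not taken from the paper or from any particular API.

```typescript
// Hypothetical backend handle; names are illustrative, not a real library API.
interface Context {
  db: {
    findArticles(opts: { limit: number }): Promise<Array<{ id: string; title: string }>>;
    findComments(opts: { articleId: string; limit: number }): Promise<Array<{ text: string }>>;
  };
}

// A minimal resolver map: each entry resolves one field of one schema type
// against backend storage, honoring the field's limit argument.
const resolvers = {
  Query: {
    articles: (_parent: unknown, args: { limit: number }, ctx: Context) =>
      ctx.db.findArticles({ limit: args.limit }),
  },
  Article: {
    comments: (article: { id: string }, args: { limit: number }, ctx: Context) =>
      ctx.db.findComments({ articleId: article.id, limit: args.limit }),
  },
};
```

Each resolver can have a very different cost to invoke, a point that the response complexity metric described below is designed to capture.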

Query complexity — Definitions of query cost

A GraphQL query describes the structure of the response data, and also dictates the resolver functions that must be invoked to satisfy it (which resolvers, in what order, how many times). We propose two query cost definitions:

  • Type complexity indicates the size of the response, measured by the number of fields in the response. You could simply count each entry, or you could weight the entries by the cost of representing them (e.g. the number of bytes needed to encode each type of field, like maximum string length or maximum integer width). This cost is paid by everyone — the service provider (to generate and marshal it), the network operator (to ship it), and the client (to unmarshal and process it).
  • Response complexity indicates the GraphQL service provider’s cost to generate the response. If the provider’s greatest cost is shipping data over the network, then the type complexity might suffice for them. But the provider’s computational cost may also depend on which resolver functions are invoked and how many times each. For example, perhaps some fields can be resolved from a cache, others require access to cold storage, and still others are obtained through API composition from a third party. GraphQL’s backend-agnostic nature may lead to different costs for the provider on a per-resolver basis. Our notion of response complexity captures these kinds of costs.

Both of these costs can be calculated using weighted recursive sums under our formalization, which is similar to Facebook’s definition of GraphQL. But this definition is silent about how to handle lists. If we simply assume that all lists are infinitely long (or contain all entries of that type in the underlying data graph), we won’t get very useful cost estimates. Happily, the community follows two pagination conventions because returning infinitely long lists isn’t helpful for anybody.

Accounting for pagination

We consider two common [2] pagination conventions: the slicing pattern and the connections pattern. Slicing uses limit arguments to bound the size of the returned lists. Connections introduces a layer of indirection for more flexible pagination, with limit arguments applying to a cursor instead of to a list itself. These concepts are illustrated in the next figure.


A GraphQL schema (left). A sample query (center) uses the two pagination conventions: slicing (red) and connections (green). The response is on the right.
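To give a rough, self-contained picture of the two conventions (the field names below are hypothetical and not the ones used in the figure), a query mixing both patterns might look like the following, shown here as a GraphQL document embedded in a TypeScript string:

```typescript
// Hypothetical query mixing both pagination conventions.
// "limit" bounds the list it is applied to (slicing); "first" bounds the
// "nodes" list nested inside the connection object (connections).
const exampleQuery = /* GraphQL */ `
  query {
    articles(limit: 10) {            # slicing: at most 10 articles
      title
      commentsConnection(first: 5) { # connections: "first" applies to "nodes"
        totalCount
        nodes {
          text
        }
      }
    }
  }
`;
```

With every field weighted 1, the worst-case response has on the order of 10 × (1 + 1 + 1 + 5) ≈ 80 fields; the exact constant depends on how the list and connection fields themselves are counted, and the paper gives the precise formula.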

Our cost analysis

We measure a query’s potential cost in terms of its type complexity and response complexity. Our analyses are essentially weighted recursive sums, with list sizes limited by the limit arguments dictated through the slicing and connections patterns.

Since our approach relies on conventions, the schema must be accompanied by a configuration explaining how it follows these conventions. This configuration accomplishes three things (a hypothetical sketch follows the list):

  • It labels the fields that contain limits, and the sub-fields (for the connections pattern) to which these limits apply.
  • It provides default limits in the event that a limit field has no argument supplied.
  • It optionally provides the weights to dictate how much each type and each resolver function costs. By default, a weight of 1 might be reasonable.
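The paper’s actual configuration language is not public, so the following is only a sketch of the kind of information such a configuration carries; all names and the overall shape are hypothetical.

```typescript
// Hypothetical configuration shape for the cost analysis (illustrative only).
interface CostConfig {
  // Which argument names act as limits, matched by regular expression, and
  // (for the connections pattern) which sub-fields the limit governs.
  limitArguments: Array<{
    argument: RegExp;        // e.g. /^(first|last|limit)$/
    appliesTo?: string[];    // e.g. ["nodes", "edges"] for connections
  }>;
  // Fallback list sizes used when a query omits the limit argument.
  defaultLimits: Array<{ field: RegExp; limit: number }>;
  // Optional per-type and per-resolver weights; 1 is a reasonable default.
  typeWeights?: Record<string, number>;
  resolverWeights?: Record<string, number>;
}

const config: CostConfig = {
  limitArguments: [
    { argument: /^limit$/ },                                       // slicing
    { argument: /^(first|last)$/, appliesTo: ["nodes", "edges"] }, // connections
  ],
  defaultLimits: [{ field: /Connection$/, limit: 10 }],
  typeWeights: { Article: 2, Comment: 1 },
  resolverWeights: { "Article.comments": 5 }, // e.g. a field served from cold storage
};
```

Because field and argument names follow naming conventions, a handful of regular expressions can cover a large schema, which is why such configurations end up far smaller than the schemas they describe.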

Guarantees: Given a schema and a configuration, our analysis always yields an upper bound on a query’s cost in terms of its type complexity and response complexity. This guarantee assumes that all fields returning lists enforce their limit arguments and have a finite default limit. These are upper bounds because we assume that the underlying graph is pathological — that every list requested in the query will be fully populated.
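To illustrate the weighted-recursive-sum idea, here is a deliberately simplified sketch of a type-complexity bound. It is not the paper’s implementation (which is not open source): it ignores fragments, the connections indirection, and non-unit weights, and it works on a hand-rolled query tree rather than a real GraphQL AST.

```typescript
// One node per field in the (simplified) query tree.
interface FieldNode {
  name: string;
  returnsList: boolean;
  limit?: number;         // value of a slicing-style limit argument, if present
  defaultLimit?: number;  // default from the configuration, used when no limit is given
  selections: FieldNode[];
}

// Upper bound on the number of response fields rooted at `field`.
function typeComplexity(field: FieldNode): number {
  const childCost = field.selections.reduce(
    (sum, child) => sum + typeComplexity(child),
    0,
  );
  if (!field.returnsList) {
    return 1 + childCost; // the field itself plus its sub-selection
  }
  // Unbounded lists (no limit and no default) make the bound infinite.
  const bound = field.limit ?? field.defaultLimit ?? Infinity;
  return 1 + bound * childCost; // up to `bound` elements, each repeating the sub-selection
}

// Example: articles(limit: 10) { title  comments(limit: 5) { text } }
const exampleTree: FieldNode = {
  name: "articles", returnsList: true, limit: 10,
  selections: [
    { name: "title", returnsList: false, selections: [] },
    { name: "comments", returnsList: true, limit: 5,
      selections: [{ name: "text", returnsList: false, selections: [] }] },
  ],
};
console.log(typeComplexity(exampleTree)); // 1 + 10 * (1 + (1 + 5 * 1)) = 71
```

The sketch also shows why the guarantee above needs finite default limits: a list field with neither a limit argument nor a configured default pushes the bound to infinity.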

Time and space complexity of our analysis

  • Time: Our analysis works its way “outwards-in” in a breadth-first manner. It visits each node of the query once, because inner layers of the query cannot increase the number of nodes at outer layers of the response. It therefore runs in time linear in the size of the query.
  • Space: To handle the connections pattern, context must be carried along during the recursion to track the limits that apply to sub-fields. The convention (and our implementation) only applies these limits to direct children, for a constant space cost per level. In general, the space complexity matches the maximum degree of nesting within the query.

Implementation: We implemented this approach, but have not yet persuaded IBM to open source it :-). We hope that open-source tools [4,5,6] will consider incorporating our ideas.

Evaluation

We investigated five questions. Our evaluation used two public GraphQL services, GitHub and Yelp.

Can the analysis (and configuration) be applied to real-world APIs?

Our analysis requires a GraphQL schema to be accompanied by a configuration of limit arguments, weights, and default limits.

We found it straightforward to develop a configuration for both schemas. Since GraphQL schemas follow naming conventions [2], we supported regular expressions to denote field names in our configuration language, and our configurations are thus far smaller than the corresponding schemas.


Characteristics of the evaluated APIs and of our configurations.

Do we always obtain upper bounds?

Yes! We developed an open-source GraphQL query generator [7] and generated 5,000 query-response pairs for each API. The predicted and actual complexities are plotted here. We guarantee upper bounds, so our predicted costs always lie above the line y=x in the figure.


Actual (response) complexities and predicted (query) complexities, using our analysis on the corpus. The axes are scaled based on the shape of the data. Each figure has text indicating the percentage of responses that were shown (the remainder exceed the x-axis), and for these, the percentage of the corresponding query complexities that were shown (the remainder exceed the y-axis).

Are these bounds tight enough to be useful?

This is a bit subjective. Our bounds may be over-estimates, but are as tight as possible with a static, data-agnostic approach. The results depend on the underlying data graph.

Data sparsity causes responses to be less complex than their worst-case potential. In the preceding figure, our upper bound is close or equal to the actual type and response complexities of many queries — this can be seen in the high density of queries near the diagonals. Our bounds grow looser for more-complex queries. This follows intuition about the underlying graph: larger, more nested queries may not be satisfiable by an API’s real data.

We quantify this in the paper (see Table 2).

Is our analysis cheap enough to be applicable in middleware?

Yes! Beyond the functional correctness of our analysis, we assessed its runtime cost to see if it can be incorporated into existing request-response flows, e.g. in a GraphQL client or an API management middleware.

We measured costs on a 2017 MacBook Pro (8-core Intel i7 processor, 16 GB of memory). As predicted, our analysis runs in linear time as a function of the query size. Nearly all queries could be analyzed in under 5 milliseconds.

How does our approach compare to other static analyses?

We compared against open-source and closed-source analyses.

Open source

We compared our approach against those of three open-source libraries [4,5,6]. The following figure summarizes the results. Note that libA produces wild over-estimates, while libB and libC follow identical approaches that are susceptible to under-estimates.


Actual and predicted response sizes based on type complexity, from our analysis and the libraries on the GitHub data. libB and libC produce identical values under our configuration. All static approaches over-estimate due to data sparsity. The libraries have sources of over-estimation beyond our own. libB and libC also under-estimate (below the diagonal).

The shortcomings of these libraries can be tied to their mishandling of GraphQL pagination conventions. Problematically, these libraries do not always permit users to specify the lengths of lists, leading to over- and under-estimates.

  • libA is liable to over-estimates. It only permits a global maximum on list lengths. Since these API providers have large variation in default lengths (and support limit arguments), the resulting upper bounds may be large overestimates.
  • libB and libC may make under-estimates. They treat unpaginated lists as always returning only one element. They also cannot support indirect limit arguments, and thus cannot handle the connections pattern.

Closed source

We considered the static analyses of GitHub and Yelp, which are documented although not open source.

  • GitHub: At the time of our study, GitHub shared one of the limitations of libB and libC: it did not properly handle the size of unpaginated lists. The first figure illustrates this issue using the unpaginated relatedTopics field. We reported this issue and they have resolved it. Our analysis remains more flexible than their repaired analysis.
  • Yelp: Yelp follows a simple rule: fields can be nested to at most four levels. Relatively large responses can still be obtained by nesting large lists. Our more accurate analysis has the same time complexity as theirs (both process the query in a single pass).

Conclusions

GraphQL is an emerging Web API model. Its flexibility can benefit clients, servers, and network operators. But its flexibility is also a threat: GraphQL queries can be exponentially complex, with implications for service providers including rate limiting and denial of service.

The fundamental requirement for service providers is a cheap, accurate way to estimate the cost of a query. We have proposed a principled, provably-correct approach to address this challenge. With proper configuration, our analysis offers tight upper bounds, low runtime overhead, and independence from backend implementation details.

