Code-first vs. schema-first development in GraphQL - LogRocket Blog

February 26, 2020

This article is part of an ongoing series on conceptualizing, designing, and implementing a GraphQL server.

The GraphQL schema defines the contracts for a GraphQL service by exporting the set of types, fields, and mutations that can be executed against the service. When creating a GraphQL service, we can decide to have the schema be the source of truth and let all our implementation code match its definitions, or we can have our code be the source of truth and have the schema be an artifact generated from the code.

In either case, we will have a fully functioning GraphQL service, but depending on which approach we use, some features may be easier or harder to implement down the road. The two approaches are called, respectively, schema-first and code-first.

In this article, we will explore these two approaches to creating our GraphQL service, review the advantages/disadvantages of each, and, finally, take a stance and decide which is possibly the better choice. Let’s start!

What is schema-first?

Schema-first indicates that we first define the schema for the GraphQL service and then we implement the code by matching the definitions in the schema. To code the schema, we use the Schema Definition Language (SDL), a syntax created to represent the GraphQL data model. Because of this, this approach may also be called SDL-first.

The SDL has contributed to the widespread popularity of GraphQL. Even though the first implementations and utilities of GraphQL were in JavaScript (graphql-js in 2015, graphql-tools in 2016), the SDL was designed to be language-agnostic. Implementations of GraphQL servers in different languages — including those for Node.js, Ruby, Python, Scala, PHP, and others — can read from/write to SDL.

Let’s see what the SDL looks like (I won’t go into detail, since there are many tutorials about it). Defining a type Film with properties id and title in SDL looks like this:

type Film {
  id: Int
  title: String!
}

The code is almost self-descriptive: the title is a property of type string (written with title case: String), and it must always have a value, represented through the character !.

Since GraphQL is all about querying relationships in a graph (that’s where the name GraphQL comes from: Graph Query Language), we can define relationships across different types, like this:

type Film {
  id: Int
  title: String!
  director: Person!
  actors(limit: Int = 10): [Person]
}

type Person {
  id: Int
  name: String!
}

Since a film can have many actors and actresses, the property actors is defined as a list type through characters [], and we can limit how many elements to return in this list through the custom argument limit, with a default value of 10.

Already in this simple example we can appreciate the SDL’s biggest advantage: it is very simple and concise and, as such, very easy to understand, both for technical and nontechnical people. As a consequence, the SDL can act as a communication tool through which people across teams can collaborate in the creation of the organization’s data model.

Similarly, we can already notice what drawbacks the SDL brings. The first is that the SDL doesn’t include the resolvers, i.e., the actual code that will compute the field’s value. Hence, the SDL cannot all on its own be the single source of truth since it’s not complete. At most, SDL just brings a clean separation between schema definition and resolution, as in this code (which uses graphql-tools):

const { makeExecutableSchema } = require('graphql-tools')

// Schema definition
const typeDefs = `
  type Film {
    id: Int
    title: String!
    director: Person!
    actors(limit: Int = 10): [Person]
  }

  type Person {
    id: Int
    name: String!
  }
`

// Schema resolution
// (fetchUserById and findUsers are data-access helpers assumed to be defined elsewhere)
const resolvers = {
  Film: {
    director: (film, args) => fetchUserById(film.directorID),
    actors: (film, args) => findUsers(film.actorIDs, { limit: args.limit }),
  },
}

const schema = makeExecutableSchema({
  typeDefs,
  resolvers,
})

The second drawback is that the resolver code must match exactly the definition in the SDL (e.g., a property with type String! must be String!, not String or Int!). Hence, it is not DRY (Don’t Repeat Yourself); there will be information duplicated across the codebase, which must be kept in sync.
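
To make the synchronization problem concrete, here is a minimal sketch of a check that flags resolvers with no matching field in the SDL. It is not part of graphql-tools: the helpers parseFieldNames and findOrphanResolvers are hypothetical names, and the regex-based parsing is an assumption that only holds for simple type blocks with one field per line, like the ones above. A real project would rely on a proper SDL parser.

```javascript
// Hypothetical helper: extract { TypeName: [fieldName, ...] } from simple SDL.
// Assumes flat `type X { ... }` blocks with one field per line; not a full parser.
function parseFieldNames(sdl) {
  const types = {};
  const typeBlock = /type\s+(\w+)\s*\{([^}]*)\}/g;
  let match;
  while ((match = typeBlock.exec(sdl)) !== null) {
    const [, typeName, body] = match;
    types[typeName] = body
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      // The field name is everything before the first "(" or ":"
      .map((line) => line.split(/[(:]/)[0].trim());
  }
  return types;
}

// Hypothetical helper: list "Type.field" resolvers that the SDL does not declare.
function findOrphanResolvers(sdl, resolverMap) {
  const declared = parseFieldNames(sdl);
  const orphans = [];
  for (const [typeName, fields] of Object.entries(resolverMap)) {
    const known = declared[typeName] || [];
    for (const fieldName of Object.keys(fields)) {
      if (!known.includes(fieldName)) {
        orphans.push(`${typeName}.${fieldName}`);
      }
    }
  }
  return orphans;
}

const sampleTypeDefs = `
type Film {
  id: Int
  title: String!
  director: Person!
  actors(limit: Int = 10): [Person]
}
type Person {
  id: Int
  name: String!
}
`;

const sampleResolvers = {
  Film: {
    director: () => null,
    // The SDL calls this field "actors"; the resolver was never renamed
    cast: () => [],
  },
};

console.log(findOrphanResolvers(sampleTypeDefs, sampleResolvers)); // [ 'Film.cast' ]
```

A check like this only catches mismatched names. Mismatched types (String versus String!) would need the resolver's return type to be known as well, which is one reason typed code-first libraries can catch more at build time.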

What is code-first?

In the code-first approach, we start by coding the resolvers, and then, from code as a single source of truth, we have the schema generated as an artifact. Thus, we still have a schema, but instead of being manually created, it is created through running a script. This approach may also be called resolver-first.

Understanding the schema just from looking at code is not as easy as looking at the SDL definition. Hence, several GraphQL server implementations have attempted to have their code mirror, as much as possible, the SDL.

For instance, in Nexus (a code-first GraphQL server in JavaScript/TypeScript), the name of the method defining a field already expresses the type of the response (in the code below: t.int, t.string, t.field for relationships to an object type, and t.list.field denoting a list of relationships to an object type). Implementing the same schema as above looks like this:

// Import added for completeness; it assumes the "nexus" package
const { objectType, intArg } = require("nexus");

const Film = objectType({
  name: "Film",
  definition(t) {
    t.int("id", { description: "Id of the film" });
    t.string("title", { description: "Title of the film" });
    t.field("director", {
      type: Person,
      resolve(root, args, ctx) {
        return ctx.getFilm(root.id).director();
      },
    });
    t.list.field("actors", {
      type: Person,
      nullable: true,
      args: {
        limit: intArg({
          required: false,
          default: 10,
          description: "Limit the number of actors/actresses",
        }),
      },
      resolve(root, args, ctx) {
        return ctx.getFilm(root.id).actors(args.limit);
      },
    });
  },
});

const Person = objectType({
  name: "Person",
  definition(t) {
    t.int("id");
    t.string("name");
  },
});

Fields to be resolved to a property of the same name in the object (such as fields id and title from type Film, which are resolved as properties id and title from the film object) can have their resolver functions omitted. If the data model mostly consists of simple attributes like these, without any custom logic to be resolved, the code then quite resembles the SDL, as is the case for type Person in the code above.
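
Omitting the resolver works because the server falls back to a default resolver when none is given. The sketch below is a simplified illustration of that fallback, not the Nexus API: the function name defaultResolve and its signature are assumptions made for this example.

```javascript
// Illustrative default resolver: read the property named after the field.
// If that property is a function, call it with the field arguments instead.
function defaultResolve(source, args, fieldName) {
  if (source === null || typeof source !== "object") {
    return undefined;
  }
  const value = source[fieldName];
  return typeof value === "function" ? value.call(source, args) : value;
}

const sampleFilm = {
  id: 42,
  title: "Blade Runner",
  // A method is called rather than returned as-is
  slug(args) {
    return this.title.toLowerCase().replace(/\s+/g, args.separator);
  },
};

console.log(defaultResolve(sampleFilm, {}, "title")); // Blade Runner
console.log(defaultResolve(sampleFilm, { separator: "-" }, "slug")); // blade-runner
```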

Nexus has put emphasis on being declarative and readable. As such, even though it is less understandable than the SDL, it is still good enough to communicate the intent of the schema in addition to how it will be resolved.

Other GraphQL servers also emulate the SDL in their definitions, but they may have less optimal results. For instance, graphql-php implements the schema above like this:

use GraphQL\Type\Definition\ObjectType;
use GraphQL\Type\Definition\Type;

// Person is defined first because the Film fields below reference $personType
$personType = new ObjectType([
  'name' => 'Person',
  'description' => 'A person',
  'fields' => [
    'id' => Type::int(),
    'name' => [
      'type' => Type::string(),
      'description' => "Person's name"
    ],
  ]
]);

// Film and DataSource are application classes assumed to be defined elsewhere
$film = new ObjectType([
  'name' => 'Film',
  'fields' => [
    'id' => Type::int(),
    'title' => [
      'type' => Type::string(),
      'description' => "Film's title"
    ],
    'director' => [
      'type' => Type::nonNull($personType),
      'description' => 'Film director',
      'resolve' => function(Film $film) {
        return DataSource::findPerson($film->directorID);
      }
    ],
    'actors' => [
      'type' => Type::listOf($personType),
      'description' => 'List of actors/actresses in the film',
      'args' => [
        'limit' => [
          'type' => Type::int(),
          'description' => 'Limit the number of actors returned',
          'defaultValue' => 10
        ]
      ],
      'resolve' => function(Film $film, $args) {
        return DataSource::findActors($film->id, $args['limit']);
      }
    ]
  ]
]);

In this case, the definition is more jarring: the nested arrays and several levels of indentation break the flow of reading, and the varied syntax colors get in the way. Hence, even though the schema is still understandable, it is not so for everyone, and its utility as a communication tool across teams decreases.

From these examples, we can appreciate the code-first schema’s biggest advantages and liabilities: it can effectively be the single source of truth of the data model since it contains both the schema definitions and the code to resolve them, but at the expense of being less understandable.

Please note that we can still use a schema, written in SDL, to communicate the data model to our co-workers: the schema can be generated as an artifact from the code by running some script. Moreover, the generated schema can be consistently formatted to improve its legibility, such as by ordering the types alphabetically.

Generating the schema can be done manually whenever needed. It can also be automated as part of our continuous integration process — triggering it whenever our codebase is tagged, for instance — and committing the newly created file to a special repository. This is the strategy employed by GitHub for its public GraphQL schema.
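
As a rough sketch of what such a generation script might do, the example below renders SDL from type definitions held in code. The typeRegistry object and the renderSDL function are hypothetical and do not belong to any library; the field types are kept as plain strings to keep the example short. Sorting the fields as well as the types is a choice made here so that the generated file stays stable between runs.

```javascript
// Hypothetical code-first definitions: plain objects, not a real library's API.
const typeRegistry = {
  Person: { id: "Int", name: "String!" },
  Film: { id: "Int", title: "String!", director: "Person!" },
};

// Render the registry as SDL, ordering types and fields alphabetically
// so that regenerating the artifact produces diff-friendly output.
function renderSDL(registry) {
  return Object.keys(registry)
    .sort()
    .map((typeName) => {
      const fields = Object.keys(registry[typeName])
        .sort()
        .map((fieldName) => `  ${fieldName}: ${registry[typeName][fieldName]}`)
        .join("\n");
      return `type ${typeName} {\n${fields}\n}`;
    })
    .join("\n\n");
}

console.log(renderSDL(typeRegistry));
// type Film {
//   director: Person!
//   id: Int
//   title: String!
// }
//
// type Person {
//   id: Int
//   name: String!
// }
```

In a continuous integration job, the returned string would be written to a file and committed, rather than printed.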

Shopping around for schema-first or code-first GraphQL servers

We can choose which of these two approaches to use for creating our GraphQL service. However, the decision may have to be made at the very beginning since it may influence which GraphQL server implementation we use; most implementations will be opinionated on using either one or the other one, very seldom both.

Moreover, this decision may also affect which language to use. Some languages offer the two options, such as JavaScript or TypeScript through Apollo Server (schema-first) and Nexus (code-first), Python through Ariadne (schema-first) and Graphene (code-first), and .NET through GraphQL for .NET, which offers both alternatives.

However, some other languages have only one approach, such as Rust, for which there is only one GraphQL server: Juniper, which supports the code-first approach only.

Finally, it may also affect which framework for some language we can use. For PHP, for instance, Laravel can choose from both options through Lighthouse (schema-first) and Laravel GraphQL (code-first), but WordPress only has solutions offering the code-first approach, such as WPGraphQL and GraphQL by PoP.

Links to all the different GraphQL servers, for all languages, can be found here.

Schema-first vs. code-first, as promoted by different implementers

So far, we have reviewed how the two approaches compare regarding legibility (schema-first is better) and their ability to become a single source of truth without duplicated code (code-first is better). These are not the only characteristics to compare, however.

Next, let’s see how different implementers justify using either approach for their own GraphQL server implementations.

Why schema-first is better

In “Schema-First GraphQL: The Road Less Travelled,” Mirumee’s Jakub Draganek promotes the schema-first approach used by GraphQL server Ariadne.

Jakub mentions that the advantages of having the schema act as the common contract between the client and server sides include the following:

It resembles doing test-driven development (TDD) because developers must consider the different use cases and the user convenience well in advance of starting to code the solution, and the final result may carry a more modular and maintainable design.
It follows the dependency inversion principle (DIP), which makes the solution more abstract and less tied to dependencies.
It enables the client- and server-side developers to work at the same time since client-side developers need not wait for work on the backend to be finished first. The schema makes it possible to provide mock data for the API, so the client can be tested independently of having the server-side GraphQL service ready.
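
To illustrate the last point, here is a minimal sketch of deriving mock data from the schema alone, so that the client can be developed before any resolver exists. The schemaTypes description, the scalarMocks table, and the mockValue function are all hypothetical names for this example; the schema is assumed to be already available as a map from field names to SDL type strings.

```javascript
// Hypothetical schema description: field name -> SDL type string.
const schemaTypes = {
  Film: { id: "Int", title: "String!", director: "Person!", actors: "[Person]" },
  Person: { id: "Int", name: "String!" },
};

// Placeholder values for the built-in scalars.
const scalarMocks = { Int: 1, Float: 1.5, String: "mock", Boolean: true, ID: "mock-id" };

// Build a mock value for an SDL type string such as "String!" or "[Person]".
// `depth` guards against infinite recursion when types reference each other.
function mockValue(typeString, types, depth = 0) {
  const bare = typeString.replace(/!$/, "");
  if (bare.startsWith("[") && bare.endsWith("]")) {
    const inner = bare.slice(1, -1);
    // Lists get two sample items
    return [mockValue(inner, types, depth), mockValue(inner, types, depth)];
  }
  if (bare in scalarMocks) {
    return scalarMocks[bare];
  }
  if (depth >= 3 || !(bare in types)) {
    return null;
  }
  const result = {};
  for (const [fieldName, fieldType] of Object.entries(types[bare])) {
    result[fieldName] = mockValue(fieldType, types, depth + 1);
  }
  return result;
}

const mockFilm = mockValue("Film", schemaTypes);
console.log(mockFilm.title); // mock
console.log(mockFilm.actors.length); // 2
```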

Why code-first is better

In “The Problems of ‘Schema-First’ GraphQL Server Development” (and in his presentation at GraphQL Conf 2019), Prisma’s Nikolas Burk advances the code-first approach offered by the GraphQL server Nexus.

According to Nikolas, code-first is better because there are no exceptional features supported by schema-first that code-first does not support. At the same time, it requires less effort to use because, in contrast to the schema-first approach, it doesn’t depend on an excessive amount of tooling. Schema-first forces developers to use a myriad of additional tools, bogging down their experience.

The following challenges require additional tooling under schema-first, but not under code-first:

Inconsistencies between schema definition and resolvers: Ensuring that the schema definition is in sync with the resolvers at all times.
Modularization of GraphQL schemas: Organizing GraphQL type definitions into several files.
Redundancy in schema definitions (code reuse): Reusing SDL definitions may involve a lot of boilerplate and repeated code.
IDE support and developer experience: Leveraging the GraphQL types in editor workflows to benefit from features like autocompletion and build-time error checks for SDL code.
Composing GraphQL schemas: Composing a number of existing (and distributed) schemas into a single schema.

I’d like to add that there are problems in schema-first that do not take any effort to fix in code-first, such as this issue from the GraphQL spec, which concerns localizing the descriptions in the schema, so as to make it usable for those who speak different languages.

Schema-first vs. code-first from my own experience

Hands down, I’m all for the code-first approach. Let me explain my position.

I have built a GraphQL server in PHP, GraphQL by PoP, which supports the code-first approach only. The lack of schema-first is due to the unavailability of a suitable SDL parser in PHP, and implementing one on my own would take a considerable amount of effort, which I’d rather spend on some other task.

However, even if I could, adding schema-first support to my GraphQL server may not make much sense because the schema is not static, but dynamic: it can morph as needed, regulated through code. A dynamic schema provides all the benefits described below, which I seriously doubt could be provided through the schema-first approach.

The source of truth for the schema can be a superset of the one required by GraphQL (the equivalent GraphQL schema can be seen through GraphiQL and Voyager). The additional properties (such as the global fields, global connections, global directives, and persisted fragments) can already be used in our API without having to wait for them to be added to the GraphQL spec, if ever.

Because the source of truth is not tied to the schema, then we can generate any schema for any other system, too; GraphQL is just one of the targets. For instance, it can generate a JSON Schema for a REST service from the same source of truth.
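
A minimal sketch of this idea follows: one definition held in code is rendered into two targets. The shape of personDefinition and the functions toSDL and toJSONSchema are assumptions made for this example, not the API of GraphQL by PoP. Mapping "required" to both a non-null GraphQL type and a required JSON Schema property is a simplification, since the two notions are not strictly equivalent.

```javascript
// Hypothetical source of truth, independent of either target format.
const personDefinition = {
  name: "Person",
  fields: {
    id: { type: "Int", required: false },
    name: { type: "String", required: true },
  },
};

// Target 1: GraphQL SDL.
function toSDL(definition) {
  const lines = Object.entries(definition.fields).map(
    ([fieldName, field]) => `  ${fieldName}: ${field.type}${field.required ? "!" : ""}`
  );
  return `type ${definition.name} {\n${lines.join("\n")}\n}`;
}

// Target 2: JSON Schema, for instance to validate a REST payload.
const jsonSchemaTypes = { Int: "integer", Float: "number", String: "string", Boolean: "boolean" };

function toJSONSchema(definition) {
  const properties = {};
  const required = [];
  for (const [fieldName, field] of Object.entries(definition.fields)) {
    properties[fieldName] = { type: jsonSchemaTypes[field.type] };
    if (field.required) {
      required.push(fieldName);
    }
  }
  return { title: definition.name, type: "object", properties, required };
}

console.log(toSDL(personDefinition));
console.log(JSON.stringify(toJSONSchema(personDefinition), null, 2));
```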

The API can be public/private at the same time, depending on whether the user is logged in and on the logged-in user’s roles, or offer more or fewer fields depending on some other property, such as whether the user has paid for a pro membership.
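
One way to picture this is a schema whose visible fields are computed per request. In the sketch below, the visibleTo rule, the field names boxOffice and rawFootageUrl, and the fieldsFor function are hypothetical, chosen only to illustrate the idea:

```javascript
// Hypothetical field definitions, each with an optional visibility rule.
const filmFields = [
  { name: "id" },
  { name: "title" },
  // Only for logged-in users
  { name: "boxOffice", visibleTo: (user) => user !== null },
  // Only for logged-in users with the "pro" role
  {
    name: "rawFootageUrl",
    visibleTo: (user) => user !== null && user.roles.includes("pro"),
  },
];

// Compute the field names exposed to a given user (null means anonymous).
function fieldsFor(user, fields) {
  return fields
    .filter((field) => !field.visibleTo || field.visibleTo(user))
    .map((field) => field.name);
}

console.log(fieldsFor(null, filmFields)); // [ 'id', 'title' ]
console.log(fieldsFor({ roles: [] }, filmFields)); // [ 'id', 'title', 'boxOffice' ]
console.log(fieldsFor({ roles: ["pro"] }, filmFields));
// [ 'id', 'title', 'boxOffice', 'rawFootageUrl' ]
```

With a static SDL file there is a single schema for everyone, so rules like these would have to live outside the schema definition.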

Types do not know in advance what fields they will resolve. Instead, field resolvers attach themselves to type resolvers using the publish-subscribe pattern, and field resolvers can override other field resolvers. This feature makes the API very extensible, allowing us to have a general code for our API and customize it at the application level for a specific client or project.

A field can be processed by not just one, but many field resolvers: each field resolver in the chain can decide, at runtime, whether to process the field based on some property, or pass it along the chain.

For instance, a special field resolver may be used only if a field argument "source: testing" is passed, enabling it to be tested in a few sites in production before the general release. The same strategy also enables you to provide quick bug fixes for a specific client or environment without running the risk of unintended side-effects everywhere else.
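
The two mechanisms described above, field resolvers subscribing to a type and a chain in which each resolver decides whether to handle the field, can be sketched together as follows. This is an illustration in JavaScript, not the implementation of GraphQL by PoP: the TypeResolver class, the priority and canProcess options, and the "(preview)" behavior are assumptions made for the example.

```javascript
// Hypothetical type resolver: field resolvers subscribe to it,
// instead of the type declaring its fields in advance.
class TypeResolver {
  constructor() {
    this.subscribers = []; // { fieldName, priority, canProcess, resolve }
  }

  subscribe(fieldResolver) {
    // Defaults: normal priority, and always willing to process the field
    this.subscribers.push({ priority: 10, canProcess: () => true, ...fieldResolver });
    // Higher priority goes first, so a later subscriber can override an earlier one
    this.subscribers.sort((a, b) => b.priority - a.priority);
  }

  resolve(fieldName, source, args = {}) {
    // Walk the chain until one resolver accepts the field
    for (const candidate of this.subscribers) {
      if (candidate.fieldName === fieldName && candidate.canProcess(source, args)) {
        return candidate.resolve(source, args);
      }
    }
    throw new Error(`No field resolver can process "${fieldName}"`);
  }
}

const filmType = new TypeResolver();

// General-purpose resolver, shipped with the API
filmType.subscribe({ fieldName: "title", resolve: (film) => film.title });

// Special resolver, used only when the argument source: "testing" is passed
filmType.subscribe({
  fieldName: "title",
  priority: 20,
  canProcess: (film, args) => args.source === "testing",
  resolve: (film) => `${film.title} (preview)`,
});

const alien = { title: "Alien" };
console.log(filmType.resolve("title", alien)); // Alien
console.log(filmType.resolve("title", alien, { source: "testing" })); // Alien (preview)
```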

The combination of the publish-subscribe pattern and the field resolver-chaining features described above also makes the implementation of the server natively decentralized/federated. Indeed, there is no need to implement federation because a federated or non-federated API is coded the same way, and there is no need for different teams to establish special conventions for managing the data model, as happens with Apollo Federation.

Types and interfaces can be automatically namespaced to avoid collisions from third parties.
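
As a small illustration, namespacing could be as simple as prefixing each type name with the name of the package that provides it. The naming convention below (a capitalized prefix joined with an underscore) and the functions namespacedName and registerType are assumptions for this sketch, not the convention used by any particular server. It also assumes the namespace starts with a letter, so that the result is a valid GraphQL name.

```javascript
function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// Hypothetical convention: "acme/reviews" + "Film" -> "AcmeReviews_Film"
function namespacedName(namespace, typeName) {
  const prefix = namespace
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean)
    .map(capitalize)
    .join("");
  return `${prefix}_${typeName}`;
}

const namespacedTypes = new Map();

// Register a type under its namespaced name, refusing genuine duplicates.
function registerType(namespace, typeName, definition) {
  const fullName = namespacedName(namespace, typeName);
  if (namespacedTypes.has(fullName)) {
    throw new Error(`Type ${fullName} is already registered`);
  }
  namespacedTypes.set(fullName, definition);
  return fullName;
}

// Two third parties can both ship a "Film" type without colliding
console.log(registerType("acme/reviews", "Film", { rating: "Int" })); // AcmeReviews_Film
console.log(registerType("other-vendor", "Film", { title: "String" })); // OtherVendor_Film
```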

I will be describing several of these strategies and their implementation in upcoming articles in this series. I hope that, then, you will be convinced of the superiority of the code-first approach.

Conclusion

In this article we analyzed and compared the two approaches to creating a GraphQL service: schema-first and code-first. Even though they both have positive and negative characteristics, based on my own experience, I can claim that code-first is the better option since it allows us to implement features that are not possible otherwise.

I am not alone in my feelings. In this tweet, Prisma’s Johannes Schickling makes the prediction that, starting from 2020, the code-first approach will become more popular than schema-first.

Prisma has some experience creating tools to improve the inherent limitations from schema-first, and it has created Nexus in order to directly avoid these problems by architectural design through the code-first approach. As such, I believe that Johannes surely knows what he is talking about, and I would heed his prediction.

In any case, using schema-first or code-first is still a decision that must be based on each project’s specific conditions. If your team already knows how to handle it, or your schema will never grow beyond a limited size, or you only need to launch something quickly today and you don’t need to worry about the future, then schema-first is ideal.

However, schema-first may produce pain points down the road if you need to scale up the schema, federate it, have autonomous teams working on it, or customize it for a specific client or project. If you believe that your schema may eventually grow in complexity or size, then code-first is the way to go.
