GraphQL Syntax Used for a Novel Approach to Schema Validation and Code Generation

May 04, 2022 5 min read

Nav Technologies has created an open-source schema definition and code generator that uses GraphQL syntax to define events and message formats. GraphQL was chosen for its expressiveness and familiarity among developers, but it is only used for its syntax; the Nav Schema Architecture (NSA) does not use the GraphQL runtime.

Using GraphQL queries allows a contract developer to describe both the data model and message format at the same time, rather than needing two sets of semantics. This is useful when an attribute may be optional on the underlying data model, but required when that model is used in a specific message.
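As a rough illustration of that distinction (not NSA's actual output), the sketch below models an attribute that is optional on the data model but required by one particular message. The `Business` model, the message names, and the `missing_fields` helper are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Business:
    """Hypothetical data model: only `id` is always present."""
    id: str
    name: Optional[str] = None
    phone: Optional[str] = None


# Hypothetical message definitions: each message names the model attributes
# that must be present when the model travels inside that message.
MESSAGE_REQUIRED: Dict[str, Tuple[str, ...]] = {
    "BusinessCreated": ("id", "name"),
    "BusinessPhoneVerified": ("id", "phone"),
}


def missing_fields(message: str, payload: Business) -> List[str]:
    """Return the attributes the message requires but the payload lacks."""
    return [
        attr
        for attr in MESSAGE_REQUIRED[message]
        if getattr(payload, attr) is None
    ]
```

The same `Business` instance can therefore be valid for one message and invalid for another, which is the situation a single definition covering both model and message is meant to capture.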

The primary purpose of NSA is to generate code and schemas in multiple languages, all based on the root definition using GraphQL. The outputs can be other schema languages, such as protobuf or JSON Schema, or code, with Go, Ruby, and Python currently supported.

The benefit of a common schema comes from the ability to easily disseminate its implementation across multiple teams and services. A build pipeline will watch for schema changes on a feature branch, then launch a secondary pipeline to generate the output for all target languages. That output is then committed back to the feature branch, where a developer can review the changes before merging to the main branch. All relevant, language-specific output packages are rebuilt, versioned and tagged.

InfoQ met with some of the Nav developers behind the project to better understand the problems they were trying to solve and the benefits they've seen from this approach.

InfoQ: Contract-first development is not a new idea, but we'll more often see OpenAPI and JSON Schema used to define the contract. What drove your decision to use GraphQL syntax as the primary source of truth for the contract, and then derive contracts from it?

Nav development team: The differentiator between GraphQL and other systems like OpenAPI and JSON Schema is that a successful system requires a means to define both data models and messages, and only GraphQL contains within it solutions for both of these hemispheres of the problem. GraphQL is a payload description language that solves the problem of defining payloads with validation rules and message schemas in a single DSL.

GraphQL is simpler to work with than JSON Schema, doesn't require too much documentation, and facilitates communication between teams. JSON Schema is an output target of NSA. There isn’t necessarily a code generator from JSON Schema to all of our supported target languages (for example, JSON Schema to protobuf). With NSA, we can generate multiple schema definitions (JSON Schema, protobuf) from a single CIM in addition to the language-specific message structures. NSA can be used like a schema for other schema definitions, providing model-to-model transformations.

GraphQL has rich syntax support for applying complex validation rules to message payloads, e.g. data formats, ranges of allowable values, regex matching, and required attributes. Validation at the data model and message schema layer (entrypoint), in a polyglot microservice architecture, allows us to identify glitches during testing or deployment, before services process and store the data.
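The kinds of rules named above (required attributes, allowable ranges, regex matching) can be sketched in Python as follows. This is an editorial illustration under stated assumptions, not code generated by NSA: the field names, the rule table, and the `validate` function are hypothetical.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

Rule = Callable[[Any], bool]


def required(value: Any) -> bool:
    """The attribute must be present."""
    return value is not None


def in_range(lo: float, hi: float) -> Rule:
    """The attribute, when present, must lie within [lo, hi]."""
    return lambda v: v is None or lo <= v <= hi


def matches(pattern: str) -> Rule:
    """The attribute, when present, must fully match the regex."""
    compiled = re.compile(pattern)
    return lambda v: v is None or compiled.fullmatch(v) is not None


# Hypothetical rule table for one message payload.
RULES: Dict[str, List[Tuple[str, Rule]]] = {
    "email": [
        ("required", required),
        ("format", matches(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
    ],
    "credit_score": [("range", in_range(300, 850))],
}


def validate(payload: Dict[str, Any]) -> List[str]:
    """Return one 'field: rule' entry for every rule the payload violates."""
    errors = []
    for field, rules in RULES.items():
        value = payload.get(field)
        for name, rule in rules:
            if not rule(value):
                errors.append(f"{field}: {name}")
    return errors
```

Running such checks at the entrypoint means a malformed payload is rejected before any service processes or stores it.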

InfoQ: Is your system architecture primarily using async messaging, or is it request-response? Would NSA be applicable to either approach?

Nav development team: We primarily use it asynchronously: messages are sent via AWS EventBridge to SQS queues, from which services retrieve and read them. It can also be used for request-response, e.g. using the protobuf output for gRPC services, so NSA is equally applicable to either approach.

There are no references to endpoints, subscribers, or publishers in NSA. Output code from NSA is consumed by an adapter which concerns itself with the transmission protocol.

Nav Schema Architecture’s primary concern is validation of payloads, rather than endpoint management. It is independent of the method of transmission, so long as the producers and consumers can run the validation code before transmitting. This is the biggest differentiator from other schema systems in the industry.
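A minimal sketch of that separation, assuming a generated validator and a separate adapter: the validation function knows nothing about transport, and the adapter receives the transport as a plain callable. The names `validate_order_placed` and `publish`, and the payload fields, are hypothetical stand-ins rather than NSA's real interfaces.

```python
import json
from typing import Callable, Dict, List

Validator = Callable[[Dict[str, object]], List[str]]
Transport = Callable[[str], None]


def validate_order_placed(payload: Dict[str, object]) -> List[str]:
    """Stand-in for generated validation code (hypothetical message)."""
    return [f"missing: {key}" for key in ("order_id", "total") if key not in payload]


def publish(payload: Dict[str, object], validate: Validator, send: Transport) -> None:
    """Adapter: validate first, then hand the serialized payload to a transport.

    `send` could wrap an event bus, a queue, or an RPC client; the validation
    step is identical in every case.
    """
    errors = validate(payload)
    if errors:
        raise ValueError("; ".join(errors))
    send(json.dumps(payload, sort_keys=True))
```

Swapping the transport then means passing a different `send` callable, with no change to the validation code.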

InfoQ: What other designs had you considered, and how did you decide this was the best approach? Specifically, did you look at using OpenAPI/AsyncAPI or protobuf as the syntax for code generation?

Nav development team: Nav uses protobuf extensively in other contexts, and protobuf message definitions are an output target of NSA. AsyncAPI specifications can carry any message payload, so NSA can have a generator that outputs definitions to be used as an AsyncAPI message schema.

GraphQL has high developer familiarity and a gentle learning curve, which makes it well suited as a definition language for an internal polyglot microservice architecture. This contrasts with a B2B context, where OpenAPI provides more complete documentation to clients that are completely unfamiliar with an API offering.

AsyncAPI attempts to cover the transports, which is unnecessary for our use of EventBridge. It complicates the issue by coupling validation with transmission/transports. We wanted better separation of concerns.

InfoQ: Are the GraphQL schemas stored in separate repos, or are they with one of the producers or consumers?

Nav development team: The GraphQL schemas are in a single repository shared across the organization. This repo also contains the processor and the subsequently generated code (though this need not be the case). Because the generated code concerns itself only with message validation, it is used as a dependency by many libraries and applications within Nav (be they producer or consumer or a simple documentation tool).

GraphQL type extensions allow for scaling across larger organizations that want to work on different parts of the same entities. Merging schemas in GraphQL is relatively easy, so schemas can be separated by context and later merged with some additional processing steps.
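To make the merging idea concrete, here is a small sketch that combines per-context schema fragments into one schema, treating each fragment as an extension of any type it shares with the others. The dictionary representation, the `merge_schemas` helper, and the conflict rule are assumptions made for illustration; they do not describe how NSA's processor actually merges GraphQL files.

```python
from typing import Dict, Iterable

# Simplified stand-in for a parsed schema: type name -> {field name: field type}
Schema = Dict[str, Dict[str, str]]


def merge_schemas(parts: Iterable[Schema]) -> Schema:
    """Merge schema fragments, letting later fragments extend earlier types.

    Raises ValueError when two fragments define the same field with
    different types, since that cannot be resolved automatically.
    """
    merged: Schema = {}
    for part in parts:
        for type_name, fields in part.items():
            target = merged.setdefault(type_name, {})
            for field, field_type in fields.items():
                if field in target and target[field] != field_type:
                    raise ValueError(
                        f"conflicting definitions for {type_name}.{field}"
                    )
                target[field] = field_type
    return merged
```

For example, a lending team and a billing team could each contribute fields to a shared `Business` type, and the merged result would contain both sets of fields.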

If an organization wanted to separate the project into multiple repos by responsibility, one or more repos could contain solely GraphQL that is eventually merged into a single schema as parser input. Another repo could house the parser itself, which could connect with one or many codegen repos as submodules. A fourth layer of repos could contain the generated code, one repo per language, with all of the necessary validation, testing, and packaging logic. Finally, these packages could be consumed by an organization’s client libraries, containing no logic around transmission mechanisms themselves.

Developers from Nav who participated in this discussion included Daniel Zemichael, Michal Scienski, Jovon McCloud, and Jeff Warner.

About the Author

Thomas Betts

Thomas Betts is the Lead Editor for Architecture and Design at InfoQ, a co-host of the InfoQ Podcast, and a Senior Principal Software Architect at Blackbaud.

For over two decades, his focus has always been on providing software solutions that delight his customers. He has worked in a variety of industries, including retail, finance, health care, defense and travel.

Thomas lives in Denver with his wife and son, and they love hiking and otherwise exploring beautiful Colorado.
