cargo-semver-checks today and in 2023

December 23, 2022 semver rust retrospective

cargo-semver-checks ends 2022 with 40,000 downloads from crates.io, able to prevent 30 different kinds of semver issues, and having done so in real-world use cases. Inspired by Yoshua Wuyts' "Rust in 2023 (by Yosh)" post, here are my thoughts on cargo-semver-checks in 2022, and what I look forward to in 2023 and beyond.

Following semver in Rust is a perfect example of a workflow worth automating:

Important to get right, painful if done wrong: cargo requires all crates to follow semver, so breaking semver in one crate can have a ripple effect across the ecosystem. But if done right, semver is completely invisible.
Countless complex rules: There are hundreds of ways to cause a breaking change, many of them non-obvious.
Code that violates semver doesn't look wrong: No code reviewer can be expected to reliably flag most of the semver issues, even assuming they are well-versed in all the semver rules. The evidence on this point is particularly overwhelming.

Some might say the solution is to "git gud". I deeply respect operational excellence, but this is not the way.

Civilization advances at the rate at which we develop robust abstractions. I am writing this on a computer I cannot build, under a blanket I cannot weave, having enjoyed a meal with ingredients I cannot grow. I dedicated ten years to math competitions, and I can't even calculate a logarithm by hand! Can you?

Gatekeeping to only include people with a PhD in "Semver in Rust" won't cut it.

Yosh Wuyts quotes another Rust contributor as saying: "The job of an expert is to learn everything about a field there is to learn, and then distill it so that others don't have to."

I couldn't agree more!

2022: Rust + semver - tedium = ?

cargo-semver-checks was born in mid-July 2022, when I realized that building a semver linter boils down to only two things:

a list of machine-checkable rules, and
a system to check them.

At a high level, that's all cargo-semver-checks is: a checklist, and a for-loop over it.
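To make the "checklist and a for-loop" idea concrete, here is a minimal sketch in Rust. It is not how cargo-semver-checks is implemented: the `ApiSnapshot` type, the `Lint` struct, and the `public_item_removed` rule are all hypothetical names invented for illustration, and the "API" is reduced to a list of public item names.

```rust
/// A deliberately simplified model of a crate's public API:
/// just the names of its public items. (Hypothetical type.)
pub struct ApiSnapshot {
    pub public_items: Vec<String>,
}

/// One entry on the checklist: a name plus a machine-checkable rule
/// that compares the old API to the new one and returns the offenders.
pub struct Lint {
    pub name: &'static str,
    pub check: fn(&ApiSnapshot, &ApiSnapshot) -> Vec<String>,
}

/// Hypothetical rule: a public item that existed before is now gone.
fn public_item_removed(old: &ApiSnapshot, new: &ApiSnapshot) -> Vec<String> {
    old.public_items
        .iter()
        .filter(|item| !new.public_items.contains(*item))
        .cloned()
        .collect()
}

/// The checklist. A real tool would have many more entries.
pub fn checklist() -> Vec<Lint> {
    vec![Lint {
        name: "public_item_removed",
        check: public_item_removed,
    }]
}

/// The for-loop: run every lint and collect (lint name, offending item) pairs.
pub fn run_lints(old: &ApiSnapshot, new: &ApiSnapshot) -> Vec<(&'static str, String)> {
    let mut findings = Vec::new();
    for lint in checklist() {
        for item in (lint.check)(old, new) {
            findings.push((lint.name, item));
        }
    }
    findings
}
```

Calling `run_lints` with an old snapshot containing `foo` and `bar` and a new snapshot containing only `foo` reports `bar` under `public_item_removed`. Adding a new kind of check means adding one more entry to the checklist; the loop itself never changes.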

As is usually the case:

I wasn't the first person to realize this. cargo-semver-checks isn't the first attempt at a semver linter for Rust.
cargo-semver-checks stands on the shoulders of giants: without rustdoc JSON and serde, the same work would have taken ten times as long.

The novel trick in cargo-semver-checks is that lint rules are written declaratively.

Given the need to have hundreds of different lints defined over an ever-changing data format, this is a huge win.

But creating a good declarative query language is a much harder problem than semver! Generally one shouldn't replace an easier problem with a harder one. This is why linters rarely build their own query language.

Fortunately, I spent the last 7+ years of my career working on high-performance query languages for heterogeneous data, so I didn't need to start from scratch. Instead, I just plugged in my existing Trustfall query engine, which can query any data source or combination of sources, whether they are local files, remote APIs, or a terabyte-scale SQL cluster.

Thanks to Trustfall, each cargo-semver-checks lint is a type-checked structured query in Trustfall's GraphQL-like syntax. (More on this in future blog posts!) In practice, this means:

New lints are super easy to add: writing a new lint takes only 1-2 minutes. The vast majority of effort can then be spent on great test cases that reflect the diversity of use cases for each Rust language construct.
Lints are not tied to a specific rustdoc JSON format version. Even though the rustdoc JSON format changes frequently, the changes are absorbed by the Trustfall adapter for rustdoc and are completely invisible to the lints — an airtight abstraction layer.
cargo-semver-checks benefits from the performance and correctness guarantees of Trustfall, whose optimizations and test suite are far more intricate than would be feasible to write for a semver-checker alone. (If you'd like to hear more, tell me and I'll write more blog posts!)

All this allowed us to go from zero to 30 different semver lints in just five months.

We are ending 2022 on a particularly high note: four students have begun contributing to cargo-semver-checks as part of their bachelor's theses! The pace of development has sped up dramatically thanks to their hard work, and the codebase is healthier than ever.

Looking ahead to 2023

At RustConf 2022 I had the pleasure of meeting several cargo team members, and we decided that the end goal for cargo-semver-checks is merging into cargo itself.

Another goal for cargo-semver-checks is adding even more lints to prevent more kinds of semver violations.

These goals are self-explanatory, and I won't dig into them further. Instead, I'll mention three of my personal favorite things I'd like to see in cargo-semver-checks in 2023.

Proactively discover and prevent false-positives

A false-positive error in cargo-semver-checks is when the tool incorrectly claims it found a semver violation. I consider false-positives extremely serious bugs because they give the user incorrect advice, confusing them and slowing them down while also hurting the credibility of cargo-semver-checks itself.

Unfortunately, in 2022 our users reported multiple false-positive errors. I am grateful to everyone who spent their precious time helping debug problems that shouldn't have happened in the first place.

We have already begun strengthening the cargo-semver-checks test systems to discover and prevent future false-positives, so our users won't have to. In the process, we already discovered and fixed three previously-unknown false-positives.

In 2023, we plan to take a page from Rust's book: testing cargo-semver-checks on the most popular crates on crates.io as part of our release process. This would have a dual benefit: in addition to proactively discovering false-positives, it would also ensure cargo-semver-checks is ready to be adopted by those crates at their maintainers' convenience. And if we happen to discover more semver issues in the wild, that'll be a nice bonus!

Faster semver-checking via rustdoc caching

A cargo-semver-checks run consists of two steps: generating rustdoc JSON, and running lints over the generated JSON files.

The "run the lints" step is much faster than the process of generating the rustdoc, which can take a few minutes in CI environments with low core counts like GitHub Actions.

In 2023, we'll implement rustdoc caching to limit how often the rustdoc has to be rebuilt.

We expect to cut rustdoc generation time in half: we'll still have to generate the current version's rustdoc, but we can avoid repeatedly rebuilding rustdoc for crate versions that are already published on crates.io.
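The reasoning behind "cut in half" is simple counting: each run needs rustdoc for two versions (the published baseline and the current code), and only the baseline can be reused. Here is a rough sketch of that idea in Rust. Everything in it is an assumption made for illustration: `RustdocCache` and `builds_for_run` are hypothetical names, the cache lives in memory rather than on disk, and a string stands in for the real rustdoc JSON.

```rust
use std::collections::HashMap;

/// Cache of rustdoc JSON for already-published crate versions.
/// Keys are (crate name, version); values stand in for the JSON contents.
/// (Hypothetical type: a real cache would persist files on disk.)
#[derive(Default)]
pub struct RustdocCache {
    entries: HashMap<(String, String), String>,
}

impl RustdocCache {
    /// Return the cached rustdoc for a published version, calling the
    /// caller-supplied `generate` closure only on a cache miss.
    pub fn get_or_generate<F>(&mut self, name: &str, version: &str, generate: F) -> &str
    where
        F: FnOnce() -> String,
    {
        self.entries
            .entry((name.to_string(), version.to_string()))
            .or_insert_with(generate)
            .as_str()
    }
}

/// Simulate one semver-check run and return how many rustdoc builds it needed.
pub fn builds_for_run(cache: &mut RustdocCache, name: &str, baseline: &str) -> u32 {
    let mut builds = 0;

    // The baseline is published and immutable, so it is safe to cache.
    cache.get_or_generate(name, baseline, || {
        builds += 1;
        format!("rustdoc JSON for {name} {baseline}")
    });

    // The current, unpublished code changes between runs: always rebuild.
    builds += 1;

    builds
}
```

With an empty cache, the first run against a given baseline performs two builds; every later run against that same baseline performs one, which is where the expected halving comes from.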

Semver-check PRs, not just `cargo publish`

Currently, cargo-semver-checks is most ergonomic when used right before `cargo publish`: it checks whether the publish step with the specified version would result in a semver-compliant release.
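As a rough illustration of what "semver-compliant release" means for breaking changes, the sketch below checks whether a version bump is large enough under cargo's compatibility convention, where two versions are compatible if their leftmost non-zero component is the same. The `Version` struct and both function names are hypothetical, and the sketch covers breaking changes only; it is not the tool's actual logic.

```rust
/// A bare-bones version number, ignoring pre-release and build metadata.
/// (Hypothetical type for illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Under cargo's convention, two versions are compatible when their
/// leftmost non-zero component matches: 1.2.0 and 1.3.0 are compatible,
/// while 0.2.0 and 0.3.0 are not.
pub fn is_compatible_bump(old: Version, new: Version) -> bool {
    if old.major != 0 || new.major != 0 {
        old.major == new.major
    } else if old.minor != 0 || new.minor != 0 {
        old.minor == new.minor
    } else {
        old.patch == new.patch
    }
}

/// A release with a breaking change is only compliant if the new version
/// is one that cargo treats as incompatible with the old one.
pub fn release_is_semver_compliant(
    old: Version,
    new: Version,
    has_breaking_change: bool,
) -> bool {
    !has_breaking_change || !is_compatible_bump(old, new)
}
```

For example, publishing a breaking change as 1.2.0 to 1.3.0 is rejected by this check, while 1.2.0 to 2.0.0 or 0.2.0 to 0.3.0 passes. The hard part, and the part the lints exist for, is computing `has_breaking_change` in the first place.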

But wouldn't it be nice to know about breaking changes in a pull request before merging it and committing to a major version bump? Multiple projects have already begun running cargo-semver-checks like this, generally via custom scripts they've adapted specifically for that purpose.

In 2023, I hope we're able to make this an officially-supported mode of operation, complete with a GitHub Action. Bonus points if the Action reports semver issues as inline PR comments using the lints' span information!

Onwards!

I'm thrilled and humbled by the response that cargo-semver-checks has received in the Rust community. I've never been more excited about building the future with Rust, and I'm excited to see what 2023 has in store for cargo-semver-checks and the Rust ecosystem as a whole.
