Solving Flaky Tests in RSpec

Increasing the reliability of Flexport’s test suite with the new Quarantine gem.

Flaky tests are an unavoidable nuisance at every company, and Flexport is no exception. With the recent growth of our Engineering Team, our monolith’s test suite has ballooned from 14k to 16k tests in just the past four months. Over the same period, our master branch success rate has dropped to 70%, with flaky tests responsible for 50% of all failures. Tests become flaky for many reasons, ranging from non-deterministic assertions, such as expect(db_query).to eq([1, 2, 3]), to shared mutable data between tests. With the enormous influx of code being committed every day, our test suite had quickly grown out of control and we needed a solution.
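To make the first cause concrete, here is a minimal plain-Ruby sketch of one plausible reading of that example: the query returns the right rows, but in no guaranteed order. The fetch_ids method is a hypothetical stand-in for a query without an ORDER BY clause, and the shuffle only simulates the database returning rows in an arbitrary order. In RSpec itself, the order-insensitive comparison would typically be written with the match_array or contain_exactly matchers.

```ruby
# Hypothetical stand-in for a query with no ORDER BY: the rows are always
# the same, but their order is not guaranteed.
def fetch_ids(rng)
  [1, 2, 3].shuffle(random: rng)
end

# Fragile: passes only when the rows happen to come back as 1, 2, 3.
def order_sensitive_match?(rows)
  rows == [1, 2, 3]
end

# Stable: compares contents and ignores ordering.
def order_insensitive_match?(rows)
  rows.sort == [1, 2, 3]
end

rng = Random.new(42)
results = Array.new(20) { fetch_ids(rng) }

sensitive = results.count { |rows| order_sensitive_match?(rows) }
insensitive = results.count { |rows| order_insensitive_match?(rows) }

puts "order-sensitive passes:   #{sensitive} of 20"
puts "order-insensitive passes: #{insensitive} of 20"
```

The order-sensitive check passes only on the runs where the shuffle happens to produce 1, 2, 3, which is exactly the intermittent behavior that makes a test flaky.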


Illustrations by Bailey McGinn

Why are flaky tests a problem?

Flaky tests caused many of our engineers to lose faith in the reliability of our test suite. Many engineers were unsure whether tests were failing because of their changes or because of flaky tests, resulting in countless hours spent rebuilding and debugging code unrelated to their feature. This was an enormous source of frustration for our engineers and directly impacted our engineering velocity.

Another headache caused by flaky tests was our abysmal master pass rate. Each failing master build required exhausting manual investigation and diagnosis from the infrastructure team, wasting valuable development time. In some cases, engineers might have dismissed a failing test in a build as flaky and disabled it, only to realize later that it was a legitimate failure that went on to cause production errors. This may have become a common occurrence, since it is only human nature to ignore alarms when a system has a history of false signals (Google Testing Blog, 2016). As a result, flaky tests have serious implications in terms of time and resources and can directly impact production reliability.

Flexport’s solution

To combat flaky tests, we have created Quarantine, an open-source Ruby gem that maintains a list of flaky tests to be skipped at runtime. Before test execution, Quarantine downloads the list of all flaky tests and prevents them from being run. During test execution, Quarantine automatically retries failing tests. If a test passes after previously failing in the same build, it is marked as flaky and added to the list of quarantined tests. The gem aims to automate the flaky test workflow and create a quicker feedback loop to maintain a pristine test suite state.
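The workflow above can be sketched in a few lines of plain Ruby. This is an illustration only and not the Quarantine gem's API: the QuarantineSketch class, its run method, the max_retries parameter, and the test IDs are all hypothetical, and the "downloaded" list is simply passed in as an array.

```ruby
require "set"

# Hypothetical runner that mimics the workflow described above. It is not
# the Quarantine gem's API.
class QuarantineSketch
  attr_reader :quarantined

  def initialize(quarantined_ids = [], max_retries: 2)
    @quarantined = Set.new(quarantined_ids) # list "downloaded" before the run
    @max_retries = max_retries
  end

  # Runs one test (a block that returns true on pass) and returns
  # :skipped, :passed, :flaky, or :failed.
  def run(test_id)
    return :skipped if @quarantined.include?(test_id)
    return :passed if yield

    # The first attempt failed, so retry. A pass on retry means the same
    # code produced different outcomes within one build: the test is flaky.
    @max_retries.times do
      next unless yield

      @quarantined << test_id
      return :flaky
    end
    :failed
  end
end

runner = QuarantineSketch.new(["spec/a_spec.rb[1:1]"])

attempts = 0
fails_once_then_passes = -> { (attempts += 1) > 1 }

puts runner.run("spec/a_spec.rb[1:1]") { true }
puts runner.run("spec/b_spec.rb[1:1]") { fails_once_then_passes.call }
puts runner.run("spec/c_spec.rb[1:1]") { false }
puts runner.quarantined.to_a.inspect
```

A test that fails on every attempt is reported as a real failure and is never quarantined, so only tests with inconsistent outcomes end up on the list.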

How has Quarantine changed our test suite?

Overall, we believe Quarantine has had a positive impact on our development lifecycle. So far, we have quarantined over 60 flaky tests, and our master build success rate has improved from 70% to 95%. Metrics aside, Quarantine has had the following impact on engineering velocity and developer experience.

When tests fail during CI builds, engineers can be fairly certain that their own changes are the cause and not flaky tests
With less noise in failing builds, failing master builds are much quicker to investigate, which helps us recognize legitimate failures
Quicker turnaround between flaky test detection, disabling, and resolution
A centralized location to view flaky tests instead of xit calls scattered across the code base

Master build statistics (April 8 to April 15)

Managing Quarantine and Possible Pitfalls

At first, the notion of automatically skipping a set of tests may seem scary. If mismanaged, quarantining can disable important tests and, in turn, let fatal errors reach production. To mitigate this risk, you can disable Quarantine in development branches so that those builds keep their original code coverage. At Flexport, we are considering a variety of options, ranging from giving tests a grace period before they are quarantined to immediately alerting the team that owns the flaky test. In the end, it is important to evaluate the maturity of your test suite and determine what matters more for your code base: test suite stability or code coverage.
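As a rough illustration of the grace-period idea, the sketch below only quarantines a test after it has flaked in a configurable number of distinct builds. Everything here is an assumption made for the example: the GracePeriod class, the threshold parameter, and the build and test IDs are hypothetical and do not describe how Quarantine works or will work.

```ruby
# Hypothetical grace-period policy: a test is quarantined only after it has
# been observed as flaky in `threshold` separate builds.
class GracePeriod
  def initialize(threshold: 3)
    @threshold = threshold
    @flaky_builds = Hash.new { |hash, key| hash[key] = [] }
  end

  # Records that `test_id` flaked in `build_id` and returns true once the
  # test should be quarantined. Repeat flakes in one build count once.
  def record_flake(test_id, build_id)
    builds = @flaky_builds[test_id]
    builds << build_id unless builds.include?(build_id)
    quarantine?(test_id)
  end

  def quarantine?(test_id)
    @flaky_builds.key?(test_id) && @flaky_builds[test_id].size >= @threshold
  end
end

policy = GracePeriod.new(threshold: 2)
puts policy.record_flake("spec/b_spec.rb[1:1]", "build-101")
puts policy.record_flake("spec/b_spec.rb[1:1]", "build-101")
puts policy.record_flake("spec/b_spec.rb[1:1]", "build-102")
```

A higher threshold keeps more tests running, and so preserves coverage, at the cost of more noisy builds before a flaky test is finally skipped.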

Looking Forward

Currently, we are in the process of adding a variety of extra tools to help make the Quarantine experience all the more seamless. This includes:

Automatic Jira ticket creation
Slack alerting
Un-quarantining tests on Jira ticket completion
Greater configurability of Quarantine options

We are also curious to hear how other teams have approached flaky tests. Feel free to reach out here or on GitHub with your ideas and feedback on our approach. Similarly, if you would like Quarantine to support your Ruby stack, or want to contribute to the gem, don’t be shy: fork our repository!
