
Trainline’s journey to speed up the customer experience: getting set up

source link: https://engineering.thetrainline.com/trainlines-journey-to-speed-up-the-customer-experience-getting-set-up-fa6392a1bac8

The Trainline booking flow is a Node.js/React web app which allows our customers to search for and book train and coach tickets. Over time, we’ve tested and launched many new features, such as more advanced real-time information, logged-in customer profiles and tools to help our customers find the best price, but this has increased the size of the application code we ship to our customers.

During the last 18–24 months, we’ve made concerted efforts to improve our web performance in particular areas, largely our landing pages and booking flow web apps. This blog post starts to take you through that work, beginning with how we set ourselves up to do it; the improvements we made will follow in a later post.

Measure

Before we even got started on improvements, we needed a way to prove that the work we were about to do would positively impact the user experience, which meant measuring where we were at that point.

If you were starting from scratch today, Google Web Vitals would be a good first set of metrics to track: each one provides a signal and, taken together, they represent a good experience on the web.

In our case, web vitals weren’t yet fully baked when we started on this journey, so we used a number of existing metrics to build up a view of our web performance at Trainline. These were:

  • time to first byte (TTFB)
  • first contentful paint (FCP)
  • largest contentful paint (LCP), a web vital, and
  • first CPU idle (FCI)
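To give a concrete picture, here is a minimal sketch of how the first three of those metrics can be captured in the browser with standard Performance APIs: TTFB from the Navigation Timing API, and FCP and LCP via PerformanceObserver. FCI was a lab metric reported by tooling such as Lighthouse, so it isn’t shown. This is illustrative only, not our production instrumentation, and the `report` callback is a stand-in for whatever analytics pipeline you use.

```javascript
// Illustrative sketch only: capture TTFB, FCP and LCP with standard
// browser Performance APIs and hand them to a `report` callback.
function collectCoreMetrics(report) {
  // TTFB: time to first byte of the main document, from the Navigation Timing API
  const [nav] = performance.getEntriesByType('navigation');
  if (nav) {
    report('TTFB', nav.responseStart);
  }

  // FCP: first contentful paint, from the paint timing entries
  new PerformanceObserver((list) => {
    const fcp = list.getEntriesByName('first-contentful-paint')[0];
    if (fcp) {
      report('FCP', fcp.startTime);
    }
  }).observe({ type: 'paint', buffered: true });

  // LCP: the candidate keeps updating as larger elements paint, so keep the
  // latest value and report it once the page is backgrounded.
  let lcp = 0;
  let lcpReported = false;
  new PerformanceObserver((list) => {
    const entries = list.getEntries();
    lcp = entries[entries.length - 1].startTime;
  }).observe({ type: 'largest-contentful-paint', buffered: true });

  document.addEventListener('visibilitychange', () => {
    if (!lcpReported && document.visibilityState === 'hidden' && lcp > 0) {
      lcpReported = true;
      report('LCP', lcp);
    }
  });
}

// `report` is a placeholder; in practice you would send the values to your RUM/APM tooling.
collectCoreMetrics((name, value) => console.log(name, Math.round(value)));
```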

Our measurement MVP

  • Record as much data as possible — metrics and waterfalls, continuously over time;
  • With as long a data retention period as possible;
  • Correlate this data with website releases.

You can, of course, build your own tooling to push this data into your existing application performance monitoring (APM) vendor, given that many of these metrics are available through browser Performance APIs such as the Navigation Timing API and PerformanceObserver. However, metrics are only part of the story: of equal importance was how those metrics landed in the waterfall, so we could investigate opportunities and diagnose issues. So, we integrated with SpeedCurve, which provided most of this out of the box, and by adding speedcurve-cli to our web app build pipeline, specifically `speedcurve deploy`, we could correlate the data with releases too.
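If you do go the roll-your-own route, the glue code can be small. Below is a hypothetical `report` implementation that beacons each metric to an APM ingest endpoint together with a release identifier, so the data can be correlated with releases; the endpoint path and the injected `__RELEASE_ID__` global are assumptions for illustration, not part of our stack or of SpeedCurve.

```javascript
// Hypothetical example: beacon each metric to an APM ingest endpoint,
// tagged with the currently deployed release so it can be correlated
// with releases. The endpoint and the release global are illustrative.
function reportToApm(name, value) {
  const payload = JSON.stringify({
    metric: name,
    value: Math.round(value),
    release: window.__RELEASE_ID__, // e.g. injected into the page at build time
    page: window.location.pathname,
  });
  // sendBeacon is used because it survives page unload better than fetch/XHR
  navigator.sendBeacon('/apm/rum-metrics', payload);
}
```

Used as the `report` callback from the earlier sketch, this gives you a bare-bones RUM pipeline; a vendor such as SpeedCurve adds the waterfalls, retention and release correlation on top.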

Synthetics or Real User Monitoring (RUM)? You’re going to need both.

Synthetic monitoring reports web app performance data from traffic generated by a test agent in a consistent ‘lab’ environment, segmented by a number of variables including region, browser and emulated device type. Real User Monitoring captures performance data from a customer’s web browser and their interactions with your web app — it is often referred to as ‘field’ data.

Synthetics will give you additional details you don’t get from RUM, such as a waterfall and profiler information that can be used to investigate further, but the tests run with simulated throttling of network and CPU, so the resulting data does not reflect real user experiences. Results from Synthetic runs are comparable with each other if you keep the conditions the same, such as the machine you run them on. Synthetic tests run in a real web browser, so the waterfall you get is the same as that experienced by real users of that browser using your web app. By being aware of and reducing variability, comparing lab data is often a good way to validate a proof-of-concept performance win.

Before we introduced SpeedCurve, the only Synthetics we were doing were on-demand runs from WebPageTest.org or Lighthouse on our own machines, with the results discarded, so we weren’t able to spot trends or do comparisons. Our only long-term data was RUM data, giving us time series performance data from real users, but not at the level of detail we needed to find possible improvements, to verify an improvement did what we expected, or to understand why a metric regressed after a release.

Synthetics is what engineers should be working with when doing performance work as it provides all the details necessary to assess website performance. Furthermore, improvements in Synthetics will generally be reflected in RUM. But RUM data is what should be reported to the business given it is a reflection of real user experiences of the web app.

Do we have enough metrics?

Once we had SpeedCurve set up, the metrics exposed a large gap in time between those that land early, such as TTFB, FCP and LCP, and those that land later, such as FCI. We wanted more granularity on what was happening in this period.

Historical Trainline performance from SpeedCurve

So, we introduced some new timing marks using the User Timing API. Some of these were defined by engineering as important timing points in the web app lifecycle, such as the time when the app downloaded all bundles or when the app booted. Others were defined in collaboration with product stakeholders as important timing points during page load, marking when the most important content on the page was first painted in the browser, such as the search form, or the search results.
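As a rough sketch of what that looks like in code (with illustrative mark names, not our real ones), the User Timing API lets you drop named marks at those points and measure between them:

```javascript
// Illustrative mark names, not our real ones.

// Engineering-defined lifecycle points:
performance.mark('bundles-downloaded'); // all app bundles have been fetched
performance.mark('app-booted');         // the client-side app has booted

// Product-defined content points, e.g. when the search results first render:
performance.mark('search-results-rendered');

// A measure gives a named duration between two marks; a mark's startTime is
// already relative to navigation start, so the mark alone gives "time to X".
performance.measure('app-boot', 'bundles-downloaded', 'app-booted');
const [resultsMark] = performance.getEntriesByName('search-results-rendered');
console.log('time to search results (ms):', Math.round(resultsMark.startTime));
```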

Historical Trainline performance from SpeedCurve, annotated with new user timing marks

User timing marks have broad browser support and are visible in all common performance tooling, such as Lighthouse, the Chrome DevTools profiler and SpeedCurve. This allowed us to run controlled experiments for a performance opportunity on our own machines, record whether there was an improvement, and if so, roll it out and see it reflected in Synthetics in SpeedCurve and, lastly, for real users.

So did we have enough to start improving?

Almost. Now that we had a regular snapshot of our web apps’ performance, what were we benchmarking against? We looked at similar experiences on the web, and at other applications in our own web stack, that we could learn from in order to provide the most performant experience possible.

Key to this was long-term consistency, which meant ensuring that benchmarks were run on the same simulated hardware over and over again, something SpeedCurve really helped us with. Our mobile web experience performed in a way we wanted all our customers to enjoy, and as it was an internal application, we could add the same user timing marks to do a direct comparison with our main booking flow. We adopted these metrics as performance targets.

Then we could start. We began to compare the two web apps on aspects like the length and depth of the waterfall, the overall page weight (of JavaScript, HTML and other assets) and the number of requests, building a backlog of performance improvements so that ultimately all of our customers could benefit, not just mobile customers.
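As one illustration of this kind of comparison, the Resource Timing API gives you the request count and transferred bytes for a loaded page straight from the browser console, which is a quick way to sanity-check the numbers a tool like SpeedCurve reports. This is a rough sketch, not how we built our backlog; note that `transferSize` is only populated for same-origin resources or those served with a Timing-Allow-Origin header, so treat the totals as indicative.

```javascript
// Rough, console-friendly comparison of page weight and request count
// using the Resource Timing API. transferSize can be 0 for cross-origin
// resources without Timing-Allow-Origin, so totals are indicative only.
const resources = performance.getEntriesByType('resource');

const totalRequests = resources.length + 1; // +1 for the HTML document itself
const totalBytes = resources.reduce((sum, r) => sum + (r.transferSize || 0), 0);
const jsBytes = resources
  .filter((r) => r.initiatorType === 'script')
  .reduce((sum, r) => sum + (r.transferSize || 0), 0);

console.table({
  requests: totalRequests,
  'total KB': Math.round(totalBytes / 1024),
  'JS KB': Math.round(jsBytes / 1024),
});
```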

In summary, we had the luxury of another web app that we could use as a benchmark and to set performance goals against. In the absence of this, you might consider benchmarking against your competition and using that as a goal, or setting a target improvement, such as a percentage reduction in a performance metric.

Check back in the future for the second part of this series where we’ll introduce some of the improvements we made and the impact this had on our performance metrics.

Check out Luca’s blog for 5 mistakes to avoid in optimising your web app performance.

