Mobile Site Speed — Measurement Best Practices
source link: https://medium.baqend.com/mobile-site-speed-measurement-best-practices-ff4a3f91b003
Part 4 of our ongoing study on “Mobile Site Speed & The Impact on Web Performance”
Having recently launched speedhub.org as the go-to resource on Web performance, we thought it was time for another installment of our long-running blog post series “Mobile Site Speed & the Impact on E-Commerce”. In the earlier parts, we summarized existing research on how page load times influence user satisfaction and business KPIs. We concluded that accelerating a website always has a positive impact on business, but that this effect is hard to capture in concrete numbers. Before we tackle the actual best practices for acceleration in the next episodes, today we take a look at the different ways to measure page load times.
When Does a Website Feel Fast?
Users typically perceive a website as slow or fast depending on when the first relevant content is displayed. Another indicator is how long it takes until the user can enter data, click on a navigation bar, or interact with the website in some other way. However, capturing these subjective criteria in an objective fashion is difficult. To clarify the actual difficulties and potential solutions, this post covers the pros and cons of the different options for measuring Web performance.
Today’s Agenda
We will first briefly cover traditional log analyses, which are mostly concerned with technical performance in the backend. Since this kind of analysis only gives you indirect information on performance on the user side, we do not go into much detail here and quickly move on to synthetic performance tests and the way they approximate user-perceived performance through metrics like the Speed Index and the First Meaningful Paint. Next, we discuss real-user monitoring (RUM) techniques that take performance measurements from within the client devices of actual users: what most vendors provide out of the box, why building your own RUM stack might sometimes be necessary, and where the main difficulties lie. We then turn to user surveys that directly ask users for their opinions, summarizing the benefits and limitations of this approach. We close with a synoptic discussion and a tabular overview of all approaches to highlight their individual uses and possible synergies.
As an aside, we do not cover Application Performance Monitoring (APM) in this article, as we consider APM tools amalgamations of the approaches discussed here.
Log Analysis & Technical Performance
Log analysis generates insights from data that is already available from the application server, proxy, or content delivery network (CDN). One of the critical metrics typically reported is the time to first byte (TTFB). The TTFB describes how long the requesting party has to wait from sending the first request packet until receiving the first response data packet. Server logs typically only reflect when the server sent the first response byte, not when the client actually started receiving it; the client-side TTFB can therefore only be approximated on the basis of server or CDN logs. Minimizing the TTFB is mandatory for optimizing page speed, because the browser needs to receive data before anything can be displayed. However, a low TTFB does not automatically imply that a website feels fast, since the client’s computing power and many other factors also contribute to how fast the website is rendered.
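As a rough illustration of this kind of analysis, the sketch below computes TTFB percentiles from an access log. The log format (whitespace-separated fields with the response time in seconds as the last field, as in a common nginx configuration) and the file name are assumptions; adapt the parsing to whatever your server or CDN actually emits.

```ts
import { readFileSync } from "fs";

// Compute TTFB percentiles from an access log. Assumes the last
// whitespace-separated field on each line is the response time in
// seconds (e.g. nginx's $upstream_response_time); adapt to your format.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

const lines = readFileSync("access.log", "utf8").split("\n").filter(Boolean);
const ttfbMs = lines
  .map((line) => parseFloat(line.trim().split(/\s+/).pop() ?? "") * 1000)
  .filter((v) => Number.isFinite(v))
  .sort((a, b) => a - b);

console.log(`requests:    ${ttfbMs.length}`);
console.log(`median TTFB: ${percentile(ttfbMs, 50).toFixed(0)} ms`);
console.log(`p95 TTFB:    ${percentile(ttfbMs, 95).toFixed(0)} ms`);
```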
Synthetic Testing & User-Perceived Performance
Synthetic testing addresses some of these issues as it facilitates detailed measurements for individual pages and even user journeys: Tools like WebPagetest or Google Lighthouse are server applications that load websites just like clients do, but they record detailed logs and even a video of every page load. Running scripted tests periodically can provide useful information on deployment health, for example because increased load times may indicate that the Web server is under pressure or because a before-after comparison might uncover a problem with the most recent deployment. It is even possible to monitor and compare performance of competing websites.
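To make this concrete, here is a minimal sketch of a scripted synthetic check using the Lighthouse Node API together with chrome-launcher. Package names and report fields reflect recent versions and may differ in yours; treat this as a starting point rather than canonical usage.

```ts
import lighthouse from "lighthouse";
import * as chromeLauncher from "chrome-launcher";

// Run a performance-only Lighthouse audit against one URL and print
// two user-centric metrics from the resulting report.
async function audit(url: string): Promise<void> {
  const chrome = await chromeLauncher.launch({ chromeFlags: ["--headless"] });
  try {
    const result = await lighthouse(url, {
      port: chrome.port,
      onlyCategories: ["performance"],
      output: "json",
    });
    const audits = result?.lhr.audits;
    console.log(url, {
      speedIndexMs: audits?.["speed-index"]?.numericValue,
      largestContentfulPaintMs: audits?.["largest-contentful-paint"]?.numericValue,
    });
  } finally {
    await chrome.kill();
  }
}

// Schedule this (cron, CI job, etc.) to track trends and catch regressions.
audit("https://example.com").catch(console.error);
```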
As one particularly useful feature, synthetic testing tools provide meticulous logs of what happened during the page load. Most notably, the waterfall diagram contains timing information on when the individual resources were requested, from which domain and over what kind of connection each of them was served, and how long transmission took. Modern tooling also tells you when the browser was actually doing useful work (e.g. rendering) and when it was idle and waiting for loads to finish. Studying these data points can thus be the first step towards finding and resolving performance bottlenecks.
Figure 2: The Speed Index and the First Meaningful Paint are determined through video analysis and therefore capture accurately how long the user actually had to wait for content to be displayed.
The video that is created during the page load provides a second critical indicator for performance, since it captures how the user perceives the website during the load. This can be important to uncover issues that are impossible to spot through timer analysis alone, for example flickering page elements. More importantly, though, video analysis can be used to compute user-centric performance metrics like the Speed Index or the First Meaningful Paint. As illustrated in Figure 2, both metrics reflect when content actually becomes visible, in slightly different ways. The Speed Index revolves around visual completeness and represents the average time it takes for website elements to become visible. It works well for websites with a static layout, but is unreliable for websites with moving elements like carousels or videos; computing the Speed Index for this kind of website requires a custom timer or event to demarcate the point in time at which visual completeness is reached. The First Meaningful Paint is another user-centric metric and represents the point in time at which the largest visual change takes place; the underlying assumption is that the biggest visual change is relevant to the user, for example because a hero image or a navigation bar appears.
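The idea behind the Speed Index can be captured in a few lines: it integrates the share of the viewport that is still visually incomplete over time, so content that shows up early pulls the value down. The sketch below uses made-up per-frame completeness values; real tools derive them from the recorded video.

```ts
// Illustrative only: the Speed Index integrates the visually incomplete
// share of the viewport over time, so earlier content means a lower score.
interface Frame {
  timeMs: number;             // time since navigation start
  visualCompleteness: number; // 0..1, share of the final above-the-fold pixels
}

function speedIndex(frames: Frame[]): number {
  let index = 0;
  for (let i = 1; i < frames.length; i++) {
    const interval = frames[i].timeMs - frames[i - 1].timeMs;
    // Area above the visual-completeness curve for this interval.
    index += (1 - frames[i - 1].visualCompleteness) * interval;
  }
  return index; // in milliseconds, lower is better
}

// Made-up frames: blank until 1s, 80% complete at 2s, fully rendered at 3s.
const frames: Frame[] = [
  { timeMs: 0, visualCompleteness: 0 },
  { timeMs: 1000, visualCompleteness: 0 },
  { timeMs: 2000, visualCompleteness: 0.8 },
  { timeMs: 3000, visualCompleteness: 1 },
];
console.log(speedIndex(frames)); // 2200
```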
Since video analysis is not feasible for performance tracking in the field, Google introduced the Largest Contentful Paint (LCP) [5] as an approximation of the First Meaningful Paint that can be captured directly in the browser. It is part of the Web Vitals [3], a set of browser-based metrics for capturing user-perceived performance. Google promotes the Web Vitals as the gold standard for website performance across a number of services (e.g. PageSpeed Insights, Search Console, TestMySite) and publishes them in the Chrome User Experience Report (CrUX) [7] database. The core metrics will further be used for ranking search results starting in June 2021 [6], and even non-AMP pages will be admitted to the Top Stories feature [4] in Search on mobile, provided they exhibit top-of-the-line performance according to the Core Web Vitals. Website performance according to the Web Vitals will therefore be critically important for SEO in the upcoming years.
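In the browser, LCP candidates can be observed with the standard PerformanceObserver API. The following minimal sketch works in Chromium-based browsers only; how the final value is reported is up to your monitoring setup.

```ts
// Observe Largest Contentful Paint candidates in the browser
// (supported in Chromium-based browsers).
let lcpMs = 0;

const observer = new PerformanceObserver((entryList) => {
  // The newest entry is the current LCP candidate; it can still be
  // superseded by a larger element until the user interacts with the page.
  for (const entry of entryList.getEntries()) {
    lcpMs = entry.startTime;
  }
});
observer.observe({ type: "largest-contentful-paint", buffered: true });

// Read out the final candidate once the page gets hidden, e.g. to hand
// it over to a RUM beacon (see the next section).
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    console.log("LCP:", Math.round(lcpMs), "ms");
  }
});
```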
Even though they are extremely valuable for alerting and health monitoring, synthetic tests are hardly suitable for approximating the page load times observed by actual users, even when results are averaged over several runs to improve measurement accuracy. First, synthetic test results are often subject to substantial fluctuations, because typical offerings schedule tests dynamically over a set of servers and provide little control over the assigned server or even the amount of resources per execution. Second, and as a problem inherent to the approach itself, synthetic tests are executed on servers that do not correspond well to the kind of hardware used by typical website visitors (namely desktop or mobile devices). Covering all relevant variations quickly becomes complex and expensive due to the sheer number of different browsers, connection profiles, operating systems, screen sizes, etc. that occur in the wild. Synthetic tests can therefore paint an unrealistic picture of client performance as well as of the way that ads, tracking, or other third-party libraries behave.
In summary, synthetic tests are extremely useful for performance tuning as they provide detailed logging and user-perceived performance measurements through video analysis. However, they are also subject to an inherent issue that is already given away in the name: They are synthetic and therefore not representative of actual user behavior. In other words, it is impossible to directly connect business KPIs with Web performance by using synthetic testing alone, because synthetic test clients do not participate in the actual business (i.e. in conversions). To determine how performance issues affect bounce rate, conversion rate, and other business success indicators, you therefore have to resort to other means of data collection.
Real-User Monitoring & User Engagement
The idea behind real-user monitoring (RUM) is to collect information on the client side and send it to the backend for further analysis. The collected information typically includes not only various timers that capture network and rendering performance, but also information on what the users clicked and how they engaged with the website. To also capture details about the user’s device and browser, or about the referring website from which the user arrived, the user agent string and other data artifacts are collected along with the values obtained from the Navigation and Performance Timing APIs.
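A minimal collection snippet along these lines is sketched below. The /rum endpoint and the payload fields are assumptions; a production setup would add sampling, consent handling, and more timers.

```ts
// Collect standard browser timings plus some context and ship them to a
// (hypothetical) /rum endpoint without blocking the page.
function sendRumBeacon(): void {
  const [nav] = performance.getEntriesByType(
    "navigation"
  ) as PerformanceNavigationTiming[];
  if (!nav) return;

  const payload = {
    url: location.href,
    referrer: document.referrer,
    userAgent: navigator.userAgent,
    ttfbMs: nav.responseStart,                 // time to first byte
    domContentLoadedMs: nav.domContentLoadedEventEnd,
    loadEventMs: nav.loadEventEnd,
  };

  // sendBeacon queues the request so it survives page unload and does not
  // compete for bandwidth with resources needed to render the page.
  navigator.sendBeacon("/rum", JSON.stringify(payload));
}

// Defer reporting until the page is hidden, so the measurement itself
// does not slow down the load it is trying to measure.
addEventListener("pagehide", sendRumBeacon);
```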
Figure 1: The Google CrUX summarizes performance experienced by actual Chrome users.
The Chrome User Experience Report (CrUX) is a more technical approach to understanding how users experience a website, because it is based on numerical measurements taken by the browser: CrUX relies on timers that correlate with when content is displayed (e.g. First or Largest Contentful Paint) or with how fast a website reacts (e.g. First Input Delay). As illustrated in Figure 1, CrUX is based on opt-in performance tracking data collected from Chrome users. It is therefore available without prior setup, similar to CDN or server logs which are also often available with default settings. Seeing that a website’s Google search rank is partly determined on the basis of these measurements, the CrUX report should further be considered SEO-critical [1]. But even though CrUX data provides insights on perceived website performance, it does not provide a comprehensive overview, because measurements are only taken for Chrome users: Reports do not reflect the performance experienced by users of Firefox, Safari, and other non-Chrome browsers, and it is consequently not possible to break down performance by browser vendor at all. Since the standard CrUX report is scoped to the origin level, it also cannot be used to analyze performance for specific website sections and timeframes, or for individual sub-pages such as product pages. This level of detail is only provided by PageSpeed Insights and the Search Console, both of which offer page-level granularity for their performance summaries.
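For programmatic access beyond the dashboards, origin-level CrUX field data can also be queried via the CrUX API. The sketch below is illustrative: it assumes you have an API key, and the exact response shape may evolve over time.

```ts
// Query the CrUX API for origin-level field data (requires an API key).
const API_KEY = "YOUR_API_KEY"; // placeholder
const ENDPOINT = `https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=${API_KEY}`;

async function fetchCruxLcp(origin: string): Promise<void> {
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ origin, formFactor: "PHONE" }),
  });
  const data = await response.json();
  // p75 of Largest Contentful Paint across real Chrome users of this origin.
  const lcp = data?.record?.metrics?.largest_contentful_paint;
  console.log(`${origin} mobile LCP p75:`, lcp?.percentiles?.p75, "ms");
}

fetchCruxLcp("https://example.com").catch(console.error);
```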
There are a number of popular Web analytics tools, including Google Analytics, Webtrends Analytics, and Adobe Analytics. However, their performance dashboards typically only show sampled data or grossly oversimplified aggregates such as average load timers: In real-world settings, this kind of performance reporting is often useless without heavy customization, as default settings typically do not handle outliers well and also do not capture the long tail of the performance distribution across the user base. Real-user monitoring (RUM) solutions like mPulse, SpeedCurve, New Relic, Datadog, or Dynatrace, in contrast, provide unsampled access to the raw data of every individual page impression. But while they may give you all the flexibility you need in capturing actual user performance, these tools are also complex to operate and even more complex to build on your own: Using RUM tools is therefore associated with significant costs for both the initial setup/development and maintenance.
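The problem with averages is easy to demonstrate with synthetic numbers: a small long tail of very slow loads inflates the mean while the median stays put, and only percentiles make the tail visible. The values below are made up purely for illustration.

```ts
// Made-up load times (ms): 90 typical users plus a heavy tail of 10 slow ones.
const loadTimes = [
  ...Array.from({ length: 90 }, () => 1200 + Math.random() * 400),
  ...Array.from({ length: 10 }, () => 8000 + Math.random() * 4000),
].sort((a, b) => a - b);

const mean = loadTimes.reduce((sum, v) => sum + v, 0) / loadTimes.length;
const percentile = (q: number) =>
  loadTimes[Math.floor(q * (loadTimes.length - 1))];

console.log(`mean: ${mean.toFixed(0)} ms`);             // inflated by the tail
console.log(`p50:  ${percentile(0.5).toFixed(0)} ms`);  // what a typical user sees
console.log(`p95:  ${percentile(0.95).toFixed(0)} ms`); // what the slowest users see
```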
By connecting measurements of the same users over time, the analysis scope can be widened from single page impressions (PIs) to entire user sessions. Since detailed information on devices and browsers is available, it is possible to discover browser-device combinations for which technical performance is below average, or user groups that convert significantly more often than the rest. It is even possible to explore where and when exactly customers drop out of the checkout procedure. This kind of information often reveals performance bottlenecks that are specific to particular user groups or certain parts of the website (e.g. the checkout). Monitoring actual user data is therefore critical not only for understanding, but also for optimizing business performance.
Figure 3: Real-user monitoring requires a sophisticated analysis stack for taking measurements in the client, storing and processing them in the backend, and visualizing key insights in dashboards.
But effective real-user monitoring requires solving a number of challenges, as illustrated in Figure 3. First, data needs to be collected in the browser in such a way that the rendering process is not obstructed. At the same time, the tracking data needs to be sent without consuming network resources while the page is still being loaded. Once the data has been written to storage in the backend, it needs to be post-processed and aggregated, so that further analyses can be executed efficiently over days’ or weeks’ worth of data. Finally, everything has to be wired up with a dashboarding solution to make insights easily accessible.
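For the aggregation step, a common pattern is to pre-compute daily per-page percentiles from the raw beacons so that dashboards never have to scan raw data. The input shape below matches the hypothetical beacon sketch from earlier and is an assumption, not a fixed schema.

```ts
// Roll raw beacon rows up into daily per-page 75th percentiles so that
// dashboards can query small aggregates instead of scanning raw data.
interface Beacon {
  url: string;
  timestamp: number; // Unix epoch, ms
  ttfbMs: number;
  loadEventMs: number;
}

function p75(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(0.75 * (sorted.length - 1))];
}

function dailyRollup(beacons: Beacon[]) {
  const groups = new Map<string, Beacon[]>();
  for (const b of beacons) {
    const day = new Date(b.timestamp).toISOString().slice(0, 10);
    const key = `${day} ${b.url}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(b);
  }

  const rollup = new Map<string, { count: number; ttfbP75: number; loadP75: number }>();
  for (const [key, rows] of groups) {
    rollup.set(key, {
      count: rows.length,
      ttfbP75: p75(rows.map((r) => r.ttfbMs)),
      loadP75: p75(rows.map((r) => r.loadEventMs)),
    });
  }
  return rollup;
}

// Example: two page impressions on the same day and URL.
console.log(dailyRollup([
  { url: "/product/42", timestamp: Date.now(), ttfbMs: 180, loadEventMs: 2400 },
  { url: "/product/42", timestamp: Date.now(), ttfbMs: 220, loadEventMs: 3100 },
]));
```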
And this only covers the technical aspect. Data regulations such as the GDPR or the CCPA add another practical challenge as they make a user opt-in obligatory: Before data can be collected, the website provider might be legally required to ask for the customer’s consent.
User Surveys
User surveys can be seen as the direct approach to understanding whether or not users are satisfied with website performance: Just ask them for their opinion. There are numerous ways to get feedback, for example through online surveys or by offering some sort of prize or a chance to win a competition in exchange for the users’ opinions. Other options include actual interviews or observing users in a lab setting to find out how they react while surfing the website.
Irrespective of the specific flavor, though, there are several problems with user surveys in general. First, they only cover a small sample of the user base and may therefore be subject to selection bias: The group of users taking part in the survey simply might not represent the population of all users well. An arguably even bigger problem is that user perception can be highly inaccurate [2], especially in the context of Web performance (also see Part 2 of this blog series). For example, the user opinion on a website’s design may be dominated by whether the page load was fast or not: Getting reliable information from user surveys is therefore no trivial task. Finally, some forms of surveys (e.g. lab experiments) can be relatively expensive to conduct in comparison to the fully automated alternatives for collecting information.
Summing up the Alternatives
Table 1 below sums up the different options for measuring website performance.
Table 1: An overview of the different options for monitoring website performance.
Analyzing logs from application servers, CDNs, or proxies is a reasonable first step to discover potential bottlenecks, but it only covers the server side and does not provide any information on client-side processing in general or rendering performance in particular. This gap is filled by synthetic testing, which delivers detailed information on user-perceived performance through waterfall diagrams and even video analysis. But while the likes of WebPageTest and Lighthouse do facilitate performance tuning, they still do not cover how website performance and user behavior are connected. To find out exactly how much bad performance hurts revenue, there is no way around tracking actual users. Real-user monitoring is critical for understanding the business impact of page speed, because it connects technical performance (e.g. load timers) with business metrics (e.g. conversions or bounces). Even though it comes with a high total cost of ownership (TCO), real-user monitoring is a key requirement for operating an online shop at peak performance. Finally, user surveys are great for collecting qualitative feedback on the user experience, but they are not suitable for gathering quantitative measurements. Asking users for their opinion on Web performance can therefore lead to highly misleading results, unless the actual performance experienced by the test subjects is also measured and taken into account. Conversely, tracking data alone does not tell you whether users are actually happy with your website. The best strategy is therefore arguably to combine real-user monitoring for technical and business performance with user surveys that provide direct feedback on user satisfaction.
Up Next in This Series
This blog post highlighted the different approaches for measuring both technical and business performance and thereby identifying bottlenecks in your online presence. In the next part of this series, we will go one step further and talk about concrete steps to resolve bottlenecks and thus improve the UX for your customers.
Learn More
To get the next post in this series, register for our speedstudy.info newsletter. (No spamming, promise!) Until then, feel free to read the other posts in our series or check out our code.talks 2019 presentation video on our ongoing study on “Mobile Site Speed & the Impact on E-Commerce”.
Stay fast!
References
[1] Addy Osmani, Ilya Grigorik. Speed is now a landing page factor for Google Search and Ads. Google Developers Blog (Web Updates), 2018.
[2] Stoyan Stefanov. Psychology of Performance. Velocity, 2010.
[3] Core Web Vitals. Google Developers, 2020.
[4] Sowmya Subramanian. Evaluating page experience for a better web. Google Webmasters Central Blog, 2020.
[5] Optimize Largest Contentful Paint. Google Developers, 2020.
[6] Jeffrey Jose. More Time, Tools, and Details on the Page Experience. Google Developers, 2021.
[7] Chrome User Experience Report. Google Developers, 2021.