
A Fight to Deliver Apps to the Globe Faster

 3 years ago
source link: https://sourcediving.com/a-fight-to-deliver-apps-to-the-globe-faster-97e5760956ce

https://pixabay.com/photos/courier-night-panning-warsaw-1214227/

At Cookpad, we build a global community to make everyday cooking fun. We deliver an iOS app to the worldwide market that supports 26 languages and about 74 regions as of today. It is always challenging to develop and ship apps at that scale. Release management is vital for delivering new features and bug fixes to our customers quickly and continuously. To do this, we automate our release flow using fastlane, Jenkins, and Slack. Thanks to this automation, we have been able to run weekly release cycles for a while.

However, we recently faced an issue where submission by fastlane's deliver became unacceptably slow over a couple of releases: it took about 4 hours and sometimes ended up failing altogether. After spending some days investigating it, I was able to improve the performance of deliver so that submission completes within a reasonable time. fastlane/fastlane#16972

Let’s have a look at what happened and how I overcame the issue.

Big Update Comes with a Pain

Just a week before WWDC 2020, on 16th June, Apple renewed its App Store Connect portal, and it appeared to be causing errors in fastlane's deliver. This meant that we couldn't use fastlane to submit apps anymore for the time being. (Technically we could upload the binary .ipa file but couldn't update metadata and screenshots.) fastlane/fastlane#16621

This issue hit us hard: we had to figure out what we could and couldn't do with the existing fastlane lanes in our Fastfile, and then follow the necessary steps to release manually in the first week. At that moment, @joshdholtz, the lead maintainer of fastlane, was the hero who eagerly migrated the old APIs to the latest ones. While we waited for the migration to be done, we tested RC versions locally for a few weeks.

On 2nd July, fastlane v2.150.0 was finally released followed by some patch version bumps. Although we saw errors while uploading screenshots, that turned out to be our fault as some errors were not caught by old APIs; e.g. filename extensions didn't match the expected file format. All seemed to work when we first tried the new version.

Weekends didn’t come free

v2.150.0 resolved most of our submission issues; however, some additional work was still required to support the new APIs for asset uploading. This was eventually resolved in fastlane/fastlane#16842.

In our weekly release cycle, we submit the app every Friday afternoon. As the release manager, I would kick off the submission on CI and then wait for hours for the CI jobs to complete. When it failed, I tried again. Each attempt took around 3–4 hours, so my working hours were long over, but I kept retrying from my iPhone. This continued until midnight on Saturday. I wasn't that exhausted by it, but it felt like my work went on forever.

I decided to fight against this to get back my Fridays and weekends.

Test it, Measure it

In our case it was apparent that a specific part of the app submission was incredibly slow and problematic. We always run fastlane's commands with the --verbose option on Jenkins so that we can check the logs quickly.

We saw two issues:

  1. Uploading screenshots was too slow
  2. Bad entries left behind by failed uploads were shown as errors in the screenshots section of App Store Connect

Although they were different issues, both revolved around uploading screenshots. To test and measure screenshot uploads, I first set up a dummy app and a lane to run deliver. In general, you need to be able to measure how fast or slow something is if you want to improve its performance; otherwise you can't tell whether your change improved it or not.

That didn’t finish within a reasonable time initially as we had about 380 images for production.

[03:36:31]: fastlane.tools just saved you 132 minutes! 🎉

So I left one region’s screenshots and removed the rest locally for testing. That made it easy to do trial and error until it finally finished in 10 minutes.

Ruby working with IO

I had a hunch that parallelisation would help to improve the performance, as I had already made similar improvements to other parts of our workflow, such as importing and exporting translations from and to an external service. The more languages or regions we support, the more data we need to handle, and if that data travels over the Internet, IO (that is, HTTP requests) is likely to be the bottleneck.

Ruby, the language fastlane is written in, makes it easy to create multiple Thread objects. Although the GIL (Global Interpreter Lock), which protects the interpreter's internal state, prevents more than one thread from running Ruby code at a time, a thread that is waiting on IO releases the lock, so you can still multiplex IO with threads.

Let’s say downloading an image takes up to 10 seconds and we need to download 10 images. Done sequentially, that takes 10 secs × 10 images = 100 seconds. If Ruby multiplexes the network requests with Thread, the downloads overlap and, in theory, finish in nearly 10 seconds. That is ten times faster than without threads.
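The arithmetic above can be tried out with a small sketch. It is not taken from fastlane: `fake_download` and `elapsed` are hypothetical helpers, `sleep` stands in for the network wait, and the 10 seconds are scaled down to 0.1 seconds so that it runs quickly.

```ruby
# Hypothetical stand-in for an IO-bound download. Like waiting on a socket,
# sleeping releases the interpreter lock so other threads can proceed.
def fake_download(seconds)
  sleep(seconds)
end

# Returns how many seconds the given block took, using a monotonic clock.
def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

IMAGE_COUNT = 10
DOWNLOAD_SECONDS = 0.1 # scaled down from the 10 seconds in the text

# One download after another: roughly IMAGE_COUNT * DOWNLOAD_SECONDS.
sequential = elapsed do
  IMAGE_COUNT.times { fake_download(DOWNLOAD_SECONDS) }
end

# One thread per download: the waits overlap, so roughly DOWNLOAD_SECONDS.
threaded = elapsed do
  threads = Array.new(IMAGE_COUNT) { Thread.new { fake_download(DOWNLOAD_SECONDS) } }
  threads.each(&:join)
end

puts format("sequential: %.2f secs, threaded: %.2f secs", sequential, threaded)
```

Real requests add connection overhead and server-side limits, so the actual gain is usually smaller than this idealised case suggests.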

This, of course, applies to uploads as well. So I used Thread to parallelise deliver. This change seemed to make it run about two times faster than before with the mini data set.

Find Pattern, Solve Problem

After reading through fastlane's deliver code and assessing the logs, I noticed that the deletion of screenshots (driven by the overwrite_screenshots flag) was already running on multiple threads, but it was not as fast as I had imagined. So I added a micro-benchmark around the uploading and deleting code to measure the elapsed time.

This change turned out to be quite helpful in identifying where the real problem occurred. Most of the deletion requests completed quickly, but some of them took much longer than others. This became even more obvious when observing upload requests.
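A micro-benchmark of this kind can be very small. The following is a minimal sketch rather than the code that went into deliver: `measure` is a hypothetical helper built on Ruby's standard Benchmark module, and `sleep` stands in for the real request.

```ruby
require "benchmark"

# Hypothetical helper: runs the block, logs how long it took, and returns
# the block's own result so it can wrap an existing call transparently.
def measure(label)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  puts format("%s (%.6f secs)", label, seconds)
  result
end

path = "./fastlane/screenshots/ar-SA/5.5_1.jpg"
puts "Uploading '#{path}'..."

# `sleep` stands in for the upload request; :ok is a made-up return value.
status = measure("Uploaded '#{path}'...") do
  sleep(0.05)
  :ok
end
```

Because the helper returns whatever the block returns, it can be wrapped around a call and removed again without changing the surrounding logic.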

Uploading './fastlane/screenshots/ar-SA/5.5_1.jpg'...
Uploaded './fastlane/screenshots/ar-SA/5.5_1.jpg'... (2.637683 secs)
...
Uploading './fastlane/screenshots/ar-SA/iPad Pro (12.9-inch) (3rd generation)_3.jpg'...
Uploaded './fastlane/screenshots/ar-SA/iPad Pro (12.9-inch) (3rd generation)_3.jpg'... (125.733078 secs)

Outliers like this slow down the most straightforward way of using multiple threads. For example, deliver's upload operation looked like this semi-pseudo code.
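The original snippet is not reproduced here, so what follows is a hypothetical reconstruction of that shape, not deliver's actual code: one thread per screenshot, joined locale by locale. The `upload` helper, the paths, and the durations are all made up for illustration.

```ruby
# Hypothetical stand-in for the HTTP upload: sleeps for the given duration
# and returns the path of the uploaded screenshot.
def upload(screenshot)
  sleep(screenshot[:seconds])
  screenshot[:path]
end

# Illustrative data only: each locale has one slow response.
screenshots_per_locale = {
  "ar-SA" => [
    { path: "ar-SA/1.jpg", seconds: 0.01 },
    { path: "ar-SA/2.jpg", seconds: 0.01 },
    { path: "ar-SA/3.jpg", seconds: 0.06 }
  ],
  "en-US" => [
    { path: "en-US/1.jpg", seconds: 0.01 },
    { path: "en-US/2.jpg", seconds: 0.06 },
    { path: "en-US/3.jpg", seconds: 0.01 }
  ]
}

uploaded = []
screenshots_per_locale.each do |_locale, screenshots|
  # Start one thread per screenshot of the current locale.
  threads = screenshots.map do |screenshot|
    Thread.new { upload(screenshot) }
  end
  # Thread#value joins each thread, so this blocks until the slowest upload
  # of this locale is done. Only then does the loop reach the next locale.
  uploaded.concat(threads.map(&:value))
end

puts "Uploaded #{uploaded.size} screenshots"
```

The threads that finish early sit idle until the slowest one in their locale returns, which is exactly the waste discussed next.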

Look at the picture below and imagine you have three threads and three images to upload for each of two regions, ar-SA and en-US. The pseudo-code above has no choice but to wait for the slowest response before moving on to the next locale. Let’s sort it out.

The simplest approach to upload images in parallel is to parallelise uploading for each region

In a nutshell, this can be solved with the Queue-Worker pattern. It is widely known and used wherever scalability matters, though its name varies: Thread Pool, Job Queue, or Work Queue. iOS developers will recognise it as DispatchQueue or OperationQueue.

We can expect uploads to be faster overall, as a slow response no longer blocks each iteration. The following picture shows what happens when the Queue-Worker pattern is used with three threads. Even though each response time is the same as before, the Queue-Worker pattern puts a free thread to work right away.

How much “Queue-Worker” pattern can minimise the overall execution time in the case
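To put rough numbers on the comparison, here is a small back-of-the-envelope sketch. The durations are made up for illustration (they are not measurements from the post), and the model assumes three threads and no more than three images per locale.

```ruby
# Per-locale batches: every locale waits for its slowest upload, so the
# total is the sum of each locale's longest duration. This assumes a
# locale never has more images than there are threads.
def batch_makespan(durations_per_locale)
  durations_per_locale.sum { |durations| durations.max }
end

# Queue-Worker: each job goes to whichever worker becomes free first.
def queue_worker_makespan(durations, workers)
  finish_times = Array.new(workers, 0)
  durations.each do |duration|
    earliest = finish_times.index(finish_times.min)
    finish_times[earliest] += duration
  end
  finish_times.max
end

# Illustrative durations in seconds: ar-SA first, then en-US.
locales = [[1, 1, 6], [1, 6, 1]]

batch = batch_makespan(locales)
queued = queue_worker_makespan(locales.flatten, 3)

puts "per-locale batches: #{batch} secs, queue-worker: #{queued} secs"
# prints "per-locale batches: 12 secs, queue-worker: 7 secs"
```

With these numbers the two slow uploads overlap under the Queue-Worker pattern instead of running back to back, which is where the saving comes from.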

From this point of view, Ruby is powerful. It has a class Thread::Queue (or just Queue) cooperating with Thread.

The Queue class implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The Queue class implements all the required locking semantics. The class implements FIFO type of queue. In a FIFO queue, the first tasks added are the first retrieved.

So thanks to the power of Queue, I was able to implement the "Queue-Worker" pattern with a small piece of Ruby code.

This QueueWorker can be used just like this.
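The original snippets are not reproduced here. As a minimal sketch of the idea, a QueueWorker built on Ruby's Queue could look like the following; the method names, the job paths, and the `sleep` standing in for the upload request are assumptions for illustration and may differ from the code that was merged into fastlane.

```ruby
# A sketch of the Queue-Worker pattern on top of Ruby's Thread::Queue.
class QueueWorker
  # concurrency: number of worker threads; block: called once per job.
  def initialize(concurrency, &block)
    @concurrency = concurrency
    @block = block
    @queue = Queue.new
  end

  def enqueue(job)
    @queue.push(job)
  end

  # Closes the queue, processes every enqueued job on a fixed number of
  # threads, and blocks until all of them are done.
  def start
    @queue.close
    threads = Array.new(@concurrency) do
      Thread.new do
        # On a closed queue, pop returns nil once the queue is empty, which
        # ends the loop. Jobs must therefore not be nil or false.
        while (job = @queue.pop)
          @block.call(job)
        end
      end
    end
    threads.each(&:join)
  end
end

# Usage: three workers pull five jobs from the shared queue.
results = Queue.new # thread-safe collection for the finished jobs
worker = QueueWorker.new(3) do |path|
  sleep(0.01) # stands in for the upload request
  results << path
end

%w[a.jpg b.jpg c.jpg d.jpg e.jpg].each { |path| worker.enqueue(path) }
worker.start

uploaded = Array.new(results.size) { results.pop }
puts "Uploaded #{uploaded.size} files"
```

A worker picks up the next job as soon as it finishes the previous one, so a single slow response holds up only one thread rather than the whole batch.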

So I applied this in deliver, and as a result it completed about six times faster locally once the App Store Connect API was responding better. (Remember it used to take over 2 hours initially.)

  • With fastlane 2.154.0: 19m25.106s
  • With my working branch: 3m20.148s

Test environment

We had 385 images to be uploaded (but four skipped due to exceeding the limit of 10 in each screenshot set 🙈)

% du -sh fastlane/screenshots
90M fastlane/screenshots
% ls fastlane/screenshots/**/*.{png,jpg} | wc -l
385

I used these options to run deliver to upload screenshots.

Not exactly saving your time if it’s slow

[03:36:31]: fastlane.tools just saved you 132 minutes! 🎉

This message, which fastlane prints at the end of each command, always reminds me that I would be exhausted every week if I had to submit apps manually. I love how fastlane saves our precious time in day-to-day work. However, if there is a bottleneck in it, it may still cost you time in the end, when you need to check the result of your fastlane action. In other words, automated workflows should run fast enough that you’re still able to retry within your working hours. Failures are a reality; nothing is perfect!

Fight to save your time and enjoy your weekend.

