Using Puppeteer in Google Cloud Functions

 3 years ago
source link: https://rominirani.com/using-puppeteer-in-google-cloud-functions-809a14856e14
This is part of a Google Cloud Functions Tutorial Series. Check out the series for all the articles

Ever since I heard the term headless Chrome, I have been curious about what exactly it means and the kinds of applications it can help write. Recently I checked out an excellent talk by Eric Bidelman from Google I/O 2018 titled “The power of Headless Chrome and browser automation”. I recommend watching at least the first half of the video to understand what headless Chrome is and how it works.

In summary, headless Chrome:

  • Lets you run Chrome in a headless environment (without the visible UI shell)
  • Is very useful for automated testing
  • Lets you do several things with a web page, e.g. query specific elements, take a screenshot, create a PDF, etc.

For more information, you can check out the following article:

Support for headless Chrome in Google Cloud Functions environment

When Google Cloud Functions was first released, the only runtime that it supported was Node.js version 6 and the OS was missing several packages that made it difficult to run Chrome in this fashion.

A couple of months back came the announcement that headless Chrome support was now available in App Engine standard and Cloud Functions. This was made possible by the release of the Node.js 8 runtime on App Engine standard, which is the same runtime that Google Cloud Functions uses. Check out the official blog post that announced it:

Enter Puppeteer

To make things dead simple for developers, we have an npm package called Puppeteer that makes working with headless Chrome a breeze. The default installation even comes bundled with a version of Chromium, so that it is self-contained and has everything to get you started.

To install Puppeteer, simply use the following command:

npm install --save puppeteer

If you would like to learn more about Puppeteer, check out https://pptr.dev/. There is even a Puppeteer playground at https://try-puppeteer.appspot.com/
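
As a rough illustration of the tasks listed earlier (querying elements, taking a screenshot, creating a PDF), here is a minimal sketch built on Puppeteer's page.$$eval, page.screenshot and page.pdf calls. The function name capturePage, the default output file names and the choice to count links are placeholders picked for the example.

```javascript
// Sketch only: capturePage, the output file names and the link count are
// illustrative choices, not part of the Cloud Function in this post.
async function capturePage(url, screenshotPath = 'page.png', pdfPath = 'page.pdf') {
  // Required lazily, so puppeteer is only needed when the function is called.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);

    // Query specific elements: count the links on the page.
    const linkCount = await page.$$eval('a', links => links.length);

    // Take a screenshot and create a PDF of the same page.
    await page.screenshot({path: screenshotPath});
    await page.pdf({path: pdfPath});

    return linkCount;
  } finally {
    // Always shut the browser down, even if a step above throws.
    await browser.close();
  }
}
```

Calling capturePage('https://example.com') from a script on your own machine should leave page.png and page.pdf in the current directory.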

Our Google Cloud Function: Love is Comic

While there are multiple ways in which one could have done this, I wanted to try this out with Puppeteer and see how it goes.

Our Cloud Function will return just the Comic Strip from the page at https://loveiscomix.com/random. In short it will return HTML content that will contain just the comic strip, an example of which is shown below:

[Image: an example comic strip]

Let us check out the index.js file that contains our Cloud Function:

Before we go through the key pieces of the code, here is what I am trying to do:

  1. Launch Chrome in headless mode.
  2. Visit the Random Comic page at https://loveiscomix.com/random.
  3. Extract the specific comic strip image via a DOM query selector.
  4. Send back a simple HTML page containing an img element whose src attribute is set to the image URL obtained in (3).

Steps (1), (2) and (3) are handled by the Puppeteer package, as shown in the following code snippet:

The package.json file is standard stuff and it contains the dependency for our puppeteer package.

{
  "name": "loveiscomic",
  "version": "1.0.0",
  "description": "LoveIsComic Puppeteer Script",
  "dependencies": {
    "puppeteer": "^1.9.0"
  }
}

Deploying the Cloud Function

Ensure that both the index.js and package.json files that we have created above are present in the same directory.

We can use the gcloud functions deploy command to deploy the function as shown below. Note that the function has an HTTP trigger, the runtime is Node.js 8 and we give it ample memory (more on that later in the post):

$ gcloud functions deploy --region=us-central1 --runtime=nodejs8 \
--memory=1024MB --trigger-http sendComic

Once it is deployed, you can check that the sendComic function is available via the gcloud functions list command.

Executing our Cloud Function

You can get the details of the function via the describe command as shown below:

$ gcloud functions describe sendComic

The output from the above command will contain the httpsTrigger property, an example of which is shown below:

httpsTrigger:
  url: https://<region>-<projectid>.cloudfunctions.net/sendComic

In your case, <region> and <projectid> will be replaced by the appropriate values. Simply open the above URL in your browser; it should invoke the HTTP-triggered Google Cloud Function sendComic and give you a random Love Is Comic strip. This is what I got:

[Image: the comic strip returned by the sendComic function]
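
If you would rather call the function from a script than from the browser, something like the following sketch should work. It assumes the URL pattern shown in the describe output; the helper names triggerUrl and invoke, and the project id in the usage comment, are hypothetical.

```javascript
const https = require('https');

// Build the trigger URL from the pattern shown in the describe output.
function triggerUrl(region, projectId, functionName) {
  return `https://${region}-${projectId}.cloudfunctions.net/${functionName}`;
}

// Resolve with the HTTP status code and the response body.
function invoke(url) {
  return new Promise((resolve, reject) => {
    https.get(url, res => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', chunk => { body += chunk; });
      res.on('end', () => resolve({status: res.statusCode, body}));
    }).on('error', reject);
  });
}

// Example with a hypothetical project id:
// invoke(triggerUrl('us-central1', 'my-project', 'sendComic'))
//   .then(({status, body}) => console.log(status, body));
```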

A few points to note

I hit a few blocks while trying to run this Google Cloud Function. I erred in not paying enough attention to the official blog post, and learnt a couple of things about deploying Google Cloud Functions that use Puppeteer.

Chrome in Sandbox mode

When I first wrote the function, I launched the headless browser as shown below:

const browser = await puppeteer.launch();

This resulted in an exception as shown below:

Error: Failed to launch chrome! [1020/082529.501520:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported.

This was corrected by passing the --no-sandbox flag while launching Chrome:

const browser = await puppeteer.launch({args: ['--no-sandbox']});
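
One possible variation, if the same code also runs on a development machine, is to add the flag only when the process is running as root. This is a sketch under that assumption and not what the function above uses: chromeLaunchArgs is a hypothetical helper, and it only addresses the root check reported in the error message, so the sandbox may still fail to start for other reasons.

```javascript
// Sketch only: chromeLaunchArgs is a hypothetical helper.
// uid defaults to the current process's user id where the platform exposes it.
function chromeLaunchArgs(uid = process.getuid ? process.getuid() : undefined) {
  // The error above is raised when Chrome runs as root (uid 0)
  // without --no-sandbox, so only add the flag in that case.
  return uid === 0 ? ['--no-sandbox'] : [];
}

// Usage: const browser = await puppeteer.launch({args: chromeLaunchArgs()});
```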

Allocate more memory to your function

I first deployed the function with the default memory allocation, i.e. 256MB, and that is definitely not enough. I got the following error during function execution:

Error: memory limit exceeded. Function invocation was interrupted.

I had to deploy the function with the --memory=1024MB option while using the gcloud functions deploy command.

Do keep in mind that allocating more memory to your function will be reflected in your costs.

Hope you enjoyed this article. Do share what you plan to run with Puppeteer.

