Download YouTube videos with AWS Lambda and store them on S3

Recently, I was faced with the challenge of downloading videos from YouTube and storing them on S3.

Sounds easy? Remember that Lambda comes with a few limitations, one of them being the limited disk space in /tmp (512 MB at the time of writing).

While working on a solution, I encountered multiple problems, for example, videos that are too large to fit into /tmp.

Let’s look at how I finally solved the problem with a streaming approach in Node.js. I use the youtube-dl library to get easy access to YouTube videos.

First, we create a PassThrough stream in Node.js. A PassThrough stream is a simple duplex (more precisely, transform) stream: whatever you write to one side can be read, unchanged, from the other side.
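A minimal, self-contained sketch of that behavior (the helper name `roundTrip` is made up for illustration): chunks written to the writable side of a PassThrough come out unchanged on the readable side, which is what lets the downloader and the uploader share a single stream.

```javascript
const { PassThrough } = require('stream');

// Write a few chunks into a PassThrough stream and collect what comes out
// of its readable side. Hypothetical helper, for illustration only.
function roundTrip(chunks) {
  return new Promise((resolve, reject) => {
    const passThrough = new PassThrough();
    const received = [];

    // Readable side: gather every chunk until the stream ends.
    passThrough.on('data', (chunk) => received.push(chunk));
    passThrough.on('end', () => resolve(Buffer.concat(received).toString()));
    passThrough.on('error', reject);

    // Writable side: push the chunks, then signal the end of the data.
    for (const chunk of chunks) {
      passThrough.write(chunk);
    }
    passThrough.end();
  });
}

roundTrip(['Hello, ', 'stream', '!']).then((text) => console.log(text));
// prints "Hello, stream!"
```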

const stream = require('stream');
const passtrough = new stream.PassThrough();

Next, we need to write data to the stream. This is done by the youtube-dl library.

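const youtubedl = require('youtube-dl');
// event is the Lambda invocation event; it is expected to carry the video URL.
// best[ext=mp4] selects the best format that is a single MP4 file with video and audio.
const dl = youtubedl(event.videoUrl, ['--format=best[ext=mp4]'], {maxBuffer: Infinity});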
dl.pipe(passtrough); // write video to the pass-through stream
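One caveat with dl.pipe(passtrough): pipe() does not forward errors. If the download stream fails midway, the pass-through stream is never ended, so the upload reading from it would likely keep waiting until the Lambda function times out. A possible alternative, assuming Node.js 10 or later, is stream.pipeline(), which reports an error from any stage to one callback and destroys the other streams. In this sketch, `fakeDownload` and `copyWithPipeline` are hypothetical helpers; `fakeDownload` stands in for the youtube-dl stream so the example runs without network access.

```javascript
const { PassThrough, Readable, pipeline } = require('stream');

// Hypothetical stand-in for the youtube-dl download stream: it emits one
// chunk and then fails, as a network error might do mid-download.
function fakeDownload() {
  let sent = false;
  return new Readable({
    read() {
      if (!sent) {
        sent = true;
        this.push(Buffer.from('partial video data'));
      } else {
        this.destroy(new Error('download failed'));
      }
    },
  });
}

// Connect source -> passThrough with pipeline() instead of pipe().
// Resolves with null on success, or with the error that stopped the copy.
function copyWithPipeline(source, passThrough) {
  return new Promise((resolve) => {
    pipeline(source, passThrough, (err) => resolve(err || null));
  });
}

const passThrough = new PassThrough();
// In the real function, the consumer (the S3 upload) would observe this error.
passThrough.on('error', () => {});
// Keep data flowing, as a consumer reading the stream would.
passThrough.resume();

copyWithPipeline(fakeDownload(), passThrough).then((err) => {
  console.log(err ? `failed: ${err.message}` : 'ok');
});
```

With pipe() alone, the same failure would leave passThrough open; with pipeline(), the callback receives the download error and passThrough is destroyed.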

And finally, we need to upload the stream to S3. We make use of the Multipart Upload feature of S3, which allows us to upload a big file in smaller chunks. The ManagedUpload class of the AWS SDK for JavaScript reads the stream and uploads it part by part, so we only have to buffer a few small chunks (64 MB each in this case) in memory rather than the whole file.
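To see why this keeps memory bounded, here is a minimal sketch of the chunking idea in plain Node.js. `collectParts` is a hypothetical helper: it is not part of the AWS SDK, it does not talk to S3, and it is not how ManagedUpload is implemented internally. It only shows how a stream of unknown length can be cut into fixed-size parts while holding at most one part in memory. Real S3 parts must be at least 5 MB (except the last one), so the tiny part size in the usage line is for illustration only.

```javascript
const { Readable, Writable } = require('stream');

// Cut a byte stream into parts of exactly `partSize` bytes (the last part
// may be smaller) and hand each part to `onPart` as soon as it is full.
// Resolves with the number of parts produced.
function collectParts(source, partSize, onPart) {
  return new Promise((resolve, reject) => {
    let pending = [];
    let pendingBytes = 0;
    let partNumber = 0;

    // Emit the buffered bytes as one part and start a fresh buffer.
    const flush = () => {
      partNumber += 1;
      onPart(partNumber, Buffer.concat(pending, pendingBytes));
      pending = [];
      pendingBytes = 0;
    };

    const sink = new Writable({
      write(chunk, _encoding, callback) {
        let offset = 0;
        // A single incoming chunk may fill the current part and spill over.
        while (offset < chunk.length) {
          const take = Math.min(partSize - pendingBytes, chunk.length - offset);
          pending.push(chunk.subarray(offset, offset + take));
          pendingBytes += take;
          offset += take;
          if (pendingBytes === partSize) flush();
        }
        callback();
      },
      final(callback) {
        if (pendingBytes > 0) flush(); // the last part may be smaller
        callback();
      },
    });

    sink.on('finish', () => resolve(partNumber));
    sink.on('error', reject);
    source.on('error', reject); // pipe() does not forward source errors
    source.pipe(sink);
  });
}

// Usage: 10 bytes split into parts of at most 4 bytes.
const sizes = [];
collectParts(Readable.from([Buffer.from('0123456789')]), 4, (n, part) => {
  sizes.push(part.length);
}).then((count) => console.log(count, sizes)); // 3 [ 4, 4, 2 ]
```

The real ManagedUpload additionally uploads several parts in parallel, so it holds a few parts at once rather than exactly one.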

const AWS = require('aws-sdk');
const upload = new AWS.S3.ManagedUpload({
  params: {
    Bucket: process.env.BUCKET_NAME,
    Key: 'video.mp4',
    Body: passtrough
  },
  partSize: 1024 * 1024 * 64 // 64 MB in bytes
});
upload.send((err) => {
  if (err) {
    console.log('error', err);
  } else {
    console.log('done');
  }
});
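The choice of partSize trades memory against the maximum video size. A back-of-the-envelope sketch, assuming S3's limit of 10,000 parts per multipart upload and the ManagedUpload default of uploading up to 4 parts (its queueSize option) in parallel. `multipartBudget` is a hypothetical helper, and the buffered figure is a rough estimate, not a measured value.

```javascript
// Rough sizing for a multipart upload of a stream whose length is unknown.
// Assumptions: at most 10,000 parts per upload (S3 limit) and about
// `queueSize` parts held in memory at once (ManagedUpload default: 4).
const MiB = 1024 * 1024;
const S3_MAX_PARTS = 10000;

function multipartBudget(partSizeBytes, queueSize = 4) {
  return {
    // Approximate data buffered while parts are in flight.
    peakBufferedBytes: partSizeBytes * queueSize,
    // Largest object that fits into S3_MAX_PARTS parts of this size.
    maxObjectBytes: partSizeBytes * S3_MAX_PARTS,
  };
}

const budget = multipartBudget(64 * MiB);
console.log(budget.peakBufferedBytes / MiB); // 256
console.log(budget.maxObjectBytes / (1024 * MiB)); // 625
```

With 64 MB parts, roughly 256 MB may sit in memory at once, which fits comfortably into a 3008 MB function, and the part limit caps a single upload at about 625 GiB.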

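That’s it. Now you can download YouTube videos that are too large for /tmp or for memory with Lambda and upload them to S3, as long as the download finishes within Lambda’s 15-minute execution limit. I recommend running the code in a “big” Lambda function with 3008 MB of memory for better network performance.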

You can find the full source code on GitHub, including a SAM template to provision the AWS resources. Have fun!

This is a shorter article. Do you prefer longer or shorter reads? Let me know via email, LinkedIn, or @hellomichibye.

Published on 17 May 2019

