
Saving Spark DataFrames on Amazon S3 got Easier !!!

Reading Time: < 1 minute

In our previous blog post, Congregating Spark Files on S3, we explained how to upload files saved on a Spark cluster to Amazon S3. Well, I agree that the method explained in that post was a bit complex and hard to apply, and it also added a lot of boilerplate to our code.

So, we started working on simplifying it and finding an easier way to provide a wrapper around Spark DataFrames that would help us save them on S3. The solution we found was a Spark package: spark-s3. It made saving Spark DataFrames on S3 a piece of cake, as we can see from the code below:

dataFrame.write
  .format("com.knoldus.spark.s3")
  .option("accessKey", "s3_access_key")
  .option("secretKey", "s3_secret_key")
  .option("bucket", "bucket_name")
  .option("fileType", "json")
  .save("sample.json")
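
For context, here is a minimal, self-contained sketch of how that write call might sit in a small application. The SparkSession setup, the sample DataFrame, and the environment-variable names for the credentials are illustrative assumptions, not part of the original example (with older Spark versions the same write call works on a DataFrame created through SQLContext):

import org.apache.spark.sql.SparkSession

object SaveToS3Example {
  def main(args: Array[String]): Unit = {
    // Local Spark session, just for demonstration.
    val spark = SparkSession.builder()
      .appName("spark-s3-example")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // A tiny sample DataFrame to save.
    val dataFrame = Seq(("knoldus", 1), ("spark", 2)).toDF("name", "id")

    // Read the S3 credentials from the environment instead of
    // hard-coding them; these variable names are assumptions.
    dataFrame.write
      .format("com.knoldus.spark.s3")
      .option("accessKey", sys.env("S3_ACCESS_KEY"))
      .option("secretKey", sys.env("S3_SECRET_KEY"))
      .option("bucket", "bucket_name")
      .option("fileType", "json")
      .save("sample.json")

    spark.stop()
  }
}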

As the code above shows, we no longer have to put any extra effort into saving Spark DataFrames on Amazon S3. All we need to do is include spark-s3 in our project dependencies and we are done.
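
For instance, an sbt project might declare the dependency roughly like this; note that the resolver URL, coordinates, and version below are assumptions, so check the GitHub documentation for the real ones:

// build.sbt -- every coordinate here is an assumption; the real group,
// artifact, version, and resolver are listed in the spark-s3 documentation
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies += "com.knoldus" %% "spark-s3" % "0.0.3"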

Right now spark-s3 supports only the Scala and Java APIs, but we are working on providing support for Python and R too. So, stay tuned !!!

To know more about it, please read its documentation on GitHub.

