Optimizing Spark Streaming write performance to Kafka

4 years ago

source link: https://www.tuicool.com/articles/V3Y32ii

Original article: https://blog.csdn.net/xianpanjia4616/article/details/81432869

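In real projects we sometimes need to write data back to Kafka in real time. The usual way to write this is to create a KafkaProducer inside each partition of each RDD and send the records with it.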
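A minimal sketch of that usual approach, not the author's exact code. It assumes a `DStream[String]` and string serializers; the names `NaiveKafkaWriter`, `producerProps`, `brokers` and `topic` are hypothetical and chosen only for illustration.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

object NaiveKafkaWriter {
  // Hypothetical helper: basic producer settings for string keys and values.
  def producerProps(brokers: String): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props
  }

  def write(stream: DStream[String], brokers: String, topic: String): Unit = {
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // A new producer is built for every partition of every batch.
        val producer = new KafkaProducer[String, String](producerProps(brokers))
        partition.foreach { record =>
          producer.send(new ProducerRecord[String, String](topic, record))
        }
        // close() waits for buffered records to be sent, then releases the connections.
        producer.close()
      }
    }
  }
}
```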

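This approach has a serious drawback: for every partition of every RDD, a new KafkaProducer has to be created each time. That is clearly unreasonable, and it causes performance problems that make writes very slow. So how do we solve it?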

1. First, wrap the KafkaProducer in a class that exposes it through a lazy val.
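One common shape for such a wrapper is sketched below; it is an assumption about what the wrapper looks like, not the author's listing. The class name `KafkaSink` and its `send` methods are hypothetical. The wrapper carries only a factory function, so it can be serialized, and the producer itself is built on first use in whichever JVM calls `send`.

```scala
import java.util.Properties
import java.util.concurrent.Future
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

class KafkaSink[K, V](createProducer: () => KafkaProducer[K, V]) extends Serializable {
  // Not shipped with the wrapper; created lazily on first access in each JVM.
  @transient lazy val producer: KafkaProducer[K, V] = createProducer()

  def send(topic: String, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, value))

  def send(topic: String, key: K, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, key, value))
}

object KafkaSink {
  def apply[K, V](config: Properties): KafkaSink[K, V] = {
    val createProducer = () => {
      val producer = new KafkaProducer[K, V](config)
      // Close the producer when the JVM exits so buffered records are flushed.
      sys.addShutdownHook { producer.close() }
      producer
    }
    new KafkaSink(createProducer)
  }
}
```

Marking the field `@transient` is a defensive choice in this sketch: even if the producer were accidentally initialized on the driver, it would not be serialized along with the wrapper.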

2. Next, use a broadcast variable to ship the wrapped KafkaProducer to every executor.

3. Then each executor can write its data to Kafka through the broadcast wrapper.
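Steps 2 and 3 might look roughly like the following sketch. It is illustrative rather than the author's code: `LazyProducer` is a stripped-down, hypothetical stand-in for the lazy wrapper from step 1 (repeated so the block compiles on its own), and `BroadcastKafkaWriter`, `brokers` and `topic` are likewise assumed names.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

// Minimal lazy wrapper, repeated here so this sketch is self-contained.
class LazyProducer(props: Properties) extends Serializable {
  @transient lazy val producer: KafkaProducer[String, String] =
    new KafkaProducer[String, String](props)

  def send(topic: String, value: String): Unit = {
    producer.send(new ProducerRecord[String, String](topic, value))
    ()
  }
}

object BroadcastKafkaWriter {
  def write(ssc: StreamingContext, stream: DStream[String],
            brokers: String, topic: String): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", brokers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    // Step 2: broadcast the wrapper once from the driver.
    val sink: Broadcast[LazyProducer] =
      ssc.sparkContext.broadcast(new LazyProducer(props))

    // Step 3: every task reuses the producer already created in its executor.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        partition.foreach(record => sink.value.send(topic, record))
      }
    }
  }
}
```

Only the wrapper and its configuration travel in the broadcast; the first `send` on each executor triggers the lazy val and creates the actual producer, which later batches then reuse.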

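This way the producer no longer has to be created every time: it is created once per executor and reused. That's all for now. In testing, the optimized version was tens of times faster than the original. If anything here is wrong, corrections are welcome.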

