12

sparkStreaming之transform的细节

 3 years ago
source link: https://blog.csdn.net/weixin_42307036/article/details/111995954
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

sparkStreaming之transform的细节

val socketLineDStream: ReceiverInputDStream[String] = streamingContext.socketTextStream('linux1', 8888)
// TODO Driver中执行一次
// 例如val a = 1 在Driver中只执行一次
// 首先看一下DStream的其他Transformations(转换)操作
socketLineDStream.map({
    case x => {
        // TODO Executor中执行n次(n是Executor数)
    }
}
)

// 重点来了,看一下DStream的transform转换操作
socketLineDStream.transform({
    case rdd => {
        // TODO Driver中执行m次(m是采集周期数)
        rdd.map({
            case x => {
                //TODO Executor中执行n次(n是Executor数)
            }
        })
    }
}
)

注意:

  • transform中的注释处的m就是细节之处,它可以保证此处运行在Driver中的代码可以周期(SparkStreaming的数据采集周期)间变化,即每个数据周期transform走一遍。用处之一是黑名单的更新(比如恶意发帖的用户的更新)
  • 为什么DStream.map里面的代码执行是在Executor?个人理解因为DStream在每个周期(批次)相当于就是一个RDD的封装,所以可以类比RDD.map()里面的代码是运行在Executor端

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK