
groupBy on multiple values

 2 years ago
source link: https://www.codesd.com/item/groupby-on-multiple-values.html

I have a list of calls, incoming SMS (smsIn), and outgoing SMS (smsOut) in a CSV file, and I want to count the number of smsIn and smsOut records for each phone number.

callType indicates the type of each record (call, smsIn, smsOut).

An example of the data, as (phoneNumber, callType):

7035076600, 30
5081236732, 31
5024551234, 30
7035076600, 31
7035076600, 30

Ultimately, I want something like this: phoneNum, numSMSIn, numSMSOut.

I have implemented something like this:

val smsOutByPhoneNum = partitionedCalls.
                       filter{ arry => arry(2) == 30}.
                       groupBy { x => x(1) }.
                       map(f=> (f._1,f._2.iterator.length)).
                       collect()

The above gives the number of SMS out for each phone number. Similarly:

val smsInByPhoneNum = partitionedCalls.
                      filter{ arry => arry(2) == 31}.
                      groupBy { x => x(1) }.
                      map(f => (f._1, f._2.iterator.length)).
                      collect()

The above gives the number of SMS in for each phone number.

Is there a way to get both counts in one pass over the data instead of two?


Great answer @zero323

val partitionedCalls = sc.parallelize(Array(("7035076600", "30"),
  ("5081236732", "31"), ("5024551234", "30"), ("7035076600", "31"),
  ("7035076600", "30")))

// count the pairs: ((phoneNumber, code), count)
val keyPairCounts = partitionedCalls.map((_, 1))
// sum the counts per (phoneNumber, code) with reduceByKey, then re-key by phone number
val aggregateCounts = keyPairCounts.reduceByKey(_ + _).map { case ((phNum,
  inOrOut), cnt) => (phNum, (inOrOut, cnt)) }
// use groupByKey to gather each phone number's per-code counts into a Map
val result = aggregateCounts.groupByKey.map(x => (x._1, x._2.toMap))

// collect the result as (phoneNum, numSMSIn, numSMSOut); per the question's
// filters, 31 = SMS in and 30 = SMS out. Looking the counts up by code keeps
// the columns correct even when a phone number has only one of the two types.
result.map(x => (x._1, x._2.getOrElse("31", 0),
  x._2.getOrElse("30", 0))).collect().foreach(println)
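
As an alternative to the approach above, the two counts can also be carried together as a tuple so that a single reduceByKey produces both columns and no groupByKey step is needed. This is only a minimal sketch, not the original answer's method. It assumes a local Spark setup, the same sample data, and the mapping 30 = SMS out, 31 = SMS in inferred from the question's filters; the names SmsCounts and toCounts are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SmsCounts {
  // Assumption inferred from the question's filters: 30 = SMS out, 31 = SMS in.
  val SmsOut = "30"
  val SmsIn = "31"

  // Hypothetical helper: one call-type code becomes a (numSMSIn, numSMSOut) contribution.
  def toCounts(callType: String): (Int, Int) = callType match {
    case SmsIn  => (1, 0)
    case SmsOut => (0, 1)
    case _      => (0, 0) // plain calls or unknown codes count as neither
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("sms-counts").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val partitionedCalls = sc.parallelize(Seq(
      ("7035076600", "30"), ("5081236732", "31"), ("5024551234", "30"),
      ("7035076600", "31"), ("7035076600", "30")))

    // One pass: map each record to a tuple of counts, then add tuples per phone number.
    val counts = partitionedCalls
      .mapValues(toCounts)
      .reduceByKey { case ((in1, out1), (in2, out2)) => (in1 + in2, out1 + out2) }

    // Print phoneNum, numSMSIn, numSMSOut (row order is not guaranteed).
    counts.collect().foreach { case (phNum, (in, out)) => println(s"$phNum, $in, $out") }

    sc.stop()
  }
}
```

For the sample data, 7035076600 ends up with one SMS in and two SMS out, and each of the other two numbers has a single record. Because the tuples are summed per key, Spark can combine partial results on each partition before shuffling.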

Reference: a good explanation of the difference between groupByKey and reduceByKey: prefer_reducebykey_over_groupbykey
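
To make the distinction in that reference concrete, here is a rough local analogy using plain Scala collections rather than Spark. It only illustrates the shape of the two styles, not how Spark implements them: the grouping version holds every record for a key before counting, while the reducing version keeps just a running total per key. GroupVsReduce, countByGrouping, and countByReducing are hypothetical names.

```scala
object GroupVsReduce {
  // Same sample records as above: (phoneNumber, callType).
  val calls: Seq[(String, String)] = Seq(
    ("7035076600", "30"), ("5081236732", "31"), ("5024551234", "30"),
    ("7035076600", "31"), ("7035076600", "30"))

  // Grouping style: collect all records per key first, then count them.
  def countByGrouping(xs: Seq[(String, String)]): Map[(String, String), Int] =
    xs.groupBy(identity).map { case (key, records) => (key, records.size) }

  // Reducing style: fold over the records, keeping only one running total per key.
  def countByReducing(xs: Seq[(String, String)]): Map[(String, String), Int] =
    xs.foldLeft(Map.empty[(String, String), Int]) { (acc, key) =>
      acc.updated(key, acc.getOrElse(key, 0) + 1)
    }

  def main(args: Array[String]): Unit = {
    // Both styles give the same counts; they differ in how much they hold in memory.
    println(countByGrouping(calls) == countByReducing(calls))
  }
}
```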

