46

ClickHouse 高阶函数 - 简书

 4 years ago
source link: https://www.jianshu.com/p/5e8205edc7d9?
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

ClickHouse 高阶函数

0.6262019.09.25 14:48:47字数 692阅读 6,212

先来一个完整的例子,该示例根据行为日志计算用户访问的top路径
select data, count(1) cn from (
with maxIf( c_t , cat='page_view'and act='页面浏览') as max_time, -- 目标事件时间
arraySort(
e -> e.1,
arrayFilter(x->x.1<=toUInt64OrZero(max_time),groupArray((toUInt64OrZero(c_t), (cat,act) )))
) as sorted_array,
-- 按时间排序后的数据
-- arrayPushFront( sorted_array, sorted_array[1] ) as e_arr,
arrayFilter(
(i, e,z) -> z.1 < toUInt64OrZero(max_time)
and (e > 1800000 or (z.2.1='page_view' and z.2.2='页面
浏览')),
arrayEnumerate(sorted_array), arrayDifference( sorted_array.1 ),sorted_array
) as arr_indx, -- 过滤目标事件、时间差后的数据
arrayReduce('max',arr_indx) +1 as smIndx,
arrayFilter(
(e,i) -> i>=smIndx and e.1<=toUInt64OrZero(max_time) ,
sorted_array, arrayEnumerate(sorted_array)
) as data_
select u_i,
arrayFilter((x,y,i)-> i=1 or i>1 and y<>0 ,data_.2,arrayDifference(arrayEnumerateDense(data_.2)),arrayEnumerate(data_)) as data__,
arraySlice(data__,-7,7 ) as data,
-- arrayStringConcat(data,'->') as path,
hasAll(data, [ ('page_view','页面_浏览') ]) as has_way_point --路径中必须经过的点
from app.scene_tracker where c_p='PC' and length(u_i)>20
group by u_i having length(data)>1
) tab
where has_way_point=1 group by data order by cn desc limit 100

根据经验,大家如处理复杂业务,这些函数会经常用到

neighbor

  • 获取某一列前后相邻的数据,第二个参数控制前后相邻的距离

SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as  b)  

arrayJoin

  • 行变列,对数组进行展开操作

# 还是上面的例子
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as  b)  

arraySort

  • 对数组进行排序,降序的话用这个 arrayReverseSort

# 还是上面的例子 略作修改,可对比示例1和示例3的结果区别
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( arraySort([1,2,3,6,34,3,11]) ) as a,'u' as  b)  

arrayFilter

  • 过滤出数组中满足条件的数据

# 我们只获取数组中的偶数部分
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( arraySort(arrayFilter(x->x%2=0, [1,2,3,6,34,3,11])) ) as a,'u' as  b)  

arrayEnumerate

  • 返回数组下标

SELECT arrayEnumerate( [1,2,3,6,34,3,11] )

arrayDifference

  • 计算数组中前后两个值的差值部分

SELECT arrayDifference( [1,2,3,6,34,3,11] )

arrayReduce

  • 对数组进行聚合操作,min 、max、avg 等

SELECT arrayReduce('avg', [1,2,3,6,34,3,11] )

arrayEnumerateDense

  • 标记出数组中相同的元素

SELECT arrayEnumerateDense( [1,2,3,6,34,3,11] )

arraySlice

  • 对数组进行切割 ,后面两个参数分别是切割的offset和切割长度
SELECT arraySlice( [1,2,3,6,34,3,11] , -3, 2)
# 返回:34 3

hasAny

  • 判断数组中是否包含某些值,包含其一返回1 ,否则0 ;如果判断全部包含 用hasAll
  • 示例10:
SELECT hasAny( [1,2,3,6,34,3,11] , [3,1])

arrayStringConcat

  • 将数组元素按照给定分隔符进行拼接,返回拼接后的字符串
  • 示例11:
SELECT arrayStringConcat( [1,2,3,6,34,3,11] , '-') 

arrayPushFront

  • 向数组首位置最加value ;同理向数组末尾最加为arrayPushBack
  • 示例12:
SELECT arrayPushFront( [1,2,3,6,34,3,11] , 8)

arrayPopFront

  • 移除数组下标为1的值;同理,移除数组最后一个值用arrayPopBack
  • 示例13:
SELECT arrayPopFront( [1,2,3,6,34,3,11] ) 

arrayWithConstant

  • 生成一个指定长度的数组
  • 示例14:
#生成长度为3 的数组
SELECT arrayWithConstant( 3, 'a')
#范围值为['a','a','a']

arrayUniq

  • 计算数组中有多少个不重复的值;如进行数组去重操作 用arrayDistinct
  • 示例15:
SELECT arrayUniq( [1,2,3,6,34,3,11]) 

runningDifference

  • 计算某一列前后数值的差值
  • 示例16:
select a,runningDifference(a)  from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as  b)

arrayCompact

  • 对数组内数据实现相邻去重
  • 示例17:
SELECT arrayCompact([1, 2, 2, 3, 2, 3, 3])
#返回值为 [1,2,3,2,3]        


开篇示例语句运行结果如下图

image.png

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK