ClickHouse 高阶函数 - 简书
source link: https://www.jianshu.com/p/5e8205edc7d9?
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
ClickHouse 高阶函数
先来一个完整的例子,该示例根据行为日志计算用户访问的top路径
select data, count(1) cn from (
with maxIf( c_t , cat='page_view'and act='页面浏览') as max_time, -- 目标事件时间
arraySort(
e -> e.1,
arrayFilter(x->x.1<=toUInt64OrZero(max_time),groupArray((toUInt64OrZero(c_t), (cat,act) )))
) as sorted_array,
-- 按时间排序后的数据
-- arrayPushFront( sorted_array, sorted_array[1] ) as e_arr,
arrayFilter(
(i, e,z) -> z.1 < toUInt64OrZero(max_time)
and (e > 1800000 or (z.2.1='page_view' and z.2.2='页面浏览')),
arrayEnumerate(sorted_array), arrayDifference( sorted_array.1 ),sorted_array
) as arr_indx, -- 过滤目标事件、时间差后的数据
arrayReduce('max',arr_indx) +1 as smIndx,
arrayFilter(
(e,i) -> i>=smIndx and e.1<=toUInt64OrZero(max_time) ,
sorted_array, arrayEnumerate(sorted_array)
) as data_
select u_i,
arrayFilter((x,y,i)-> i=1 or i>1 and y<>0 ,data_.2,arrayDifference(arrayEnumerateDense(data_.2)),arrayEnumerate(data_)) as data__,
arraySlice(data__,-7,7 ) as data,
-- arrayStringConcat(data,'->') as path,
hasAll(data, [ ('page_view','页面_浏览') ]) as has_way_point --路径中必须经过的点
from app.scene_tracker where c_p='PC' and length(u_i)>20
group by u_i having length(data)>1
) tab
where has_way_point=1 group by data order by cn desc limit 100
根据经验,大家如处理复杂业务,这些函数会经常用到
neighbor
获取某一列前后相邻的数据,第二个参数控制前后相邻的距离
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as b)
arrayJoin
行变列,对数组进行展开操作
# 还是上面的例子
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as b)
arraySort
对数组进行排序,降序的话用这个 arrayReverseSort
# 还是上面的例子 略作修改,可对比示例1和示例3的结果区别
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( arraySort([1,2,3,6,34,3,11]) ) as a,'u' as b)
arrayFilter
过滤出数组中满足条件的数据
# 我们只获取数组中的偶数部分
SELECT a, neighbor( a,-1 ) from (SELECT arrayJoin( arraySort(arrayFilter(x->x%2=0, [1,2,3,6,34,3,11])) ) as a,'u' as b)
arrayEnumerate
返回数组下标
SELECT arrayEnumerate( [1,2,3,6,34,3,11] )
arrayDifference
计算数组中前后两个值的差值部分
SELECT arrayDifference( [1,2,3,6,34,3,11] )
arrayReduce
对数组进行聚合操作,min 、max、avg 等
SELECT arrayReduce('avg', [1,2,3,6,34,3,11] )
arrayEnumerateDense
标记出数组中相同的元素
SELECT arrayEnumerateDense( [1,2,3,6,34,3,11] )
arraySlice
- 对数组进行切割 ,后面两个参数分别是切割的offset和切割长度
SELECT arraySlice( [1,2,3,6,34,3,11] , -3, 2)
# 返回:34 3
hasAny
- 判断数组中是否包含某些值,包含其一返回1 ,否则0 ;如果判断全部包含 用hasAll
- 示例10:
SELECT hasAny( [1,2,3,6,34,3,11] , [3,1])
arrayStringConcat
- 将数组元素按照给定分隔符进行拼接,返回拼接后的字符串
- 示例11:
SELECT arrayStringConcat( [1,2,3,6,34,3,11] , '-')
arrayPushFront
- 向数组首位置最加value ;同理向数组末尾最加为arrayPushBack
- 示例12:
SELECT arrayPushFront( [1,2,3,6,34,3,11] , 8)
arrayPopFront
- 移除数组下标为1的值;同理,移除数组最后一个值用arrayPopBack
- 示例13:
SELECT arrayPopFront( [1,2,3,6,34,3,11] )
arrayWithConstant
- 生成一个指定长度的数组
- 示例14:
#生成长度为3 的数组
SELECT arrayWithConstant( 3, 'a')
#范围值为['a','a','a']
arrayUniq
- 计算数组中有多少个不重复的值;如进行数组去重操作 用arrayDistinct
- 示例15:
SELECT arrayUniq( [1,2,3,6,34,3,11])
runningDifference
- 计算某一列前后数值的差值
- 示例16:
select a,runningDifference(a) from (SELECT arrayJoin( [1,2,3,6,34,3,11] ) as a,'u' as b)
arrayCompact
- 对数组内数据实现相邻去重
- 示例17:
SELECT arrayCompact([1, 2, 2, 3, 2, 3, 3])
#返回值为 [1,2,3,2,3]
开篇示例语句运行结果如下图
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK