下列为电商公司用户访问时间数据 | 统计某个用户连续的访问记录,如果时间间隔小于60s,就分为一组 | |
id ts 1001 17523641234 1001 17523641256 1002 17523641278 1001 17523641334 1002 17523641434 1001 17523641534 1001 17523641544 1002 17523641634 1001 17523641638 1001 17523641654 | id ts group 1001 17523641234 1 1001 17523641256 1 1001 17523641334 2 1001 17523641534 3 1001 17523641544 3 1001 17523641638 4 1001 17523641654 4 1002 17523641278 1 1002 17523641434 2 1002 17523641634 3 |
2. 需求:
① 创建表
② 统计每个用户连续的访问记录中,如果时间间隔小于60s,就分为一个组(观察数是标准的时间戳吗?)
3. 数据准备:
编辑文件user_visit.txt
[atguigu@hadoop102 ~]$ vim /opt/module/hive/datas/user_visit.txt
1001 17523641234
1001 17523641256
1002 17523641278
1001 17523641334
1002 17523641434
1001 17523641534
1001 17523641544
1002 17523641634
1001 17523641638
1001 17523641654
select id, ts, sum(flag) over (partition by id order by ts) as id_group
from
(select id, ts, lag_ts, (ts - lag_ts) as diff, if((ts - lag_ts) < 60, 0, 1) as flag from(select id, ts,lag(ts, 1, 0) over (partition by id order by ts) as lag_tsfrom test2) tb1
)tb2