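ARPU: average revenue per user = total revenue / total number of users
ARPPU: average revenue per paying user = total revenue / number of paying users
Business requirements:
1. For each city, what are the user count and the total fees (sum of ARPU)?
2. In Table 1, how many users in each city fall into the ARPU bands (0,30), [30,50), [50,80) and [80 and above)?
3. Table 2 contains duplicate user records; find the duplicated users.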
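The two definitions differ only in the denominator, which a small worked example makes concrete. This is a minimal sketch: the function names and the figures are hypothetical and do not come from the tables used below.

```python
def arpu(total_revenue: float, total_users: int) -> float:
    """Average revenue per user: total revenue divided by ALL users."""
    return total_revenue / total_users


def arppu(total_revenue: float, paying_users: int) -> float:
    """Average revenue per paying user: total revenue divided by PAYING users only."""
    return total_revenue / paying_users


# Hypothetical figures, chosen only for illustration.
total_revenue = 6000.0
total_users = 200
paying_users = 50

print(arpu(total_revenue, total_users))    # 30.0
print(arppu(total_revenue, paying_users))  # 120.0
```

Because paying users are a subset of all users, ARPPU is never smaller than ARPU for the same revenue.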
SQL solution
1. For each city, what are the user count and the total fees (sum of ARPU)?
SELECT 城市,
COUNT(DISTINCT `用户ID`) as 用户数,
sum(ARPU) as 总费用 from `arpu值`
GROUP BY `城市`
2. In Table 1, how many users in each city fall into the ARPU bands (0,30), [30,50), [50,80) and [80 and above)?
SELECT 城市,
SUM(CASE WHEN ARPU<30 AND ARPU>0 THEN 1 ELSE 0 END) AS '(0,30)',
SUM(CASE WHEN ARPU>=30 AND ARPU<50 THEN 1 ELSE 0 END) AS '[30,50)',
SUM(CASE WHEN ARPU>=50 AND ARPU<80 THEN 1 ELSE 0 END) AS '[50,80)',
SUM(CASE WHEN ARPU>=80 THEN 1 ELSE 0 END) AS '80+'
from `arpu值`
GROUP BY `城市`
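3. Table 2 contains duplicate user records; find the duplicated users.
# Method 1: count rows per user in a subquery, then filter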
SELECT t.`用户ID` FROM
(SELECT `用户ID`,COUNT(1) AS num FROM 套餐费用
GROUP BY `用户ID`)t
where t.num>1
# Method 2: filter the groups directly with HAVING
SELECT `用户ID` FROM 套餐费用
GROUP BY `用户ID`
HAVING count(1)>1
Python solution
1. For each city, what are the user count and the total fees (sum of ARPU)?
import pandas as pd
df=pd.read_csv('C:/Users/andyf/Desktop/ARPU.csv')
# Distinct users per city, the equivalent of COUNT(DISTINCT 用户ID)
df_count=df.groupby('城市')['用户ID'].nunique().reset_index()
# Sum of ARPU per city
df_ARPU=df.groupby('城市')['ARPU'].sum().reset_index()
print(df_count.merge(df_ARPU,on='城市'))
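2. In Table 1, how many users in each city fall into the ARPU bands (0,30), [30,50), [50,80) and [80 and above)?
import pandas as pd
df=pd.read_csv('C:/Users/andyf/Desktop/ARPU.csv')
# right=False gives left-closed bins [0,30), [30,50), [50,80), [80,inf).
# Note: ARPU=0 lands in the first bin here, whereas the SQL query excludes it.
df['label']=pd.cut(df['ARPU'],bins=[0,30,50,80,float('inf')],right=False)
print(df.pivot_table(index='城市',columns='label',values='用户ID',aggfunc='count').fillna(0))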
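The band boundaries are easy to get wrong at the edges (0, 30, 50, 80), so it can help to check them without pandas. The following is a rough sketch using only the standard library; `arpu_band`, `EDGES`, `LABELS` and the sample rows are hypothetical names and data, and values at or below 0 are excluded to match the (0,30) band in the SQL query.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of the first three bands; the last band is open-ended.
EDGES = [30, 50, 80]
LABELS = ["(0,30)", "[30,50)", "[50,80)", "[80,+inf)"]


def arpu_band(value):
    """Return the band label for an ARPU value, or None when value <= 0."""
    if value <= 0:
        return None
    # bisect_right puts a value equal to an edge into the band that starts
    # at that edge, which is exactly the left-closed behaviour wanted here.
    return LABELS[bisect_right(EDGES, value)]


# Hypothetical (city, ARPU) rows for illustration only.
rows = [("A", 0), ("A", 29.9), ("A", 30), ("B", 50), ("B", 80), ("B", 1200)]

counts = Counter(
    (city, arpu_band(value)) for city, value in rows if arpu_band(value) is not None
)
for (city, band), n in sorted(counts.items()):
    print(city, band, n)
```

The open-ended last band means very large ARPU values are still counted, which a fixed upper limit on the bins would silently drop.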
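3. Table 2 contains duplicate user records; find the duplicated users.
import pandas as pd
df=pd.read_csv('C:/Users/andyf/Desktop/套餐费用.csv')
# duplicated() flags every occurrence after the first; keep='last' flags every occurrence except the last
print(df[df['用户ID'].duplicated()]['用户ID'])
print(df[df['用户ID'].duplicated(keep='last')]['用户ID'])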
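The SQL approach of grouping by user and keeping groups with more than one row can also be sketched in plain Python, which lists each duplicated user exactly once. This is only an illustration: `duplicated_ids` and the sample IDs are hypothetical.

```python
from collections import Counter


def duplicated_ids(user_ids):
    """Return each ID that occurs more than once, in order of first appearance."""
    counts = Counter(user_ids)  # same idea as GROUP BY ... COUNT(1)
    return [uid for uid, n in counts.items() if n > 1]  # HAVING COUNT(1) > 1


# Hypothetical user IDs for illustration only.
ids = ["u1", "u2", "u1", "u3", "u2", "u1"]
print(duplicated_ids(ids))  # ['u1', 'u2']
```

Unlike the row-level flags from `duplicated()`, a user who appears three or more times still shows up only once in this result.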