
news/2025/9/22 18:58:21 / Source:
4.6 Advanced Processing - Missing Values

Dataset: https://download.csdn.net/download/weixin_44827418/12548095

Pandas advanced processing
- Missing value handling
- Data discretization
- Merging
- Cross tabulation and pivot tables
- Grouping and aggregation
- Case study

4.6 Advanced Processing - Missing Values
1) How to handle missing values. There are two approaches:
   (1) delete the samples (rows) that contain missing values;
   (2) replace / impute the missing values.

4.6.1 How to handle NaN
1) Check whether the data contains NaN: pd.isnull(df), pd.notnull(df)
2) Delete the samples that contain missing values: df.dropna(inplace=False)
   Replace / impute: df.fillna(value, inplace=False)

4.6.2 Missing values that are not NaN but are marked with a default placeholder
1) Replace the placeholder with np.nan: df.replace(to_replace="?", value=np.nan)
2) Then handle the np.nan values with the steps of 4.6.1.

2) Worked example of missing value handling

4.7 Advanced Processing - Data Discretization
Categorical values stored as numbers versus as one-hot columns:

         gender  age
    A         1   23
    B         2   30
    C         1   18

         species  fur
    A          1    2
    B          2    1
    C          3    1

         male  female  age
    A       1       0   23
    B       0       1   30
    C       1       0   18

         dog  pig  mouse  fur
    A      1    0      0    2
    B      0    1      0    1
    C      0    0      1    1

The second form is called one-hot encoding (dummy variables).

4.7.1 What is data discretization
Discretization divides the range of a continuous variable into intervals and replaces each value with the interval it falls into.
Raw height data: 165, 174, 160, 180, 159, 163, 192, 184

4.7.2 Why discretize

4.7.3 How to discretize data
1) Group the data
   Automatic grouping: sr = pd.qcut(data, q)   (q = number of quantile-based groups)
   Custom grouping: sr = pd.cut(data, bins)    (bins = list of bin edges)
2) Convert the grouped result to one-hot encoding: pd.get_dummies(sr, prefix=...)

4.8 Advanced Processing - Merging
numpy: np.concatenate((a, b), axis=...); horizontal stacking: np.hstack(); vertical stacking: np.vstack()
1) Concatenate along an axis: pd.concat([data1, data2], axis=1)
2) Join on key columns: pd.merge(left, right, how="inner", on=[key columns])

4.9 Advanced Processing - Cross Tabulation and Pivot Tables
Purpose: find and explore the relationship between two variables.
4.9.1 What cross tabulations and pivot tables are for
4.9.2 Using crosstab: pd.crosstab(value1, value2)
4.9.3 pivot_table

4.10 Advanced Processing - Grouping and Aggregation
4.10.1 What grouping and aggregation are
4.10.2 Grouping and aggregation API: on a DataFrame and on a Series (sr)

4.6.1 How to handle NaN

    import pandas as pd

    movie = pd.read_csv("./datas/IMDB-Movie-Data.csv")
    movie
    # Out: 1000 rows × 12 columns (Rank, Title, Genre, Description, Director, Actors,
    # Year, Runtime (Minutes), Rating, Votes, Revenue (Millions), Metascore).
    # The text columns Genre, Description, Director and Actors are omitted here:
    #      Rank  Title                    Year  Runtime (Minutes)  Rating   Votes  Revenue (Millions)  Metascore
    # 0       1  Guardians of the Galaxy  2014                121     8.1  757074              333.13       76.0
    # 1       2  Prometheus               2012                124     7.0  485820              126.46       65.0
    # 2       3  Split                    2016                117     7.3  157606              138.12       62.0
    # 3       4  Sing                     2016                108     7.2   60545              270.32       59.0
    # 4       5  Suicide Squad            2016                123     6.2  393727              325.02       40.0
    # ...
    # 995   996  Secret in Their Eyes     2015                111     6.2   27585                 NaN       45.0
    # 996   997  Hostel: Part II          2007                 94     5.5   73152               17.54       46.0
    # 997   998  Step Up 2: The Streets   2008                 98     6.2   70699               58.01       50.0
    # 998   999  Search Party             2014                 93     5.6    4881                 NaN       22.0
    # 999  1000  Nine Lives               2016                 87     5.3   12435               19.64       11.0

    # 1. Check for NaN-type missing values; True marks a missing value
    movie.isnull()
    # Out: 1000 rows × 12 columns of booleans; Revenue (Millions) is True
    # in rows 995 and 998, every other cell shown is False.

    import numpy as np
    # np.any() returns True as soon as a single value is True,
    # so True here means the data contains missing values
    np.any(movie.isnull())
    # Out: True

    # With notnull(), False marks a missing value
    pd.notnull(movie)
    # Out: 1000 rows × 12 columns of booleans; Revenue (Millions) is False
    # in rows 995 and 998, every other cell shown is True.

    # np.all() returns False as soon as a single value is False,
    # so False here means the data contains missing values
    np.all(pd.notnull(movie))
    # Out: False

    pd.isnull(movie).any()
    # Out:
    # Rank                  False
    # Title                 False
    # Genre                 False
    # Description           False
    # Director              False
    # Actors                False
    # Year                  False
    # Runtime (Minutes)     False
    # Rating                False
    # Votes                 False
    # Revenue (Millions)     True
    # Metascore              True
    # dtype: bool

    pd.notnull(movie).all()
    # Out:
    # Rank                   True
    # Title                  True
    # Genre                  True
    # Description            True
    # Director               True
    # Actors                 True
    # Year                   True
    # Runtime (Minutes)      True
    # Rating                 True
    # Votes                  True
    # Revenue (Millions)    False
    # Metascore             False
    # dtype: bool

    # Handling the missing values
    # Method 1: drop the samples (rows) that contain missing values
    movie_full = movie.dropna()
    movie_full.isnull().any()
    # Out: False for all 12 columns (dtype: bool)

    # Method 2: replace the missing values
    movie.head()
    # Out: the first 5 rows of movie

    movie["Revenue (Millions)"].mean()
    # Out: 82.95637614678897

    # Columns that contain missing values:
    #   Revenue (Millions), Metascore
    # Fill each one with its column mean and assign the result back. Calling
    # fillna(..., inplace=True) on the selected column is a chained in-place call
    # that may leave movie unchanged under pandas Copy-on-Write.
    movie["Revenue (Millions)"] = movie["Revenue (Millions)"].fillna(movie["Revenue (Millions)"].mean())
    movie["Revenue (Millions)"].isnull().any()
    # Out: False

    movie["Metascore"] = movie["Metascore"].fillna(movie["Metascore"].mean())
    movie["Metascore"].isnull().any()
    # Out: False

    movie.isnull().any()  # all missing values have now been handled
    # Out: False for all 12 columns (dtype: bool)

Handling missing values that are not NaN but are marked with a default placeholder

    data = pd.read_csv("./datas/GBvideos.csv", encoding="GBK")
    data
    # Out: 1600 rows × 11 columns: video_id, title, channel_title, category_id, tags,
    # views, likes, dislikes, comment_total, thumbnail_link, date.
    # Row 1599 ("The Child in Time: Trailer - BBC One") has "?" in dislikes.

    # 1. Replace "?" with np.nan
    new_data = data.replace(to_replace="?", value=np.nan)
    new_data
    # Out: the same 1600 rows × 11 columns; dislikes in row 1599 is now NaN

    new_data.isnull().any()  # dislikes is True: its "?" values are now NaN
    # Out:
    # video_id          False
    # title             False
    # channel_title     False
    # category_id       False
    # tags              False
    # views             False
    # likes             False
    # dislikes           True
    # comment_total     False
    # thumbnail_link    False
    # date              False
    # dtype: bool

    # 2. Drop the rows that now contain NaN
    new_data.dropna(inplace=True)
    new_data.isnull().any()
    # Out: False for all 11 columns (dtype: bool)

4.7 Advanced Processing - Data Discretization

    import pandas as pd

    # Prepare the data
    data = pd.Series([165, 174, 160, 180, 159, 163, 192, 184],
                     index=["No1:165", "No2:174", "No3:160", "No4:180",
                            "No5:159", "No6:163", "No7:192", "No8:184"])
    data
    # Out:
    # No1:165    165
    # No2:174    174
    # No3:160    160
    # No4:180    180
    # No5:159    159
    # No6:163    163
    # No7:192    192
    # No8:184    184
    # dtype: int64

Automatic grouping

    # 1. Group the data
    # Automatic grouping: pd.qcut(data, number_of_groups) splits on quantiles
    sr = pd.qcut(data, 3)
    sr
    # Out:
    # No1:165      (163.667, 178.0]
    # No2:174      (163.667, 178.0]
    # No3:160    (158.999, 163.667]
    # No4:180        (178.0, 192.0]
    # No5:159    (158.999, 163.667]
    # No6:163    (158.999, 163.667]
    # No7:192        (178.0, 192.0]
    # No8:184        (178.0, 192.0]
    # dtype: category
    # Categories (3, interval[float64]): [(158.999, 163.667] (163.667, 178.0] (178.0, 192.0]]

    # Check how many values fall into each group
    sr.value_counts()
    # Out:
    # (178.0, 192.0]        3
    # (158.999, 163.667]    3
    # (163.667, 178.0]      2
    # dtype: int64

    type(sr)
    # Out: pandas.core.series.Series

    # 2. Convert the grouped result to one-hot encoding
    # prefix sets the prefix of the new column names
    pd.get_dummies(sr, prefix="height")
    # Out:
    #          height_(158.999, 163.667]  height_(163.667, 178.0]  height_(178.0, 192.0]
    # No1:165                          0                        1                      0
    # No2:174                          0                        1                      0
    # No3:160                          1                        0                      0
    # No4:180                          0                        0                      1
    # No5:159                          1                        0                      0
    # No6:163                          1                        0                      0
    # No7:192                          0                        0                      1
    # No8:184                          0                        0                      1

Custom grouping

    # Custom grouping: pd.cut(data, list containing every bin edge)
    sr = pd.cut(data, [150, 165, 180, 195])
    sr
    # Out:
    # No1:165    (150, 165]
    # No2:174    (165, 180]
    # No3:160    (150, 165]
    # No4:180    (165, 180]
    # No5:159    (150, 165]
    # No6:163    (150, 165]
    # No7:192    (180, 195]
    # No8:184    (180, 195]
    # dtype: category
    # Categories (3, interval[int64]): [(150, 165] (165, 180] (180, 195]]

    sr.value_counts()
    # Out:
    # (150, 165]    4
    # (180, 195]    2
    # (165, 180]    2
    # dtype: int64

    pd.get_dummies(sr, prefix="身高")  # 身高 = "height"
    # Out:
    #          身高_(150, 165]  身高_(165, 180]  身高_(180, 195]
    # No1:165              1               0               0
    # No2:174              0               1               0
    # No3:160              1               0               0
    # No4:180              0               1               0
    # No5:159              1               0               0
    # No6:163              1               0               0
    # No7:192              0               0               1
    # No8:184              0               0               1

4.8 Advanced Processing - Merging
4.8.1 Merging with pd.concat (concatenating along an axis)

    data1 = np.arange(0, 20, 1).reshape(4, 5)
    data1 = pd.DataFrame(data1)
    data1
    # Out:
    #     0   1   2   3   4
    # 0   0   1   2   3   4
    # 1   5   6   7   8   9
    # 2  10  11  12  13  14
    # 3  15  16  17  18  19

    data2 = np.arange(100, 120, 1).reshape(4, 5)
    data2 = pd.DataFrame(data2)
    data2
    # Out:
    #      0    1    2    3    4
    # 0  100  101  102  103  104
    # 1  105  106  107  108  109
    # 2  110  111  112  113  114
    # 3  115  116  117  118  119

    # Concatenate data1 and data2 horizontally
    data_concat = pd.concat([data1, data2], axis=1)
    data_concat
    # Out:
    #     0   1   2   3   4    0    1    2    3    4
    # 0   0   1   2   3   4  100  101  102  103  104
    # 1   5   6   7   8   9  105  106  107  108  109
    # 2  10  11  12  13  14  110  111  112  113  114
    # 3  15  16  17  18  19  115  116  117  118  119

    data2.T
    # Out:
    #      0    1    2    3
    # 0  100  105  110  115
    # 1  101  106  111  116
    # 2  102  107  112  117
    # 3  103  108  113  118
    # 4  104  109  114  119

    # Concatenate data1 and the transposed data2 vertically;
    # data2.T has no column 4, so those cells are NaN and column 4 becomes float
    data_concat1 = pd.concat([data1, data2.T], axis=0)
    data_concat1
    # Out:
    #      0    1    2    3     4
    # 0    0    1    2    3   4.0
    # 1    5    6    7    8   9.0
    # 2   10   11   12   13  14.0
    # 3   15   16   17   18  19.0
    # 0  100  105  110  115   NaN
    # 1  101  106  111  116   NaN
    # 2  102  107  112  117   NaN
    # 3  103  108  113  118   NaN
    # 4  104  109  114  119   NaN

4.8.2 Merging with pd.merge (joining on key columns)

    left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                         "key2": ["K0", "K1", "K0", "K1"],
                         "A": ["A0", "A1", "A2", "A3"],
                         "B": ["B0", "B1", "B2", "B3"]})
    left
    # Out:
    #   key1 key2   A   B
    # 0   K0   K0  A0  B0
    # 1   K0   K1  A1  B1
    # 2   K1   K0  A2  B2
    # 3   K2   K1  A3  B3

    right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                          "key2": ["K0", "K0", "K0", "K0"],
                          "C": ["C0", "C1", "C2", "C3"],
                          "D": ["D0", "D1", "D2", "D3"]})
    right
    # Out:
    #   key1 key2   C   D
    # 0   K0   K0  C0  D0
    # 1   K1   K0  C1  D1
    # 2   K1   K0  C2  D2
    # 3   K2   K0  C3  D3

    # Inner join (the default): keep only the keys present in both tables
    result = pd.merge(left, right, on=["key1", "key2"], how="inner")
    result
    # Out:
    #   key1 key2   A   B   C   D
    # 0   K0   K0  A0  B0  C0  D0
    # 1   K1   K0  A2  B2  C1  D1
    # 2   K1   K0  A2  B2  C2  D2

    # Left join: keep every key of the left table and merge onto it
    result_left = pd.merge(left, right, on=["key1", "key2"], how="left")
    result_left
    # Out:
    #   key1 key2   A   B    C    D
    # 0   K0   K0  A0  B0   C0   D0
    # 1   K0   K1  A1  B1  NaN  NaN
    # 2   K1   K0  A2  B2   C1   D1
    # 3   K1   K0  A2  B2   C2   D2
    # 4   K2   K1  A3  B3  NaN  NaN

    # Right join: keep every key of the right table and merge onto it
    result_right = pd.merge(left, right, on=["key1", "key2"], how="right")
    result_right
    # Out:
    #   key1 key2    A    B   C   D
    # 0   K0   K0   A0   B0  C0  D0
    # 1   K1   K0   A2   B2  C1  D1
    # 2   K1   K0   A2   B2  C2  D2
    # 3   K2   K0  NaN  NaN  C3  D3

    # Outer join: keep every key from both tables
    result_outer = pd.merge(left, right, on=["key1", "key2"], how="outer")
    result_outer
    # Out:
    #   key1 key2    A    B    C    D
    # 0   K0   K0   A0   B0   C0   D0
    # 1   K0   K1   A1   B1  NaN  NaN
    # 2   K1   K0   A2   B2   C1   D1
    # 3   K1   K0   A2   B2   C2   D2
    # 4   K2   K1   A3   B3  NaN  NaN
    # 5   K2   K0  NaN  NaN   C3   D3

4.9 Advanced Processing - Cross Tabulation and Pivot Tables
Used to explore the relationship between two variables.

4.9.2 Using crosstab (cross tabulation)

    data = pd.read_excel("./datas/szfj_baoan.xls")
    data
    # Out (1251 rows × 9 columns):
    #      district  roomnum  hall   AREA  C_floor  floor_num  school  subway  per_price
    # 0       baoan        3     2   89.3   middle         31       0       0     7.0773
    # 1       baoan        4     2  127.0     high         31       0       0     6.9291
    # 2       baoan        1     1   28.0      low         39       0       0     3.9286
    # 3       baoan        1     1   28.0   middle         30       0       0     3.3568
    # 4       baoan        2     2   78.0   middle          8       1       1     5.0769
    # ...
    # 1246    baoan        4     2   89.3      low          8       0       0     4.2553
    # 1247    baoan        2     1   67.0   middle         30       0       0     3.8060
    # 1248    baoan        2     2   67.4   middle         29       1       0     5.3412
    # 1249    baoan        2     2   73.1      low         15       1       0     5.9508
    # 1250    baoan        3     2   86.2   middle         32       0       1     4.5244

    time = "2020-06-23"
    # Convert the string to a pandas date type
    date = pd.to_datetime(time)
    date
    # Out: Timestamp('2020-06-23 00:00:00')
    type(date)
    # Out: pandas._libs.tslibs.timestamps.Timestamp
    date.year
    # Out: 2020
    date.month
    # Out: 6
    # weekday() returns the day of the week (Monday = 0); 2020-06-23 is a Tuesday, so 1
    data["week"] = date.weekday()
    data.drop("week", axis=1, inplace=True)
    data
    # Out: the same 1251 rows × 9 columns as above

    # feature = 1 if per_price > 5.0000 (more than 50,000 per square metre), else 0
    data["feature"] = np.where(data["per_price"] > 5.0000, 1, 0)
    data
    # Out: 1251 rows × 10 columns; feature is 1 in rows 0, 1, 4, 1248, 1249
    # and 0 in rows 2, 3, 1246, 1247, 1250

    # Cross table: relationship between floor_num and whether the price per
    # square metre exceeds 50,000. Each cell counts the rows with feature 0
    # or feature 1 for that floor_num.
    data0 = pd.crosstab(data["floor_num"], data["feature"])
    data0
    # Out (46 rows × 2 columns), shown as floor_num: count of feature 0, count of feature 1
    # 1: 6, 8      3: 0, 1      4: 0, 10     6: 3, 7      7: 16, 25    8: 19, 32
    # 9: 2, 11     10: 4, 9     11: 8, 11    12: 1, 3     13: 4, 20    14: 0, 5
    # 15: 8, 33    16: 9, 19    17: 20, 21   18: 17, 35   19: 11, 5    20: 2, 4
    # 21: 1, 6     22: 0, 1     23: 4, 8     24: 10, 26   25: 4, 37    26: 9, 57
    # 27: 5, 38    28: 6, 35    29: 26, 68   30: 30, 78   31: 4, 151   32: 21, 126
    # 33: 34, 20   34: 1, 5     35: 1, 2     36: 0, 4     37: 1, 1     38: 0, 1
    # 39: 5, 10    40: 1, 3     43: 0, 1     44: 0, 6     45: 0, 7     47: 0, 1
    # 50: 0, 1     51: 0, 3     52: 0, 2     53: 0, 1

    data0.sum(axis=1)  # row sums
    # Out, shown as floor_num: total
    # 1: 14    3: 1     4: 10    6: 10    7: 41    8: 51    9: 13    10: 13
    # 11: 19   12: 4    13: 24   14: 5    15: 41   16: 28   17: 41   18: 52
    # 19: 16   20: 6    21: 7    22: 1    23: 12   24: 36   25: 41   26: 66
    # 27: 43   28: 41   29: 94   30: 108  31: 155  32: 147  33: 54   34: 6
    # 35: 3    36: 4    37: 2    38: 1    39: 15   40: 4    43: 1    44: 6
    # 45: 7    47: 1    50: 1    51: 3    52: 2    53: 1
    # dtype: int64

    data0.div(data0.sum(axis=1), axis=0)  # divide each row by its row total
    # Out, shown as floor_num: share of feature 0, share of feature 1
    # 1: 0.428571, 0.571429    3: 0.000000, 1.000000    4: 0.000000, 1.000000    6: 0.300000, 0.700000
    # 7: 0.390244, 0.609756    8: 0.372549, 0.627451    9: 0.153846, 0.846154    10: 0.307692, 0.692308
    # 11: 0.421053, 0.578947   12: 0.250000, 0.750000   13: 0.166667, 0.833333   14: 0.000000, 1.000000
    # 15: 0.195122, 0.804878   16: 0.321429, 0.678571   17: 0.487805, 0.512195   18: 0.326923, 0.673077
    # 19: 0.687500, 0.312500   20: 0.333333, 0.666667   21: 0.142857, 0.857143   22: 0.000000, 1.000000
    # 23: 0.333333, 0.666667   24: 0.277778, 0.722222   25: 0.097561, 0.902439   26: 0.136364, 0.863636
    # 27: 0.116279, 0.883721   28: 0.146341, 0.853659   29: 0.276596, 0.723404   30: 0.277778, 0.722222
    # 31: 0.025806, 0.974194   32: 0.142857, 0.857143   33: 0.629630, 0.370370   34: 0.166667, 0.833333
    # 35: 0.333333, 0.666667   36: 0.000000, 1.000000   37: 0.500000, 0.500000   38: 0.000000, 1.000000
    # 39: 0.333333, 0.666667   40: 0.250000, 0.750000   43: 0.000000, 1.000000   44: 0.000000, 1.000000
    # 45: 0.000000, 1.000000   47: 0.000000, 1.000000   50: 0.000000, 1.000000   51: 0.000000, 1.000000
    # 52: 0.000000, 1.000000   53: 0.000000, 1.000000

    data_percent = data0.div(data0.sum(axis=1), axis=0)
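The row shares computed above by dividing each row by its total can also be requested from pd.crosstab directly, through its normalize="index" argument. Below is a minimal sketch on a small invented frame. The floor_num/feature values are hypothetical and are not taken from szfj_baoan.xls. The sketch checks that both routes agree:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the floor_num / feature columns
# (values invented for illustration only).
data = pd.DataFrame({
    "floor_num": [1, 1, 1, 3, 3, 6],
    "feature":   [0, 1, 1, 1, 1, 0],
})

# Route used in the notebook: counts, then divide each row by its row total.
counts = pd.crosstab(data["floor_num"], data["feature"])
manual = counts.div(counts.sum(axis=1), axis=0)

# normalize="index" makes crosstab return row proportions directly.
direct = pd.crosstab(data["floor_num"], data["feature"], normalize="index")

print(direct)
print(np.allclose(manual, direct))
```

normalize also accepts "columns" (column shares) and "all" / True (shares of the grand total). This saves the separate sum and div steps when only proportions are needed.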
    data_percent
    # Out: the same row shares as the data0.div(...) result above

    # stacked=True stacks the two bars of each floor_num on top of each other
    data_percent.plot(kind="bar", stacked=True)
    # Out: <matplotlib.axes._subplots.AxesSubplot at 0x24719dd7488>,
    # a stacked bar chart of the feature 0 / feature 1 shares per floor_num

4.9.3 Using pivot_table (pivot table)

    # A pivot table makes the whole process simpler: pivot_table averages
    # feature per floor_num by default, and the mean of a 0/1 column is
    # directly the share of rows with feature == 1
    data.pivot_table(["feature"], index=["floor_num"])
    # Out:
    #             feature
    # floor_num
    # 1          0.571429
    # 3          1.000000
    # 4          1.000000
    # 6          0.700000
    # ...
    # 50         1.000000
    # 51         1.000000
    # 52         1.000000
    # 53         1.000000

4.10 Advanced Processing - Grouping and Aggregation
4.10.2 Grouping and aggregation API

    col = pd.DataFrame({"color": ["white", "red", "green", "red", "green"],
                        "object": ["pen", "pencil", "pencil", "ashtray", "pen"],
                        "price1": [4.56, 4.20, 1.30, 0.56, 2.75],
                        "price2": [4.75, 4.12, 1.68, 0.75, 3.15]})
    col
    # Out:
    #    color   object  price1  price2
    # 0  white      pen    4.56    4.75
    # 1    red   pencil    4.20    4.12
    # 2  green   pencil    1.30    1.68
    # 3    red  ashtray    0.56    0.75
    # 4  green      pen    2.75    3.15

    # Group by color and aggregate price1 (maximum per group)
    # Grouping with the DataFrame method
    col.groupby(by="color")["price1"].max()
    # Out:
    # color
    # green    2.75
    # red      4.20
    # white    4.56
    # Name: price1, dtype: float64

    # Grouping with the Series method
    col["price1"].groupby(col["color"])
    # Out: <pandas.core.groupby.generic.SeriesGroupBy object at 0x000002471D178D08>

    col["price1"].groupby(col["color"]).max()
    # Out:
    # color
    # green    2.75
    # red      4.20
    # white    4.56
    # Name: price1, dtype: float64

4.11 Case Study

    # 1. Prepare the data
    movie = pd.read_csv("./datas/IMDB-Movie-Data.csv")
    movie
    # Out: 1000 rows × 12 columns, the same file as loaded in 4.6.1

    # Question 1: what is the average rating in this movie data, how many
    # directors are there, and so on? How do we get this information?
    movie["Rating"].mean()
    # Out: 6.723200000000003

    movie["Director"]
    # Out:
    # 0                James Gunn
    # 1              Ridley Scott
    # 2        M. Night Shyamalan
    # 3      Christophe Lourdelet
    # 4                David Ayer
    #                ...
    # 995               Billy Ray
    # 996                Eli Roth
    # 997              Jon M. Chu
    # 998          Scot Armstrong
    # 999        Barry Sonnenfeld
    # Name: Director, Length: 1000, dtype: object

    # np.unique() removes duplicates, because one director may have directed several movies
    np.unique(movie["Director"])
    # Out:
    # array(['Aamir Khan', 'Abdellatif Kechiche', 'Adam Leon', 'Adam McKay',
    #        'Adam Shankman', 'Adam Wingard', 'Afonso Poyart', 'Aisling Walsh',
    #        'Akan Satayev', 'Akiva Schaffer', 'Alan Taylor', 'Albert Hughes',
    #        'Alejandro Amenábar', 'Alejandro González Iñárritu',
    #        ...
    #        'Tomas Alfredson', 'Tony Gilroy', 'Tony Scott', 'Travis Knight',
    #        'Tyler Shields', 'Wally Pfister', 'Walt Dohrn', 'Walter Hill',
    #        'Warren Beatty', 'Werner Herzog', 'Wes Anderson', 'Wes Ball',
    #        'Wes Craven', 'Whit Stillman', 'Will Gluck', 'Will Slocombe',
    #        'William Brent Bell', 'William Oldroyd', 'Woody Allen',
    #        'Xavier Dolan', 'Yimou Zhang', 'Yorgos Lanthimos', 'Zack Snyder',
    #        'Zackary Adler'], dtype=object)

    # Number of directors
    np.unique(movie["Director"]).size
    # Out: 644

    # Question 2: how should we present the distribution of rating and runtime?
    movie["Rating"].plot(kind="hist", figsize=(20, 8), fontsize=40)
    # Out: <matplotlib.axes._subplots.AxesSubplot at 0x2471ce18708>, a histogram of Rating

    import matplotlib.pyplot as plt
    # 1. Create the canvas
    plt.figure(figsize=(20, 8), dpi=100)
    # 2. Draw the histogram with 20 bins
    plt.hist(movie["Rating"], 20)
    # Set the x-axis ticks: 21 evenly spaced values from the minimum to the maximum rating
    plt.xticks(np.linspace(movie["Rating"].min(), movie["Rating"].max(), 21))
    # Add a grid
    plt.grid(linestyle="--", alpha=0.5)
    # 3. Show the figure
    plt.show()

    movie["Rating"]
    # Out:
    # 0      8.1
    # 1      7.0
    # 2      7.3
    # 3      7.2
    # 4      6.2
    #       ...
    # 995    6.2
    # 996    5.5
    # 997    6.2
    # 998    5.6
    # 999    5.3
    # Name: Rating, Length: 1000, dtype: float64

    # Question 3: how should we process the data to count the movie genres?
    # First find out which genres exist
    movie_genre = [i.split(",") for i in movie["Genre"]]
    movie_genre
    # Out:
    # [['Action', 'Adventure', 'Sci-Fi'],
    #  ['Adventure', 'Mystery', 'Sci-Fi'],
    #  ['Horror', 'Thriller'],
    #  ['Animation', 'Comedy', 'Family'],
    #  ['Action', 'Adventure', 'Fantasy'],
    #  ...
    #  ['Horror'],
    #  ['Drama', 'Music', 'Romance'],
    #  ['Adventure', 'Comedy'],
    #  ['Comedy', 'Family', 'Fantasy']]

    [j for i in movie_genre for j in i]
    # Out:
    # ['Action', 'Adventure', 'Sci-Fi', 'Adventure', 'Mystery', 'Sci-Fi', ...
    #  'Animation', 'Action', 'Adventure', 'Action', 'Adventure', 'Drama', ...]

    movie_class = np.unique([j for i in movie_genre for j in i])
    movie_class
    # Out:
    # array(['Action', 'Adventure', 'Animation', 'Biography', 'Comedy', 'Crime',
    #        'Drama', 'Family', 'Fantasy', 'History', 'Horror', 'Music',
    #        'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Sport', 'Thriller',
    #        'War', 'Western'], dtype='<U9')

    len(movie_class)  # 20 movie genres
    # Out: 20

    # Count how many movies belong to each genre
    # First create an all-zero DataFrame: one row per movie, one column per genre
    count = pd.DataFrame(np.zeros(shape=[1000, 20], dtype="int32"), columns=movie_class)
    count.head()
    # Out: 5 rows × 20 columns, all 0

    count.loc[0, movie_genre[0]]
    # Out:
    # Action       0
    # Adventure    0
    # Sci-Fi       0
    # Name: 0, dtype: int32

    movie_genre[0]
    # Out: ['Action', 'Adventure', 'Sci-Fi']

    # Fill the table: set 1 in each genre column the movie belongs to
    for i in range(1000):
        count.loc[i, movie_genre[i]] = 1

    count
    # Out: 1000 rows × 20 columns of 0/1; e.g. row 0 has 1 in Action, Adventure and Sci-Fi

    # Sum each column
    count.sum(axis=0)
    # Out:
    # Action       303
    # Adventure    259
    # Animation     49
    # Biography     81
    # Comedy       279
    # Crime        150
    # Drama        513
    # Family        51
    # Fantasy      101
    # History       29
    # Horror       119
    # Music         16
    # Musical        5
    # Mystery      106
    # Romance      141
    # Sci-Fi       120
    # Sport         18
    # Thriller     195
    # War           13
    # Western        7
    # dtype: int64

    count.sum(axis=0).sort_values(ascending=False)
    # Out:
    # Drama        513
    # Action       303
    # Comedy       279
    # Adventure    259
    # Thriller     195
    # Crime        150
    # Romance      141
    # Sci-Fi       120
    # Horror       119
    # Mystery      106
    # Fantasy      101
    # Biography     81
    # Family        51
    # Animation     49
    # History       29
    # Sport         18
    # Music         16
    # War           13
    # Western        7
    # Musical        5
    # dtype: int64

    count.sum(axis=0).sort_values(ascending=False).plot(kind="bar", fontsize=20, figsize=(20, 9), colormap="cool")
    # Out: <matplotlib.axes._subplots.AxesSubplot at 0x2472450c1c8>, a bar chart of movies per genre
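To recap the 4.6 workflow in one place, here is a minimal sketch on a small invented frame. The column names revenue, metascore and dislikes and all values are hypothetical. The sketch turns a "?" placeholder into NaN, makes that column numeric, and then either drops the incomplete rows or fills each column with its own mean:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two numeric columns with NaN, plus one column whose
# missing entries are marked with the placeholder "?" instead of NaN.
df = pd.DataFrame({
    "revenue":   [333.13, np.nan, 138.12, np.nan],
    "metascore": [76.0, 65.0, np.nan, 59.0],
    "dislikes":  ["3548", "?", "1511", "2915"],
})

# 4.6.2: turn the placeholder into a real NaN, then make the column numeric
# (it was read as strings because of the "?").
df = df.replace(to_replace="?", value=np.nan)
df["dislikes"] = pd.to_numeric(df["dislikes"])

# Which columns contain missing values?
print(df.isnull().any())

# Option 1: drop every row that has at least one missing value (new frame).
dropped = df.dropna()

# Option 2: fill each column with its own mean. Passing a Series to fillna
# matches values to columns by name; the result is assigned, not in place.
filled = df.fillna(df.mean(numeric_only=True))
print(filled)
```

The pd.to_numeric step matters because, after the replace, the column still holds strings. Without it, the column would be skipped when computing numeric means.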
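For 4.7, here is a sketch contrasting qcut and cut on the same height data. The bin names "short", "medium" and "tall" passed through labels=, and dtype=int in get_dummies, are illustrative choices and do not come from the original notebook:

```python
import pandas as pd

# The height data from 4.7 (index labels shortened for readability).
data = pd.Series([165, 174, 160, 180, 159, 163, 192, 184],
                 index=[f"No{i}" for i in range(1, 9)])

# qcut: edges chosen from quantiles, so the bins hold roughly equal counts.
by_quantile = pd.qcut(data, 3)

# cut: edges chosen by hand; intervals are right-closed, e.g. (150, 165].
by_edges = pd.cut(data, [150, 165, 180, 195])

# labels= replaces the interval strings with readable bin names
# (hypothetical names used only for this example).
named = pd.cut(data, [150, 165, 180, 195], labels=["short", "medium", "tall"])

# One 0/1 column per bin, in bin order; dtype=int avoids boolean columns.
dummies = pd.get_dummies(named, prefix="height", dtype=int)
print(dummies)
```

The choice between the two is about what the bins should mean. qcut balances the group sizes. cut keeps the boundaries interpretable but can leave some bins sparse.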
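For 4.8.2, pd.merge's indicator=True flag adds a _merge column that records where each joined row came from. This makes the difference between the four join types visible. Here is a sketch reusing the left/right tables:

```python
import pandas as pd

# The left/right tables from 4.8.2.
left = pd.DataFrame({"key1": ["K0", "K0", "K1", "K2"],
                     "key2": ["K0", "K1", "K0", "K1"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key1": ["K0", "K1", "K1", "K2"],
                      "key2": ["K0", "K0", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})

# indicator=True labels each row "both", "left_only" or "right_only".
outer = pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)
print(outer[["key1", "key2", "_merge"]])

# Each join type keeps a different subset of those rows.
for how in ["inner", "left", "right", "outer"]:
    print(how, len(pd.merge(left, right, on=["key1", "key2"], how=how)))
```

Inner keeps the "both" rows. Left adds "left_only", right adds "right_only", and outer keeps all three groups.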

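For 4.10, groupby can also compute several aggregations per group in one call with agg. Here is a sketch on the col table. The output column names max_price1 and n_items are hypothetical:

```python
import pandas as pd

# The col table from 4.10.2.
col = pd.DataFrame({"color": ["white", "red", "green", "red", "green"],
                    "object": ["pen", "pencil", "pencil", "ashtray", "pen"],
                    "price1": [4.56, 4.20, 1.30, 0.56, 2.75],
                    "price2": [4.75, 4.12, 1.68, 0.75, 3.15]})

# A list of functions gives one row per color and a two-level column index:
# (price1, max), (price1, mean), (price2, max), (price2, mean).
summary = col.groupby("color")[["price1", "price2"]].agg(["max", "mean"])
print(summary)

# Named aggregation: new_column=(source_column, function) gives flat names.
named = col.groupby("color").agg(max_price1=("price1", "max"),
                                 n_items=("object", "count"))
print(named)
```

Named aggregation is usually easier to work with afterwards because the result has ordinary single-level column names.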
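Question 3 of the case study fills the genre indicator table with a Python loop. As an alternative (not the approach used in the notebook), Series.str.get_dummies can build the same kind of 0/1 table in one call. Here is a sketch on four sample Genre strings in the same comma-separated format:

```python
import pandas as pd

# A few Genre strings in the comma-separated format of IMDB-Movie-Data.csv.
genre = pd.Series(["Action,Adventure,Sci-Fi",
                   "Adventure,Mystery,Sci-Fi",
                   "Horror,Thriller",
                   "Action,Adventure,Fantasy"])

# Split on "," and build one 0/1 column per genre (columns sorted by name),
# the same table the count.loc[i, movie_genre[i]] = 1 loop fills by hand.
count = genre.str.get_dummies(sep=",")

# Column sums give the number of movies per genre, as count.sum(axis=0) does.
totals = count.sum(axis=0).sort_values(ascending=False)
print(totals)
```

Applied to movie["Genre"], this would replace the zero-filled frame and the loop. It is vectorized and derives the genre list from the data itself.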
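This article was submitted by an Internet user. The views expressed are the author's own and do not represent this site. The site only provides information storage space; it does not own the copyright and bears no related legal responsibility. When reposting, please credit the source: http://www.mzph.cn/news/910050.shtml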

