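X-yin (Douyin) Data Analysis

Hi everyone, this is Kaoya (烤鸭):

X-yin now has a PC (web) version. Its search is limited, though: the number of results is capped, and in my own testing a search returned at most 400 items. Below is a brief walkthrough of how I collected and parsed them.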

Tools

Java + ChromeDriver + Fiddler

Java + Selenium drives the web page. The site requires a login, so log in once and share the cookies by reusing that browser profile.

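// Selenium and JDK imports; @Test comes from JUnit (the post does not say which version)
import java.util.concurrent.TimeUnit;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

@Test
public void testXyin() {
    String keyWord = "旅游";
    try {
        // Point Selenium at the ChromeDriver binary
        System.setProperty("webdriver.chrome.driver", "D:\\dev\\env\\chromedriver\\chromedriver.exe");
        // Share cookies by reusing a Chrome profile that is already logged in
        ChromeOptions chromeOptions = new ChromeOptions();
        chromeOptions.addArguments("--user-data-dir=C:\\Users\\user\\AppData\\Local\\Google\\Chrome\\User Data-Cookie");
        WebDriver driver = new ChromeDriver(chromeOptions);
        // Maximize the window
        driver.manage().window().maximize();
        driver.get("https://www.douyin.com/search/"
                + keyWord
                + "?publish_time=0&sort_type=0&source=normal_search&type=general");
        // Scroll to the bottom of the page
        ((ChromeDriver) driver).executeScript("window.scrollTo(0, document.body.scrollHeight);");
        Thread.sleep(1000);
        // Set the implicit wait used when locating elements
        driver.manage().timeouts().implicitlyWait(3, TimeUnit.SECONDS);
        WebElement webElement = driver.findElement(By.cssSelector("body"));
        // Sometimes the page has to be clicked once before scrolling takes effect
        // (some sites behave this way; I have not found the cause)
        webElement.click();
    } catch (Exception e) {
        e.printStackTrace();
    }
}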
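The test above concatenates the raw keyword into the search URL and relies on the browser to encode it. If the URL is ever needed outside the browser (for logging, or for another HTTP client), it may be safer to percent-encode the keyword first. The following is only a sketch: the class name `SearchUrlSketch` and the method `build` are hypothetical, and the query string is copied unchanged from the `driver.get(...)` call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original post.
class SearchUrlSketch {
    // Query string copied from the driver.get(...) call in the test.
    private static final String QUERY =
            "?publish_time=0&sort_type=0&source=normal_search&type=general";

    /** Percent-encodes the keyword as UTF-8 and appends the fixed query string. */
    static String build(String keyword) {
        try {
            // URLEncoder targets form data and turns spaces into '+';
            // inside a URL path a space should be %20 instead.
            String encoded = URLEncoder.encode(keyword, "UTF-8").replace("+", "%20");
            return "https://www.douyin.com/search/" + encoded + QUERY;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("旅游"));
    }
}
```

With this helper the call would become `driver.get(SearchUrlSketch.build(keyWord))`.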

Fiddler script

In Fiddler, open Rules -> Customize Rules and modify the OnBeforeResponse method:

	static function OnBeforeResponse(oSession: Session) {
		if (m_Hide304s && oSession.responseCode == 304) {
			oSession["ui-hide"] = "true";
		}
		// Add the following at the end of the method
		if (oSession.HostnameIs("www.douyin.com") && oSession.uriContains("https://www.douyin.com/aweme/v1/web/general/search/single")) {
			var filename = "D:\\data\\dy\\fiddler-token.log";
			var curDate = new Date();
			// One line with the timestamp and request body, then one line with the response body
			var logContent = "[" + curDate.toLocaleString() + "] " + oSession.GetRequestBodyAsString() + "\r\n" + oSession.GetResponseBodyAsString() + "\r\n";
			var sw : System.IO.StreamWriter;
			if (System.IO.File.Exists(filename)) {
				sw = System.IO.File.AppendText(filename);
				sw.Write(logContent);
			} else {
				sw = System.IO.File.CreateText(filename);
				sw.Write(logContent);
			}
			sw.Close();
			sw.Dispose();
		}
	}
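For every matching response the script appends two lines to the log: a header line of the form `[timestamp] requestBody`, followed by the response body. Assuming the response body contains no raw line breaks (an assumption, not something the post verifies), each JSON payload ends up on its own line starting with `{`. A minimal sketch of separating those lines from the rest; the class name `LogLinesSketch` and the sample lines are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original post.
class LogLinesSketch {
    /** Keeps only the lines that look like JSON response bodies. */
    static List<String> responseLines(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            // Header lines start with '[' and blank lines are empty,
            // so neither passes this check.
            if (line.startsWith("{")) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample in the layout the Fiddler script writes.
        List<String> sample = Arrays.asList(
                "[2022/1/1 10:00:00] ",
                "{\"data\":[]}");
        System.out.println(responseLines(sample));
    }
}
```

This is the same `startsWith("{")` test the parsing code relies on to skip the header lines.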

Parsing the data

Read the log file and parse it:

public void readText() {
    ReaderTxt rt = new ReaderTxt();
    ArrayList<String> list = rt.InitTxt();
    for (int i = 0; i < list.size(); i++) {
        String txt = list.get(i);
        // Only the response-body lines are JSON; skip everything else
        if (!txt.startsWith("{")) {
            continue;
        }
        JSONObject jrs = JSONObject.parseObject(txt);
        JSONArray array = jrs.getJSONArray("data");
        if (array == null) {
            continue;
        }
        for (Object obs : array) {
            DyScrapVideo scrapVideo = new DyScrapVideo();
            JSONObject json = (JSONObject) obs;
            // aweme_info
            JSONObject awemeInfo = json.getJSONObject("aweme_info");
            if (awemeInfo == null) {
                continue;
            }
            // Detail page: https://www.douyin.com/video/ + aweme_id
            String aweme_id = awemeInfo.getString("aweme_id");
            String desc = awemeInfo.getString("desc");
            Long publishTime = awemeInfo.getLong("create_time");
            scrapVideo.setVideoDesc(desc);
            scrapVideo.setAwemeId(aweme_id);
            scrapVideo.setVideoPublishTime(UnixUtil.TimeStamp2Date(publishTime + ""));
            // author
            JSONObject author = awemeInfo.getJSONObject("author");
            if (author != null) {
                Long aLong = author.getLong("uid");
                String nickname = author.getString("nickname");
                String signature = author.getString("signature");
                scrapVideo.setAuthorUid(aLong + "");
                scrapVideo.setAuthorNickname(nickname);
                scrapVideo.setAuthorSignature(signature);
                JSONObject avatar_thumb = author.getJSONObject("avatar_thumb");
                JSONArray url_list = avatar_thumb != null ? avatar_thumb.getJSONArray("url_list") : null;
                if (url_list != null && !url_list.isEmpty()) {
                    scrapVideo.setAuthorAvatarThumb(url_list.get(0).toString());
                }
                Long follower_count = author.getLong("follower_count");
                scrapVideo.setFollowerCount(follower_count != null ? follower_count.intValue() : 0);
                String custom_verify = author.getString("custom_verify");
                scrapVideo.setCustomVerify(custom_verify);
            }
            // video
            JSONObject video = awemeInfo.getJSONObject("video");
            if (video != null) {
                JSONObject download_addr = video.getJSONObject("download_addr");
                if (download_addr != null) {
                    JSONArray down_url_list = download_addr.getJSONArray("url_list");
                    if (down_url_list != null && !down_url_list.isEmpty()) {
                        scrapVideo.setVideoDownloadAddr(UnicodeUtil.unicodeToCN(down_url_list.get(0).toString()));
                    }
                }
                Integer duration = video.getInteger("duration");
                scrapVideo.setVideoDuration(duration);
            }
            // statistics
            JSONObject statistics = awemeInfo.getJSONObject("statistics");
            if (statistics != null) {
                Integer comment_count = statistics.getInteger("comment_count");
                Integer digg_count = statistics.getInteger("digg_count");
                Integer download_count = statistics.getInteger("download_count");
                Integer play_count = statistics.getInteger("play_count");
                Integer share_count = statistics.getInteger("share_count");
                Integer collect_count = statistics.getInteger("collect_count");
                scrapVideo.setCommentCount(comment_count);
                scrapVideo.setDiggCount(digg_count);
                scrapVideo.setDownloadCount(download_count);
                scrapVideo.setPlayCount(play_count);
                scrapVideo.setShareCount(share_count);
                scrapVideo.setCollectCount(collect_count);
            }
            scrapVideo.setCreateDate(new Date());
            scrapVideo.setSearchKeyword("北京旅游");
            // scrapVideo is fully populated here; this snippet does not show how it is stored
        }
    }
}

// Method of ReaderTxt: reads the Fiddler log into a list of lines
public ArrayList<String> InitTxt() {
    ArrayList<String> list = new ArrayList<String>();
    // Absolute or relative paths both work; this one is absolute and
    // must point at the file written by the Fiddler script
    String pathname = "D:\\data\\dy\\fiddler-token.log";
    // try-with-resources closes the reader; failures to open or read the file
    // are caught and printed (rethrowing would also work)
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(pathname), "utf-8"))) {
        String line;
        // Read one line at a time until the end of the file
        while ((line = br.readLine()) != null) {
            list.add(line);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return list;
}
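The parsing code calls two helpers that the post does not show, `UnixUtil.TimeStamp2Date` and `UnicodeUtil.unicodeToCN`, and mentions that the detail page is `https://www.douyin.com/video/` plus the `aweme_id`. The sketch below is a guess at what such helpers might look like, not the author's implementation. It assumes `create_time` holds Unix seconds and that the Unicode helper decodes literal backslash-u escape sequences; the class and method names are hypothetical.

```java
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the helpers the post does not show.
class ParseHelpersSketch {
    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Converts a timestamp string to a Date, assuming it holds Unix seconds. */
    static Date timestampToDate(String seconds) {
        return new Date(Long.parseLong(seconds) * 1000L);
    }

    /** Replaces each escape sequence with the character it encodes. */
    static String unicodeToCn(String text) {
        Matcher m = UNICODE_ESCAPE.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslashes from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Builds the detail-page URL from an aweme_id. */
    static String detailUrl(String awemeId) {
        return "https://www.douyin.com/video/" + awemeId;
    }
}
```

If `create_time` turns out to be in milliseconds, the multiplication by 1000 has to go.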

Entity class:

package com.machu.picchu.crawler.dto;

import java.util.Date;

public class DyScrapVideo {
    private Integer id;
    private String awemeId;
    private String videoDesc;
    private Date videoPublishTime;
    private String videoDownloadAddr;
    private Integer videoDuration;
    private Integer commentCount;
    private Integer diggCount;
    private Integer playCount;
    private Integer downloadCount;
    private Integer shareCount;
    private Integer collectCount;
    private String authorUid;
    private String authorNickname;
    private String authorSignature;
    private String authorAvatarThumb;
    private Integer followerCount;
    private String customVerify;
    private Date createDate;
    private Date publishDate;
    private String searchKeyword;
    private String memo;
    private Integer status;

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getVideoDesc() { return videoDesc; }
    public void setVideoDesc(String videoDesc) { this.videoDesc = videoDesc; }
    public Date getVideoPublishTime() { return videoPublishTime; }
    public void setVideoPublishTime(Date videoPublishTime) { this.videoPublishTime = videoPublishTime; }
    public String getVideoDownloadAddr() { return videoDownloadAddr; }
    public void setVideoDownloadAddr(String videoDownloadAddr) { this.videoDownloadAddr = videoDownloadAddr; }
    public Integer getVideoDuration() { return videoDuration; }
    public void setVideoDuration(Integer videoDuration) { this.videoDuration = videoDuration; }
    public Integer getCommentCount() { return commentCount; }
    public void setCommentCount(Integer commentCount) { this.commentCount = commentCount; }
    public Integer getDiggCount() { return diggCount; }
    public void setDiggCount(Integer diggCount) { this.diggCount = diggCount; }
    public Integer getPlayCount() { return playCount; }
    public void setPlayCount(Integer playCount) { this.playCount = playCount; }
    public Integer getDownloadCount() { return downloadCount; }
    public void setDownloadCount(Integer downloadCount) { this.downloadCount = downloadCount; }
    public Integer getShareCount() { return shareCount; }
    public void setShareCount(Integer shareCount) { this.shareCount = shareCount; }
    public Integer getCollectCount() { return collectCount; }
    public void setCollectCount(Integer collectCount) { this.collectCount = collectCount; }
    public String getAuthorUid() { return authorUid; }
    public void setAuthorUid(String authorUid) { this.authorUid = authorUid; }
    public String getAuthorNickname() { return authorNickname; }
    public void setAuthorNickname(String authorNickname) { this.authorNickname = authorNickname; }
    public String getAuthorSignature() { return authorSignature; }
    public void setAuthorSignature(String authorSignature) { this.authorSignature = authorSignature; }
    public String getAuthorAvatarThumb() { return authorAvatarThumb; }
    public void setAuthorAvatarThumb(String authorAvatarThumb) { this.authorAvatarThumb = authorAvatarThumb; }
    public Integer getFollowerCount() { return followerCount; }
    public void setFollowerCount(Integer followerCount) { this.followerCount = followerCount; }
    public String getCustomVerify() { return customVerify; }
    public void setCustomVerify(String customVerify) { this.customVerify = customVerify; }
    public Date getCreateDate() { return createDate; }
    public void setCreateDate(Date createDate) { this.createDate = createDate; }
    public Date getPublishDate() { return publishDate; }
    public void setPublishDate(Date publishDate) { this.publishDate = publishDate; }
    public String getSearchKeyword() { return searchKeyword; }
    public void setSearchKeyword(String searchKeyword) { this.searchKeyword = searchKeyword; }
    public String getMemo() { return memo; }
    public void setMemo(String memo) { this.memo = memo; }
    public Integer getStatus() { return status; }
    public void setStatus(Integer status) { this.status = status; }
    public String getAwemeId() { return awemeId; }
    public void setAwemeId(String awemeId) { this.awemeId = awemeId; }
}

