Sorting Ali Trial (阿里试用) Listings
Sorry: I had somehow added the config file to the ignore list earlier. It is fixed now. My apologies.
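Background
This is frankly an embarrassment to clueless straight guys everywhere. Yes, last night I was happily working on a freelance gig (a mini program for China Mobile; freelancing, hey) when, sometime after 11 pm, my girlfriend had a sudden brainwave: "Can you sort this thing for me so I can use it? I want to grab some freebies. Is that technically doable?"
I looked at it, somewhat helplessly. Ali Trial? What on earth... oh, right, that thing. Just scrape it with a crawler. (I'm a front-end developer……)
I replied: "No problem, a crawler will do it."
Her: "Wow, how long will it take?"
Me: "I'm busy right now, so 1 to 2 hours."
Her: "Enough, stop what you're doing and hurry up and build this for me!"
I looked at her face and, to my shame, minimized WeChat DevTools...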
Page preview
If you think this counts as an ad, you are giving me far too much credit.
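Getting the crawler going
For a Node.js crawler, one search on Baidu turns up ready-made code everywhere, so I won't analyze the options one by one. Here is a snippet from Jianshu, by 埃米莉Emily: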
const express = require('express');
const superagent = require('superagent');
const cheerio = require('cheerio');

// express is a function; called with no arguments it returns an Express
// instance, which we assign to the `app` variable.
const app = express();

app.get('/', (req, res, next) => {
  console.log(req);
  superagent.get('https://www.v2ex.com/').end((err, sres) => {
    // Standard error handling
    if (err) {
      return next(err);
    }
    // sres.text holds the page's HTML. Passing it to cheerio.load returns
    // a variable that implements the jQuery interface, conventionally named `$`.
    // Everything after that is plain jQuery.
    let $ = cheerio.load(sres.text);
    let items = [];
    $('.item_title a').each((idx, element) => {
      let $element = $(element);
      items.push({
        title: $element.text(),
        href: $element.attr('href')
      });
    });
    res.send(items);
  });
});

app.listen(3000, function () {
  console.log('app is listening at port 3000');
});
Well, anyone who uses Node.js is bound to know express; think of superagent as a way to make outbound requests from Node; and cheerio is, well, jQuery for Node.
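First crawl
Change the request URL above to https://try.taobao.com/, inspect the page's tag structure, and find the selector you want:
.tb-try-wd-item-info > .detail
Substitute it for the .item_title a selector above, and off we go: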
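……I'd rather not show the result, because it contained only six items while the page actually displays 10. After a long search I found two problems:
First, as shown above, the 6 items I crawled were the recommendations, damn it, not the list below them;
Second, the list below is data fetched afterwards through a separate POST request, so it never appears in the HTML the crawler downloads. It has all the marks of some framework's SSR setup at work.
So crawling the page was out, and I needed a different strategy.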
Simulating the POST
OK, since it is a POST, this is easy: dig out the URL and its parameters, then simulate the request with superagent:
superagent
  .post(`https://try.taobao.com/api3/call?what=show&page=${paylaod.page}&pageSize&api=x%2Fsearch`)
  .set('content-type', 'application/x-www-form-urlencoded; charset=UTF-8')
  .end((err, sres) => {
    // Standard error handling
    if (err) {
      return next(err)
    }
    const result = JSON.parse(sres.text).result // the returned data tree
    resolve(result)
  })
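The content-type comes from:
Yep, you guessed it: it failed, as shown below:
Come to think of it, that was inevitable: why would they let just anyone call this endpoint? So what next? Research? No, no, no. I went straight in, guns blazing: it's only a Content-Type, isn't it!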
superagent
  .post(`https://try.taobao.com/api3/call?what=show&page=${paylaod.page}&pageSize&api=x%2Fsearch`)
  .set('user-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36')
  .set('accept', 'application/json, text/javascript, */*; q=0.01')
  .set('accept-encoding', 'gzip, deflate, br')
  .set('accept-language', 'zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7,zh-TW;q=0.6,da;q=0.5')
  // .set('content-length', '8')
  .set('content-type', 'application/x-www-form-urlencoded; charset=UTF-8')
  .set('cookie', 'your cookie')
  .set('origin', 'https://try.taobao.com')
  .set('referer', 'https://try.taobao.com')
  .set('x-csrf-token', 'f0b8e7443eb7e')
  .set('x-requested-with', 'XMLHttpRequest')
  .end((err, sres) => {
    // Standard error handling
    if (err) {
      return next(err)
    }
    const result = JSON.parse(sres.text).result
    resolve(result)
  })
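I based it on this:
It's only headers, only an origin, only a user agent. Did you think HTTPS would leave me with no way in?
Note the commented-out .set('content-length', '8') above. I don't know exactly what the other side does with it, but adding it makes the request time out…… Most likely the header promises an 8-byte body that this request never sends, so the server keeps waiting for it.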
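The long chain of `.set()` calls can also be built from the raw header text copied out of the browser's DevTools. Here is a minimal sketch, assuming a hypothetical helper named `parseCopiedHeaders` (it is not part of superagent) and placeholder sample values. It drops `content-length`, since replaying that header without a matching body is the likely cause of the timeout mentioned above:

```javascript
// Hypothetical helper: turn "name: value" lines copied from DevTools
// into a plain object, dropping headers that should not be replayed.
const SKIPPED = new Set(['content-length']);

function parseCopiedHeaders(rawText) {
  const headers = {};
  for (const line of rawText.split('\n')) {
    const idx = line.indexOf(':');
    // Skip blank lines, malformed lines, and HTTP/2 pseudo-headers (":authority").
    if (idx <= 0) continue;
    const name = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (!SKIPPED.has(name)) headers[name] = value;
  }
  return headers;
}

// Sample values are placeholders, not real credentials.
const copied = `
origin: https://try.taobao.com
referer: https://try.taobao.com
content-length: 8
x-requested-with: XMLHttpRequest
`;

const headers = parseCopiedHeaders(copied);
console.log(headers);
```

superagent's `.set()` also accepts an object of header fields, so the result could be passed in a single call as `.set(headers)`.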
And with that, it gave up the goods:
{
  "pages": {
    "paging": {"n": 2182,"page": 1,"pages": 219},
    "items": [
      {"shopUserId": "2450112357","title": "凯度高端款嵌入式蒸烤箱","status": 1,"totalNum": 1,"requestNum": 15530,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "casdon凯度旗舰店","showId": "2561626","startTime": 1539619200000,"endTime": 1540220400000,"id": "34530215","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1ycS2eMDqK1RjSZSyXXaxEVXa.jpg","shopItemId": "559771706359","price": 13850},
      {"shopUserId": "3189770892","title": "皇家美素佳儿老包装2段400g","status": 1,"totalNum": 50,"requestNum": 2079,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "皇家美素佳儿旗舰店","showId": "2551240","startTime": 1539619200000,"endTime": 1540220400000,"id": "34396042","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1YrSZaVYqK1RjSZLeXXbXppXa.jpg","shopItemId": "547114874458","price": 189},
      {"shopUserId": "1077716829","title": "关注店铺优先审水密码幻彩隔离","status": 1,"totalNum": 10,"requestNum": 6907,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "水密码旗舰店","showId": "2568391","startTime": 1539619200000,"endTime": 1540220400000,"id": "34784086","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB16_4ChmzqK1RjSZPxXXc4tVXa.jpg","shopItemId": "559005882880","price": 599},
      {"shopUserId": "725786863","title": "精品皮草派克大衣","status": 1,"totalNum": 1,"requestNum": 11793,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "美瑞蓓特","showId": "2557886","startTime": 1539619200000,"endTime": 1540220400000,"id": "34574078","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1zVLMdCrqK1RjSZK9XXXyypXa.jpg","shopItemId": "577418950477","price": 5980},
      {"shopUserId": "3000840351","title": "保友智能新品Pofit电脑椅","status": 1,"totalNum": 1,"requestNum": 12895,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "保友办公家具旗舰店","showId": "2557100","startTime": 1539619200000,"endTime": 1540220400000,"id": "34528042","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1bYZEg6TpK1RjSZKPXXa3UpXa.png","shopItemId": "577598687971","price": 5408},
      {"shopUserId": "791732485","title": "TEK手持吸尘器A8","status": 1,"totalNum": 1,"requestNum": 17195,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "泰怡凯旗舰店","showId": "2552265","startTime": 1539619200000,"endTime": 1540220400000,"id": "34444014","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1D6bWbhTpK1RjSZFGXXcHqFXa.jpg","shopItemId": "547653053965","price": 5199},
      {"shopUserId": "3229583972","title": "椰富海南冷炸椰子油食用油1L","status": 1,"totalNum": 20,"requestNum": 4451,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "椰富食品专营店","showId": "2561698","startTime": 1539619200000,"endTime": 1540220400000,"id": "34532250","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1VjLSePDpK1RjSZFrXXa78VXa.jpg","shopItemId": "578653506446","price": 256},
      {"shopUserId": "855223948","title": "卡西欧立式家用电钢琴PX770","status": 1,"totalNum": 1,"requestNum": 16762,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "世纪音缘乐器专营店","showId": "2551326","startTime": 1539619200000,"endTime": 1540220400000,"id": "34420041","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1CC6aa9zqK1RjSZFpXXakSXXa.jpg","shopItemId": "562405126383","price": 4838},
      {"shopUserId": "4065939832","title": "关注宝贝送轻奢沙发床","status": 1,"totalNum": 1,"requestNum": 17436,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "贝兮旗舰店","showId": "2559904","startTime": 1539619200000,"endTime": 1540220400000,"id": "34532170","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1AzxYegHqK1RjSZFPXXcwapXa.jpg","shopItemId": "577798067313","price": 4399},
      {"shopUserId": "807974445","title": "森海塞尔CX6蓝牙耳机","status": 1,"totalNum": 4,"requestNum": 22557,"acceptNum": 0,"reportNum": 0,"isApplied": false,"shopName": "sennheiser旗舰店","showId": "2559701","startTime": 1539619200000,"endTime": 1540220400000,"id": "34532161","type": 1,"pic": "//img.alicdn.com/bao/uploaded/TB1HET6d7voK1RjSZFwXXciCFXa.jpg","shopItemId": "564408956766","price": 999}
    ]
  }
}
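Attentive readers will have noticed that I never sent a form body, yet the request still returned the data I needed: page simply rides along in the query string……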
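Since every parameter travels in the query string, the URL can be assembled from its parts rather than written by hand. This is a small sketch using Node's built-in `URLSearchParams`; the function name `buildSearchUrl` is made up for illustration. One difference from the hand-written URL: the empty `pageSize` is serialized as `pageSize=` rather than a bare `pageSize`, and whether the server treats the two identically is an assumption.

```javascript
// Sketch: assemble the search URL from its parts instead of hand-writing it.
const ENDPOINT = 'https://try.taobao.com/api3/call';

// Hypothetical helper name; the parameter names come from the request above.
function buildSearchUrl(page) {
  const params = new URLSearchParams({
    what: 'show',
    page: String(page),
    pageSize: '',    // sent without a value in the original request
    api: 'x/search', // URLSearchParams percent-encodes the slash as %2F
  });
  return `${ENDPOINT}?${params.toString()}`;
}

console.log(buildSearchUrl(1));
```

Building the URL this way keeps the encoding of values such as `x/search` out of the template string, which makes it harder to mistype a parameter.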
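The display part
With the data in hand, the rest is simple. This one endpoint is really all it takes to build the remaining features. Yes, remember, I'm a front-end developer.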
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>tb try</title>
  <style>
    .warning { color: red; }
    button { width: 100px; height: 44px; margin-right: 44px; }
    table { border: 1px solid #d8d8d8; border-collapse: collapse; }
    tr { border-bottom: 1px solid #d8d8d8; cursor: pointer; }
    tr:last-child { border: 0; }
  </style>
</head>
<body>
  <button onclick="postPage()">Next page</button>
  <span id="currentPage"></span>
  <table>
    <tbody>
      <tr><th>No. (reverse order)</th><th>Probability</th><th>Name</th></tr>
    </tbody>
    <tbody id="results"></tbody>
  </table>
  <script>
    let currentPage = 0 // current page
    let allItems = [] // all items fetched so far
    let currentTime = 0 // time of the last request, used for rate limiting
    const xhr = new XMLHttpRequest()
    const loopInterval = 2 // rate-limit interval, in seconds
    const results = document.querySelector('#results')
    const currentPageText = document.querySelector('#currentPage')

    // Rebuild the table body from the given (already sorted) items
    const reFullTBody = arr => {
      let innerHtml = ''
      arr.forEach((item, i) => {
        let tr = `<tr onclick="window.open('https://try.taobao.com/item.htm?id=${item.id}')"><td>${i + 1}</td><td>${item.rate.toFixed(3) + '%'}</td><td>${item.title}</td></tr>`
        // Highlight items with a probability above 5%
        if (item.rate > 5) tr = tr.replace('<tr', '<tr class="warning"')
        innerHtml += tr
      })
      currentPageText.innerText = `Current page: ${currentPage}`
      results.innerHTML = innerHtml
    }

    const postPage = () => {
      // Refuse requests made within the rate-limit interval
      const newTime = new Date().getTime()
      const shouldBack = newTime - currentTime < loopInterval * 1000
      if (shouldBack) {
        alert('Please wait ' + loopInterval + ' seconds between clicks.')
        return
      }
      currentTime = newTime
      xhr.onreadystatechange = function() {
        if (this.readyState === 4 && this.status === 200) {
          const res = JSON.parse(this.response)
          if (res.length < 1) {
            alert('All trials ending today have already been filtered.')
            return
          }
          // Compute each rate before sorting, so the comparator never sees an undefined rate
          res.forEach(item => { item.rate = item.totalNum / item.requestNum * 100 })
          allItems = [...allItems, ...res]
          allItems.sort((a, b) => b.rate - a.rate)
          reFullTBody(allItems)
          currentPage--
        }
      }
      xhr.open('post', '/table')
      xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
      // Send the request
      xhr.send('page=' + currentPage)
    }

    // Fetch the total page count first, then load the last page
    xhr.onreadystatechange = function() {
      if (this.readyState === 4 && this.status === 200) {
        currentPage = JSON.parse(this.response).pages
        postPage()
      }
    }
    xhr.open('get', '/total')
    xhr.send()
  </script>
</body>
</html>
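It looks like this:
See how user-friendly I am: rows are clickable and open the item, probabilities above 5% are shown in red, it tells you which page you are on, and it even warns you when you click too fast……
It's that handy. If you like it, go and try it!
Live demo: click here to try it
GitHub: Spider
If you find it useful, don't be stingy with your star.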
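The ranking behind that table is just the number of units on offer divided by the number of applicants. A minimal sketch, using three of the items from the response shown earlier; `rankByRate` and `HIGHLIGHT_THRESHOLD` are names made up for illustration, and the page itself does the same work inline:

```javascript
// Three items from the sample response, trimmed to the fields used here.
const items = [
  { title: '凯度高端款嵌入式蒸烤箱', totalNum: 1, requestNum: 15530 },
  { title: '皇家美素佳儿老包装2段400g', totalNum: 50, requestNum: 2079 },
  { title: '椰富海南冷炸椰子油食用油1L', totalNum: 20, requestNum: 4451 },
];

const HIGHLIGHT_THRESHOLD = 5; // percent; rows above this are shown in red

// Hypothetical helper: attach a win rate (in percent) to each item and
// return a new array sorted from most to least likely.
function rankByRate(list) {
  return list
    .map(item => ({ ...item, rate: (item.totalNum / item.requestNum) * 100 }))
    .sort((a, b) => b.rate - a.rate);
}

for (const item of rankByRate(items)) {
  const flag = item.rate > HIGHLIGHT_THRESHOLD ? ' (highlight)' : '';
  console.log(`${item.rate.toFixed(3)}%  ${item.title}${flag}`);
}
```

Computing the rate before sorting matters: a comparator that subtracts an undefined `rate` yields `NaN`, and the resulting order is then unreliable.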