Simulating Web Page Behavior: Practice

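Simulating clicks on a web page and sending protocol requests directly each have their place, and the right choice depends on the requirement. If all you need is something simple, such as reading your IP address from a web page, sending the request directly is the easiest. For something a bit more involved, such as logging in with a username on a web page, filling in the form and simulating a click on the button element is relatively simple. But if the requirement shifts again and calls for efficiency and multithreading, sending requests directly becomes the first choice once more, even when a username login is involved. There is no best approach, only a more suitable one. This point matters.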

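Let me start with simulated clicking. I usually derive a class from MFC's CDHtmlDialog and name it CWebLoginDlg. I chose MFC not because it is particularly good, but because I am more familiar with it and the learning cost is lower for me, nothing more.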

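The logic of simulating a web page is essentially a state machine. The current state can be determined in three ways: by fetching the page source and checking for characteristic strings, by inspecting the response data of an XMLHttpRequest, or by checking whether the COM object for a page element exists.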
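To make the state machine idea concrete, here is a minimal sketch of the first detection method, classifying a page by characteristic strings in its source. It is only an illustration: the state names, the action names, and the marker strings `id="logout"` and `class="login_err"` are hypothetical, and a real site will need its own markers. The code works on a plain `std::wstring`, so it does not depend on MFC.

```cpp
#include <string>

// Hypothetical page states for a login flow; the real set depends on the site.
enum class PageState { Unknown, LoginForm, LoginFailed, LoggedIn };

// Hypothetical actions the dialog would take for each state.
enum class Action { Wait, FillAndSubmit, Abort, Done };

// Returns true when `source` contains `marker`.
static bool Contains(const std::wstring& source, const std::wstring& marker)
{
    return source.find(marker) != std::wstring::npos;
}

// Classifies a page by feature strings found in its HTML source.
// The marker strings are placeholders chosen for this example only.
PageState DetectPageState(const std::wstring& html)
{
    // Check the more specific pages first: an error page may still contain the form.
    if (Contains(html, L"id=\"logout\""))       return PageState::LoggedIn;
    if (Contains(html, L"class=\"login_err\"")) return PageState::LoginFailed;
    if (Contains(html, L"id=\"pwd\""))          return PageState::LoginForm;
    return PageState::Unknown;
}

// Maps the detected state to the next step of the state machine.
Action NextAction(PageState state)
{
    switch (state) {
    case PageState::LoginForm:   return Action::FillAndSubmit;
    case PageState::LoginFailed: return Action::Abort;
    case PageState::LoggedIn:    return Action::Done;
    default:                     return Action::Wait;
    }
}
```

In the dialog, something like `NextAction(DetectPageState(GetHtmlSource()))` could be evaluated each time a document finishes loading; the order of the checks is a design choice and should follow whichever markers are most specific on the target site.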

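First, fetching the page source. The basic idea is to get the page's document object, walk through its elements, find the one whose tag is html, and call get_outerHTML on it. The C++ code is as follows: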

std::wstring CWebLoginDlg::GetHtmlSource()
{
    CComPtr<IHTMLDocument2> spHtmlDoc;
    GetDHtmlDocument(&spHtmlDoc);
    if (!spHtmlDoc)
    {
        return L"";
    }

    // Collect every element in the document.
    CComPtr<IHTMLElementCollection> spElements;
    if (FAILED(spHtmlDoc->get_all(&spElements)) || !spElements)
    {
        return L"";
    }

    long count = 0;
    spElements->get_length(&count);

    CComBSTR data;
    for (long i = 0; i < count; i++)
    {
        CComVariant name(i), index(0L);
        CComPtr<IDispatch> spDisp;
        HRESULT hr = spElements->item(name, index, &spDisp);
        if (FAILED(hr) || !spDisp)
        {
            continue;
        }

        CComQIPtr<IHTMLElement> spElement(spDisp);
        if (!spElement)
        {
            continue;
        }

        CComBSTR tagName;
        hr = spElement->get_tagName(&tagName);
        if (FAILED(hr) || !tagName)
        {
            continue;
        }

        // Compare case-insensitively; wide literal so it also builds as Unicode.
        if (_wcsicmp(tagName, L"html") != 0)
        {
            continue;
        }

        hr = spElement->get_outerHTML(&data);
        if (FAILED(hr))
        {
            return L"";
        }
        break;
    }

    // data is still NULL when no html element was found.
    if (!data)
    {
        return L"";
    }
    return std::wstring(data.m_str, data.Length());
}

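Next, XMLHttpRequest. XMLHttpRequest is the object that ajax is built on. The thing to watch is the Content-Type field in the request header, whose value here is application/x-www-form-urlencoded (the jQuery default). For example, the page code: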

function syncGameInfoAgent() {
    $.ajax({
        url: "/api/web/syncGameInfoAgent",
        type: "GET",
        cache: false,
        async: false,
        success: function (data, textStatus, jqXHR) {}
    });
}

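This kind of ajax request also carries an extra X-Requested-With header with the value XMLHttpRequest, and the response data is in JSON format. There is another kind of page code: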

var loadLoginInfo = function () {
    $.ajax({
        url: getDomain() + "rest/user",
        type: "GET",
        data: {
            "fields": "isNameCheck,charCount,isBlocked,isGameBlocked,extAccountInfo"
        },
        cache: false,
        async: true,
        dataType: "jsonp",
        success: function (response) {
            ...

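This kind of ajax request has no X-Requested-With header: with dataType set to jsonp, jQuery loads the URL through a script tag instead of XMLHttpRequest, so the response comes back as page (script) content rather than bare JSON. The C++ implementation is as follows: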

// Member declared in the class:
MSXML2::IXMLHTTPRequestPtr m_pIXMLHTTPRequest;
// Created once during initialization:
m_pIXMLHTTPRequest.CreateInstance("Msxml2.XMLHTTP.6.0");

std::wstring CWebLoginDlg::XMLHttpRequest(std::wstring url, std::string requestType /* = "GET" */)
{
    // Third argument false: the request is synchronous.
    HRESULT hr = m_pIXMLHTTPRequest->open(requestType.c_str(), url.c_str(), false);
    if (FAILED(hr))
    {
        throw hr;
    }

    // In the second (jsonp) case this header must not be sent.
    m_pIXMLHTTPRequest->setRequestHeader("X-Requested-With", "XMLHttpRequest");
    m_pIXMLHTTPRequest->setRequestHeader("Content-Type", "application/x-www-form-urlencoded");

    hr = m_pIXMLHTTPRequest->send();
    if (FAILED(hr))
    {
        throw hr;
    }

    // In the second case, read m_pIXMLHTTPRequest->responseBody instead.
    // Hold the result in a _bstr_t: it owns the BSTR and frees it itself,
    // so the string must not be released again with SysFreeString.
    _bstr_t text = m_pIXMLHTTPRequest->responseText;
    if (text.length() == 0)
    {
        return L"";
    }
    return std::wstring(static_cast<const wchar_t*>(text), text.length());
}
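The function above sends no request body. If a request does need to carry form fields, for example a POST login, the body has to match the application/x-www-form-urlencoded Content-Type that is being set. The following is a minimal sketch of such an encoder, not part of the original code: `FormEncode` and `BuildFormBody` are hypothetical helper names, the field names in the usage are made up, and the input is assumed to be UTF-8 bytes already.

```cpp
#include <string>
#include <utility>
#include <vector>

// Encodes one name or value for application/x-www-form-urlencoded:
// letters, digits and - _ . * pass through, a space becomes '+',
// every other byte becomes %XX.
std::string FormEncode(const std::string& text)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        const bool passThrough =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '*';
        if (passThrough) {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}

// Joins name=value pairs with '&' in the order given.
std::string BuildFormBody(const std::vector<std::pair<std::string, std::string>>& fields)
{
    std::string body;
    for (const auto& field : fields) {
        if (!body.empty()) {
            body += '&';
        }
        body += FormEncode(field.first) + "=" + FormEncode(field.second);
    }
    return body;
}
```

The resulting string would be passed as the body argument of send() on a POST request. Keeping the fields in a vector rather than a map preserves the order the page itself uses, which some servers are sensitive to.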

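Here is a tip for locating the ajax code. First capture the traffic and take the request URL from the HTTP header, then search the page scripts for that string. It usually turns up inside the ajax call, for example in url : getDomain() + "rest/user". Once you have found it, read how the code handles the return value; that makes it much easier to write your own logic.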
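When the call found this way uses dataType "jsonp", the raw response is normally a script of the form callbackName(payload); and the page's success handler only ever sees the payload. To reuse the same data in C++, the wrapper has to be stripped first. The sketch below shows one simple way to do that; `UnwrapJsonp` is a hypothetical helper, and the callback name and payload in the usage are invented for illustration.

```cpp
#include <cstddef>
#include <string>

// Extracts the payload from a JSONP response such as  cb({"a":1});
// Returns an empty string when the text does not look like a callback call.
std::wstring UnwrapJsonp(const std::wstring& response)
{
    // The first '(' opens the callback call and the last ')' closes it,
    // so parentheses inside the payload are left untouched.
    const std::size_t open = response.find(L'(');
    const std::size_t close = response.rfind(L')');
    if (open == std::wstring::npos || close == std::wstring::npos || close < open) {
        return L"";
    }
    return response.substr(open + 1, close - open - 1);
}
```

For a response like `jQuery123_456({"charCount":2});` this leaves `{"charCount":2}`, which can then be checked for characteristic fields or handed to a JSON parser.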

Finally, fetching page elements. Page elements generally carry an ID or a class name. For example, the page code:

<input type="password" id="pwd" name="password" class="user_pw" maxlength="16" size="12" autocomplete="off" title="???? ??">

To get the element by its ID, you can simply call the GetElement function. The C++ code is as follows:

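// "pwd" is the id of the password input in the example above.
CComPtr<IHTMLElement> spPassword;
HRESULT hr = this->GetElement(_T("pwd"), &spPassword);
if (FAILED(hr) || !spPassword)
{
    // The element does not exist on the current page.
}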

To get the element by its class name, the C++ code is as follows:

CComQIPtr<IHTMLElement> CWebLoginDlg::GetElementByClassName(std::wstring className)
{
    CComQIPtr<IHTMLElement> ret;

    CComPtr<IHTMLDocument2> spDocument;
    GetDHtmlDocument(&spDocument);
    if (!spDocument)
    {
        return ret;
    }

    // Get the collection of all elements in the document.
    CComPtr<IHTMLElementCollection> spElements;
    HRESULT hr = spDocument->get_all(&spElements);
    if (FAILED(hr) || !spElements)
    {
        ATLTRACE("Failed to get IHTMLElementCollection\n");
        return ret;
    }

    long count = 0;
    hr = spElements->get_length(&count);
    if (FAILED(hr))
    {
        ATLTRACE("Failed to get the element count\n");
        return ret;
    }

    for (long i = 0; i < count; i++)
    {
        CComPtr<IDispatch> spDisp;
        hr = spElements->item(CComVariant(i), CComVariant(), &spDisp);
        if (FAILED(hr) || !spDisp)
        {
            continue;
        }

        // Skip entries that do not expose IHTMLElement.
        CComQIPtr<IHTMLElement> spElement(spDisp);
        if (!spElement)
        {
            continue;
        }

        CComBSTR name;
        hr = spElement->get_className(&name);
        if (FAILED(hr) || !name)
        {
            continue;
        }

        // Exact match against the whole class attribute.
        if (className == std::wstring(name.m_str, name.Length()))
        {
            ret = spElement;
            break;
        }
    }
    return ret;
}
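One caveat with the comparison above: get_className returns the whole class attribute, so an element written as class="user_pw input_txt" will not match a search for user_pw. If that matters for the target page, the equality test can be swapped for a token match. The following is a small sketch of such a check; `HasClassToken` is a hypothetical helper and `input_txt` is an invented second class name.

```cpp
#include <sstream>
#include <string>

// Returns true when `className` is one of the whitespace-separated
// names in `classAttribute` (the string returned by get_className).
bool HasClassToken(const std::wstring& classAttribute, const std::wstring& className)
{
    std::wistringstream tokens(classAttribute);
    std::wstring token;
    while (tokens >> token) {
        if (token == className) {
            return true;
        }
    }
    return false;
}
```

Splitting into tokens instead of using a substring search avoids false positives such as user_pw_old matching a search for user_pw.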

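These are the basic functions for reading page state when simulating a web page. With them in place, the basic framework of the simulation approach can largely be built.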
