网页html版面分析-- BeauifulSoup(python 文档解析提取)

介绍

BeauifulSoup 是一个可以从HTML或XML 文件中提取数据的python库;它能通过转换器实现惯用的文档导航、查找、修改文档的方式。
BeauifulSoup是一个基于re开发的解析库,可以提供一些强大的解析功能;使用BeauifulSoup 能够提高提取数据的效率与爬虫开发效率。

安装

pip install beautifulsoup4 

使用

1 构建文档树

BeauifulSoup 进行文档解析是基于文档树结构来实现的,而文档树则是由BeauifulSoup 中的四个数据对象构建而成的。

在这里插入图片在这里插入图片描述
描述

from bs4 import BeautifulSouphtml = """<div class="post js_watermark quill-editor" style="background-repeat: repeat; background-image: url(&quot;&quot;); background-size: 940px 222.942px;"><h1 class="title">版面分析——网页HTML解析 BeautifulSoup</h1><div class="group-info"><a href="https://wx.zsxq.com/dweb2/index/group/51112141255244"><span>来自:</span><span class="group-name">AiGC面试宝典</span></a></div><div class="author-info"><div class="author"><img src="https://images.zsxq.com/FpFYmnHpgmz5J4DicXxscPfi3GI2?e=2064038400&amp;token=kIxbL07-8jAj8w1n4s9zv64FuZZNEATmlU_Vm6zD:hS7fTOpUpCI18IU4GweitfivQIU=" alt="用户头像"><span class="nick-name">Just do it!</span></div><span class="date" id="article-date">2024年04月27日 14:30</span></div><div class="ql-snow"><div class="content ql-editor"><p><img src="https://article-images.zsxq.com/FsOmOdM3jIkLawUT9z7sEbkMZgpV"></p><p><img src="https://article-images.zsxq.com/FnbQkQK1pNTESbYjScR42_PrYb9E"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、Tag对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head:{soup.head} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.name:{soup.head.name} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.attrs:{soup.head.attrs} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.head):{type(soup.head)} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>()</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、Navigable String对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.title.string:{soup.title.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.title.string):{type(soup.title.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、Comment对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.a.string:{soup.a.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.a.string):{type(soup.a.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#5、结构化输出soup对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.prettify()=&gt;{soup.prettify()}"</span>)</div></div><p><img src="https://article-images.zsxq.com/FmlPl-0tw4xgHRqKTWm5F2R15YJq"></p><div class="ql-code-block-container"><div class="ql-code-block">type(soup):<span class="ql-token hljs-tag">&lt;class 'bs4.BeautifulSoup'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head:<span class="ql-token hljs-tag">&lt;head&gt;&lt;title&gt;</span>The Dormouse's story<span class="ql-token hljs-tag">&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.name:head</div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.attrs:{}</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.head):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Tag'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.title.string:The Dormouse's story</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.title.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.NavigableString'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.a.string:Elsie</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.a.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Comment'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.prettify()=&gt;<span class="ql-token hljs-tag">&lt;html&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;head&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;title&gt;</span></div><div class="ql-code-block">   The Dormouse's story</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/title&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/head&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;body&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="title"&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;b&gt;</span></div><div class="ql-code-block">    The Dormouse's story</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/b&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   Once upon a time there were three little sisters; and their names were</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;</span></div><div class="ql-code-block">    <span class="ql-token hljs-comment">&lt;!--Elsie--&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ,</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span></div><div class="ql-code-block">    Lacie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   and</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;</span></div><div class="ql-code-block">    Tillie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ;</div><div class="ql-code-block">and they lived at the bottom of a well.</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   ...</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-tag">&lt;/html&gt;</span></div></div><p><br></p><p><img src="https://article-images.zsxq.com/FtPX-qsEEgZYHos3AnyDni1jH6rn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、向下遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.contents)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.children))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.descendants))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、向上遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.parent.name,<span class="ql-token hljs-string">'\n'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.p.parents:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、平行遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_next:'</span>,soup.a.next_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.next_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_nexts:'</span>,i)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previous:'</span>,soup.a.previous_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.previous_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previouss:'</span>,i)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/FuyJDzHROhQahkpBUh4jWRuaB-mo"></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[&lt;b&gt;The Dormouse'</span>s story&lt;/b&gt;]</div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;, "The Dormouse'</span>s story<span class="ql-token hljs-string">"]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><span class="ql-token hljs-string">html</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[document]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_next: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/lacie<span class="ql-token hljs-string">" id="</span>link2<span class="ql-token hljs-string">"&gt;Lacie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts:  and</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/tillie<span class="ql-token hljs-string">" id="</span>link3<span class="ql-token hljs-string">"&gt;Tillie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previous: Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previouss: Once upon a time there were three little sisters; and their names were</span></div></div><p><img src="https://article-images.zsxq.com/FtqWdNWSM0b8quez92lJ9SqPTK76"></p><p><span style="background-color: rgb(240, 240, 240); color: rgb(92, 92, 92);">代码</span></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、find_all( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>))  <span class="ql-token hljs-comment">#检索标签名</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link1'</span>)) <span class="ql-token hljs-comment">#检索属性值</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,class_=<span class="ql-token hljs-string">'sister'</span>)) </div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、find( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link2'</span>))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3 、向上检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.find_parent().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_parents():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block">    </div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、平行检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.head.find_next_sibling().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.head.find_next_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.title.find_previous_sibling())</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_previous_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div></div><p><img src="https://article-images.zsxq.com/FgdDcWod8Suvbq5UuGYLvXz0UI8R"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">F:\AwesomeRAG\tutorial\layout_analysis\html\tutorial\BeautifulSoup4\test3.py:<span class="ql-token hljs-number">24</span>: DeprecationWarning: The <span class="ql-token hljs-string">'text'</span> argument to find()-<span class="ql-token hljs-built_in">type</span> methods <span class="ql-token hljs-keyword">is</span> deprecated. Use <span class="ql-token hljs-string">'string'</span> instead.</div><div class="ql-code-block">  <span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block">[<span class="ql-token hljs-string">'Elsie'</span>, <span class="ql-token hljs-string">'Lacie'</span>]</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;</div><div class="ql-code-block">body</div><div class="ql-code-block">head</div><div class="ql-code-block">html</div><div class="ql-code-block">[document]</div><div class="ql-code-block">body</div><div class="ql-code-block">body</div><div class="ql-code-block"><span class="ql-token hljs-literal">None</span></div></div><p><img src="https://article-images.zsxq.com/FiqTmlpR_fGE6pUZ8gCcdD9z1ao_"></p><div class="ql-code-block-container"><div class="ql-code-block">HTML标题:&lt;h&gt; &lt;/h&gt;</div><div class="ql-code-block">HTML段落:&lt;p&gt; &lt;/p&gt;</div><div class="ql-code-block">HTML链接:&lt;a href=<span class="ql-token hljs-string">'httts://www.baidu.com/'</span>&gt; this <span class="ql-token hljs-keyword">is</span> a link &lt;/a&gt;</div><div class="ql-code-block">HTML图像:&lt;img src=<span class="ql-token hljs-string">'Ai-code.jpg'</span>,width=<span class="ql-token hljs-string">'104'</span>,height=<span class="ql-token hljs-string">'144'</span> /&gt;</div><div class="ql-code-block">HTML表格:&lt;table&gt; &lt;/table&gt;</div><div class="ql-code-block">HTML列表:&lt;ul&gt; &lt;/ul&gt;</div><div class="ql-code-block">HTML块:&lt;div&gt; &lt;/div&gt;</div></div><p><img src="https://article-images.zsxq.com/FkTgptMBTLt2w7nUUKs13PNKkckn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><br></div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'标签查找:'</span>,soup.select(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'属性查找:'</span>,soup.select(<span class="ql-token hljs-string">'a[id="link1"]'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'类名查找:'</span>,soup.select(<span class="ql-token hljs-string">'.sister'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'id查找:'</span>,soup.select(<span class="ql-token hljs-string">'#link1'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'组合查找:'</span>,soup.select(<span class="ql-token hljs-string">'p #link1'</span>))</div></div><p><img src="https://article-images.zsxq.com/FjQkiig9fOl0Bd5qiCbyH4OddW50"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><img src="https://article-images.zsxq.com/FqZck0in441U4EYGi6KobKlS0emA"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> requests</div><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> os</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getUrl</span>(<span class="ql-token hljs-params">url</span>):</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">        read = requests.get(url)  </div><div class="ql-code-block">        read.raise_for_status()   </div><div class="ql-code-block">        read.encoding = read.apparent_encoding  </div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> read.text    </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">except</span>:</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> <span class="ql-token hljs-string">"连接失败!"</span></div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getPic</span>(<span class="ql-token hljs-params">html</span>):</div><div class="ql-code-block">    soup = BeautifulSoup(html, <span class="ql-token hljs-string">"html.parser"</span>)</div><div class="ql-code-block">    </div><div class="ql-code-block">    all_img = soup.find(<span class="ql-token hljs-string">'ul'</span>).find_all(<span class="ql-token hljs-string">'img'</span>) </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">for</span> img <span class="ql-token hljs-keyword">in</span> all_img:</div><div class="ql-code-block">        src = img[<span class="ql-token hljs-string">'src'</span>]  </div><div class="ql-code-block">        img_url = src</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(img_url)</div><div class="ql-code-block">        root = <span class="ql-token hljs-string">"F:/Pic/"</span>   </div><div class="ql-code-block">        path = root + img_url.split(<span class="ql-token hljs-string">'/'</span>)[-<span class="ql-token hljs-number">1</span>]  </div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(path)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(root):  </div><div class="ql-code-block">                os.mkdir(root)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(path):</div><div class="ql-code-block">                read = requests.get(img_url)</div><div class="ql-code-block">                <span class="ql-token hljs-keyword">with</span> <span class="ql-token hljs-built_in">open</span>(path, <span class="ql-token hljs-string">"wb"</span>)<span class="ql-token hljs-keyword">as</span> f:</div><div class="ql-code-block">                    f.write(read.content)</div><div class="ql-code-block">                    f.close()</div><div class="ql-code-block">                    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件保存成功!"</span>)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">else</span>:</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件已存在!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">except</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件爬取失败!"</span>)</div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">if</span> __name__ == <span class="ql-token hljs-string">'__main__'</span>:</div><div class="ql-code-block">   html_url=getUrl(<span class="ql-token hljs-string">"https://findicons.com/search/nature"</span>)</div><div class="ql-code-block">   getPic(html_url)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/Fh_dDSbuteEI_0ArnWoZrCFDRuvm"></p><div class="ql-code-block-container"><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p></div></div><div class="milkdown-preview" style="display: none;"><p><img src="https://article-images.zsxq.com/FsOmOdM3jIkLawUT9z7sEbkMZgpV"></p><p><img src="https://article-images.zsxq.com/FnbQkQK1pNTESbYjScR42_PrYb9E"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、Tag对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head:{soup.head} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.name:{soup.head.name} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.head.attrs:{soup.head.attrs} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.head):{type(soup.head)} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>()</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、Navigable String对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.title.string:{soup.title.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.title.string):{type(soup.title.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、Comment对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.a.string:{soup.a.string} \n"</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup.a.string):{type(soup.a.string)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#5、结构化输出soup对象</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"soup.prettify()=&gt;{soup.prettify()}"</span>)</div></div><p><img src="https://article-images.zsxq.com/FmlPl-0tw4xgHRqKTWm5F2R15YJq"></p><div class="ql-code-block-container"><div class="ql-code-block">type(soup):<span class="ql-token hljs-tag">&lt;class 'bs4.BeautifulSoup'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head:<span class="ql-token hljs-tag">&lt;head&gt;&lt;title&gt;</span>The Dormouse's story<span class="ql-token hljs-tag">&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.name:head</div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.head.attrs:{}</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.head):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Tag'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.title.string:The Dormouse's story</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.title.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.NavigableString'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.a.string:Elsie</div><div class="ql-code-block"><br></div><div class="ql-code-block">type(soup.a.string):<span class="ql-token hljs-tag">&lt;class 'bs4.element.Comment'&gt;</span></div><div class="ql-code-block"><br></div><div class="ql-code-block">soup.prettify()=&gt;<span class="ql-token hljs-tag">&lt;html&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;head&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;title&gt;</span></div><div class="ql-code-block">   The Dormouse's story</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/title&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/head&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;body&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="title"&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;b&gt;</span></div><div class="ql-code-block">    The Dormouse's story</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/b&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   Once upon a time there were three little sisters; and their names were</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/elsie" id="link1"&gt;</span></div><div class="ql-code-block">    <span class="ql-token hljs-comment">&lt;!--Elsie--&gt;</span></div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ,</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/lacie" id="link2"&gt;</span></div><div class="ql-code-block">    Lacie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   and</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;a class="sister" href="http://example.com/tillie" id="link3"&gt;</span></div><div class="ql-code-block">    Tillie</div><div class="ql-code-block">   <span class="ql-token hljs-tag">&lt;/a&gt;</span></div><div class="ql-code-block">   ;</div><div class="ql-code-block">and they lived at the bottom of a well.</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;p class="story"&gt;</span></div><div class="ql-code-block">   ...</div><div class="ql-code-block">  <span class="ql-token hljs-tag">&lt;/p&gt;</span></div><div class="ql-code-block"> <span class="ql-token hljs-tag">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-tag">&lt;/html&gt;</span></div></div><p><br></p><p><img src="https://article-images.zsxq.com/FtPX-qsEEgZYHos3AnyDni1jH6rn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、向下遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.contents)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.children))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-built_in">list</span>(soup.p.descendants))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、向上遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.parent.name,<span class="ql-token hljs-string">'\n'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.p.parents:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3、平行遍历</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_next:'</span>,soup.a.next_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.next_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_nexts:'</span>,i)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previous:'</span>,soup.a.previous_sibling)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.a.previous_siblings:</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'a_previouss:'</span>,i)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/FuyJDzHROhQahkpBUh4jWRuaB-mo"></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[&lt;b&gt;The Dormouse'</span>s story&lt;/b&gt;]</div><div class="ql-code-block">[&lt;b&gt;The Dormouse<span class="ql-token hljs-string">'s story&lt;/b&gt;, "The Dormouse'</span>s story<span class="ql-token hljs-string">"]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">body</span></div><div class="ql-code-block"><span class="ql-token hljs-string">html</span></div><div class="ql-code-block"><span class="ql-token hljs-string">[document]</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_next: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ,</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/lacie<span class="ql-token hljs-string">" id="</span>link2<span class="ql-token hljs-string">"&gt;Lacie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts:  and</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: &lt;a class="</span>siste<span class="ql-token hljs-string">r" href="</span>http://example.com/tillie<span class="ql-token hljs-string">" id="</span>link3<span class="ql-token hljs-string">"&gt;Tillie&lt;/a&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_nexts: ;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.</span></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previous: Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-string">a_previouss: Once upon a time there were three little sisters; and their names were</span></div></div><p><img src="https://article-images.zsxq.com/FtqWdNWSM0b8quez92lJ9SqPTK76"></p><p><span style="background-color: rgb(240, 240, 240); color: rgb(92, 92, 92);">代码</span></p><p><br></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、find_all( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>))  <span class="ql-token hljs-comment">#检索标签名</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link1'</span>)) <span class="ql-token hljs-comment">#检索属性值</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(<span class="ql-token hljs-string">'a'</span>,class_=<span class="ql-token hljs-string">'sister'</span>)) </div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#2、find( )</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.find(<span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">'link2'</span>))</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-comment">#3 、向上检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.p.find_parent().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_parents():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block">    </div><div class="ql-code-block"><span class="ql-token hljs-comment">#4、平行检索</span></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.head.find_next_sibling().name)</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.head.find_next_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(soup.title.find_previous_sibling())</div><div class="ql-code-block"><span class="ql-token hljs-keyword">for</span> i <span class="ql-token hljs-keyword">in</span> soup.title.find_previous_siblings():</div><div class="ql-code-block">    <span class="ql-token hljs-built_in">print</span>(i.name)</div></div><p><img src="https://article-images.zsxq.com/FgdDcWod8Suvbq5UuGYLvXz0UI8R"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">[&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">F:\AwesomeRAG\tutorial\layout_analysis\html\tutorial\BeautifulSoup4\test3.py:<span class="ql-token hljs-number">24</span>: DeprecationWarning: The <span class="ql-token hljs-string">'text'</span> argument to find()-<span class="ql-token hljs-built_in">type</span> methods <span class="ql-token hljs-keyword">is</span> deprecated. Use <span class="ql-token hljs-string">'string'</span> instead.</div><div class="ql-code-block">  <span class="ql-token hljs-built_in">print</span>(soup.find_all(text=[<span class="ql-token hljs-string">'Elsie'</span>,<span class="ql-token hljs-string">'Lacie'</span>]))</div><div class="ql-code-block">[<span class="ql-token hljs-string">'Elsie'</span>, <span class="ql-token hljs-string">'Lacie'</span>]</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;</div><div class="ql-code-block">&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;</div><div class="ql-code-block">body</div><div class="ql-code-block">head</div><div class="ql-code-block">html</div><div class="ql-code-block">[document]</div><div class="ql-code-block">body</div><div class="ql-code-block">body</div><div class="ql-code-block"><span class="ql-token hljs-literal">None</span></div></div><p><img src="https://article-images.zsxq.com/FiqTmlpR_fGE6pUZ8gCcdD9z1ao_"></p><div class="ql-code-block-container"><div class="ql-code-block">HTML标题:&lt;h&gt; &lt;/h&gt;</div><div class="ql-code-block">HTML段落:&lt;p&gt; &lt;/p&gt;</div><div class="ql-code-block">HTML链接:&lt;a href=<span class="ql-token hljs-string">'httts://www.baidu.com/'</span>&gt; this <span class="ql-token hljs-keyword">is</span> a link &lt;/a&gt;</div><div class="ql-code-block">HTML图像:&lt;img src=<span class="ql-token hljs-string">'Ai-code.jpg'</span>,width=<span class="ql-token hljs-string">'104'</span>,height=<span class="ql-token hljs-string">'144'</span> /&gt;</div><div class="ql-code-block">HTML表格:&lt;table&gt; &lt;/table&gt;</div><div class="ql-code-block">HTML列表:&lt;ul&gt; &lt;/ul&gt;</div><div class="ql-code-block">HTML块:&lt;div&gt; &lt;/div&gt;</div></div><p><img src="https://article-images.zsxq.com/FkTgptMBTLt2w7nUUKs13PNKkckn"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><br></div><div class="ql-code-block">html =  <span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;html&gt;&lt;head&gt;&lt;title&gt;The Dormouse's story&lt;/title&gt;&lt;/head&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="title"&gt;&lt;b&gt;The Dormouse's story&lt;/b&gt;&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;Once upon a time there were three little sisters; and their names were</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/elsie" class="sister" id="link1"&gt;&lt;!--Elsie--&gt;&lt;/a&gt;,</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/lacie" class="sister" id="link2"&gt;Lacie&lt;/a&gt; and</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;a href="http://example.com/tillie" class="sister" id="link3"&gt;Tillie&lt;/a&gt;;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">and they lived at the bottom of a well.&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;p class="story"&gt;...&lt;/p&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/body&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">&lt;/html&gt;</span></div><div class="ql-code-block"><span class="ql-token hljs-string">"""</span></div><div class="ql-code-block"><span class="ql-token hljs-comment">#1、BeautifulSoup对象</span></div><div class="ql-code-block">soup = BeautifulSoup(html, <span class="ql-token hljs-string">'lxml'</span>)</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">f"type(soup):{type(soup)} \n"</span>)</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'标签查找:'</span>,soup.select(<span class="ql-token hljs-string">'a'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'属性查找:'</span>,soup.select(<span class="ql-token hljs-string">'a[id="link1"]'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'类名查找:'</span>,soup.select(<span class="ql-token hljs-string">'.sister'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'id查找:'</span>,soup.select(<span class="ql-token hljs-string">'#link1'</span>))</div><div class="ql-code-block"><span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">'组合查找:'</span>,soup.select(<span class="ql-token hljs-string">'p #link1'</span>))</div></div><p><img src="https://article-images.zsxq.com/FjQkiig9fOl0Bd5qiCbyH4OddW50"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-built_in">type</span>(soup):&lt;<span class="ql-token hljs-keyword">class</span> <span class="ql-token hljs-string">'bs4.BeautifulSoup'</span>&gt;</div><div class="ql-code-block"><br></div><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><img src="https://article-images.zsxq.com/FqZck0in441U4EYGi6KobKlS0emA"></p><div class="ql-code-block-container"><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> requests</div><div class="ql-code-block"><span class="ql-token hljs-keyword">from</span> bs4 <span class="ql-token hljs-keyword">import</span> BeautifulSoup</div><div class="ql-code-block"><span class="ql-token hljs-keyword">import</span> os</div><div class="ql-code-block"><br></div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getUrl</span>(<span class="ql-token hljs-params">url</span>):</div><div class="ql-code-block">    <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">        read = requests.get(url)  </div><div class="ql-code-block">        read.raise_for_status()   </div><div class="ql-code-block">        read.encoding = read.apparent_encoding  </div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> read.text    </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">except</span>:</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">return</span> <span class="ql-token hljs-string">"连接失败!"</span></div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">def</span> <span class="ql-token hljs-title">getPic</span>(<span class="ql-token hljs-params">html</span>):</div><div class="ql-code-block">    soup = BeautifulSoup(html, <span class="ql-token hljs-string">"html.parser"</span>)</div><div class="ql-code-block">    </div><div class="ql-code-block">    all_img = soup.find(<span class="ql-token hljs-string">'ul'</span>).find_all(<span class="ql-token hljs-string">'img'</span>) </div><div class="ql-code-block">    <span class="ql-token hljs-keyword">for</span> img <span class="ql-token hljs-keyword">in</span> all_img:</div><div class="ql-code-block">        src = img[<span class="ql-token hljs-string">'src'</span>]  </div><div class="ql-code-block">        img_url = src</div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(img_url)</div><div class="ql-code-block">        root = <span class="ql-token hljs-string">"F:/Pic/"</span>   </div><div class="ql-code-block">        path = root + img_url.split(<span class="ql-token hljs-string">'/'</span>)[-<span class="ql-token hljs-number">1</span>]  </div><div class="ql-code-block">        <span class="ql-token hljs-built_in">print</span>(path)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">try</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(root):  </div><div class="ql-code-block">                os.mkdir(root)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">if</span> <span class="ql-token hljs-keyword">not</span> os.path.exists(path):</div><div class="ql-code-block">                read = requests.get(img_url)</div><div class="ql-code-block">                <span class="ql-token hljs-keyword">with</span> <span class="ql-token hljs-built_in">open</span>(path, <span class="ql-token hljs-string">"wb"</span>)<span class="ql-token hljs-keyword">as</span> f:</div><div class="ql-code-block">                    f.write(read.content)</div><div class="ql-code-block">                    f.close()</div><div class="ql-code-block">                    <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件保存成功!"</span>)</div><div class="ql-code-block">            <span class="ql-token hljs-keyword">else</span>:</div><div class="ql-code-block">                <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件已存在!"</span>)</div><div class="ql-code-block">        <span class="ql-token hljs-keyword">except</span>:</div><div class="ql-code-block">            <span class="ql-token hljs-built_in">print</span>(<span class="ql-token hljs-string">"文件爬取失败!"</span>)</div><div class="ql-code-block"> </div><div class="ql-code-block"><span class="ql-token hljs-keyword">if</span> __name__ == <span class="ql-token hljs-string">'__main__'</span>:</div><div class="ql-code-block">   html_url=getUrl(<span class="ql-token hljs-string">"https://findicons.com/search/nature"</span>)</div><div class="ql-code-block">   getPic(html_url)</div></div><p><br></p><p><img src="https://article-images.zsxq.com/Fh_dDSbuteEI_0ArnWoZrCFDRuvm"></p><div class="ql-code-block-container"><div class="ql-code-block">标签查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block">属性查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">类名查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/lacie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link2"</span>&gt;Lacie&lt;/a&gt;, &lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/tillie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link3"</span>&gt;Tillie&lt;/a&gt;]</div><div class="ql-code-block"><span class="ql-token hljs-built_in">id</span>查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div><div class="ql-code-block">组合查找: [&lt;a <span class="ql-token hljs-keyword">class</span>=<span class="ql-token hljs-string">"sister"</span> href=<span class="ql-token hljs-string">"http://example.com/elsie"</span> <span class="ql-token hljs-built_in">id</span>=<span class="ql-token hljs-string">"link1"</span>&gt;&lt;!--Elsie--&gt;&lt;/a&gt;]</div></div><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p><p><br></p></div><footer><div class="horizon-line"></div><img id="logo" src="/assets_dweb/logo@1x.png"><div class="text">知识星球</div><div class="horizon-line"></div></footer><div class="qrcode-container"><img class="qrcode" id="qrcode" src=""><div class="text-desc">扫码加入星球</div><div class="text-desc">查看更多优质内容</div></div><div id="qrcode-url">https://wx.zsxq.com/mweb/views/joingroup/join_group.html?group_id=51112141255244</div><input type="hidden" name="group_allow_copy" value="false"><input type="hidden" name="group_enable_watermark" value="true"><input type="hidden" name="member_id" value="111888182154422"><input type="hidden" name="member_name" value="wws"><input type="hidden" name="member_role" value="other"></div>
"""
## 上面html跟下面的结果对不上,但是不影响理解应该,跑的时候换成自己的html跑一下就知道了
# 1. BeautifulSoup对象
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")# 2. Tag 对象
print(f"soup.head:{soup.head} \n")
print(f"soup.head.name:{soup.head.name} \n")
print(f"soup.head.attrs:{soup.head.attrs} \n")
print(f"type(soup.head):{type(soup.head)} \n")# 3. Navigable String 对象
print(f"soup.title.stringh:{soup.title.string} \n")
print(f"type(soup.title.string):{type(soup.title.string)} \n")# 4. Comment 对象
print(f"soup.a.string:{soup.a.string} \n")
print(f"type(soup.a.string):")# 5. 结构化输出soup对象
print(f"soup.prettify()=>{soup.prettify()}")

在这里插入图片描述

2. 遍历文档树

BeautifulSoup 之所以将文档转为树结构,是因为树结构更便于对内容遍历提取
在这里插入图片描述

from bs4 import BeautifulSouphtml = “”“
<html><head><title>The Dotmouse's stroy</title></head>
<body>
<p ...>...</p>
...
</body>
</html>
”“”
# 1. BeautifulSoup对象
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")# 2. 向下遍历
print(soup.p.contents)
print(list(soup.p.children))
print(list(soup.p.descendants))# 3. 向上遍历
print(list(soup.p.parent.name))
for i in soup.p.parents:print(i.name)# 4. 平行遍历
print('a_next:',soup.a.next_sibling)
for i in soup.a.next_sibling:print('a_nexts:', i)
print('a_previous:',soup.a.previous_sibling )
for i in soup.a.previous_sibling:print('a_previous:', i)

在这里插入图片描述

4 搜索文档树

搜索方法:
在这里插入图片描述

from bs4 import BeautifulSouphtml = “”“
<html><head><title>The Dotmouse's stroy</title></head>
<body>
<p ...>...</p>
...
</body>
</html>
”“”
# 1. BeautifulSoup对象
soup = BeautifulSoup(html, 'lxml')
print(f"type(soup):{type(soup)}\n")# 2. find()
print(soup.find('a'))# 查找a标签
print(soup.find(id='link2'))# 查找id等于link2的元素# 3. find_all()
print(soup.find_all('a'))# 查找标签名
print(soup.find_all('a',id='link1'))# 检索属性值
print(soup.find_all('a',class='sister'))# 检索属性值
print(soup.find_all(text=['Elsie','Lacie']))# 4. 向上检索
print(list(soup.p.find_parent().name))
for i in soup.title.find_parents():print(i.name)# 5. 平行检索
print(soup.head.find_next_sibling().name())
for i in soup.head.find_next_sibling():print('a_nexts:', i)
print(soup.title.find_previous_sibling())
for i in soup.title.find_previous_sibling():print('a_previous:', i)

在这里插入图片描述

5 CSS 选择器

在Tag或者BeautifulSoup对象的select()方法中传入字符串参数,即可使用CSS选择器找到Tag
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述

6 爬取图片示例
import requests
from bs4 import BeautifulSoup
import os
def geturl(url):try :read =requests.get(url)read.raise for status()read,encoding=read.apparent encodingreturn read.textexcept:return“连接失败!def getPic(html):soup= BeautifulSoup(html, "html.parser”)all_img = soup.find('ul’).find_all( img )for img in all_img:src = img['src’]img url = srcprint(img_url)root ='F:/Pic/'path=root + img_url.split(/)[-1]print(path)try:if not os.path.exists(root):os.mkdir (root)if not os.path.exists(path):read =requests.get(img url)with open(path, “wb )as f:f.write(read.content)f.close()print("文件保存成功!")else :print(“文件已存在!")except:print(~文件爬取失败!")
if __name__=='__main__':html_url=getUrl( 'https://findicons.com/search/nature' )getPic(html_url)

在这里插入图片描述

参考
版面分析–网页HTML解析
Beautiful Soup 4.4.0 文档
python爬虫之Beautifulsoup模块用法详解
网络爬虫之BeautifulSoup详解(含多个案例)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/832573.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

VueComponent构造函数

//创建school组件——注册给谁 在谁的结构上写const school Vue.extend({name: school,//开发者工具的显示template: <div><h2>学校名称&#xff1a;{{schoolName}}</h2><h2>学校地址&#xff1a;{{adress}}</h2> </div>,//结构data() {…

液晶数显式液压万能试验机WES-300B

一、简介 主机为两立柱、两丝杠、油缸下置式&#xff0c;拉伸空间位于主机的上方&#xff0c;压缩、弯曲试验空间位于主机下横梁和工作台之间。测力仪表采用高清液晶显示屏&#xff0c;实验数据方便直观。 主要性能技术指标 最大试验力&#xff08;kN&#xff09; 300 试…

FreeRTOS资源管理

1.以前临界资源的保护方式 有使用过静态局部变量来保护临界资源&#xff0c;也有用队列&#xff0c;信号量&#xff0c;互斥量来保护临界资源。这些都是在多个任务会共同使用临界资源的情况下我们的保护方式。 问题提出&#xff1a;如果有个传感器在读取数据时有严格的时序&a…

前端基础学习html(1)

1.标题标签.h1,h2...h6 2.段落标签p 换行标签br 3.加粗strong(b) /倾斜em(i) /删除 del(s) /下划线ins(u) 4.盒子&#xff1a;div //一行一个 span//一行多个 5.img :src alt title width height border 图片src引用&#xff1a;相对路径 上级/同级/中级 绝对路径&#xff…

触动精灵纯本地离线文字识别插件

目的 触动精灵是一款可以模拟鼠标和键盘操作的自动化工具。它可以帮助用户自动完成一些重复的、繁琐的任务&#xff0c;节省大量人工操作的时间。但触动精灵的图色功能比较单一&#xff0c;无法识别屏幕上的图像&#xff0c;根据图像的变化自动执行相应的操作。本篇文章主要讲解…

【Java基础】三大特性——多态

多态的前提条件&#xff1a;继承可以简单理解为&#xff1a;把子类看成父类类型&#xff08;反之是错误的&#xff09; 优缺点 弊端: 只能使用父类&#xff08;父接口&#xff09;中定义的功能好处: 函数的参数定义为父类&#xff08;父接口&#xff09;类型&#xff0c;可以…

使用idea编辑器回退git已经push的代码

直接上结果 选择想要回退的那次/多次提交历史, 右击, 选中 revert commit git自动产生一个Revert记录&#xff0c;然后我们会看到git自动将我第三次错误提交代码回退了&#xff0c;这个其实就相当于git帮我们手动回退了代码。 后续&#xff0c;只需要我们将本次改动push到远…

Vue 介绍

【1】前端发展史 前端的发展史可简述为&#xff1a; 从最初的静态页面编写&#xff0c;依赖后端模板渲染逐步演化为通过JavaScript&#xff08;特别是Ajax技术&#xff09;实现前后端分离&#xff0c;使得前端能够独立地加载数据和渲染页面随后&#xff0c;Angular、React、Vu…

open 函数到底做了什么

使用设备之前我们通常都需要调用 open 函数&#xff0c;这个函数一般用于设备专有数据的初始化&#xff0c;申请相关资源及进行设备的初始化等工作&#xff0c;对于简单的设备而言&#xff0c;open 函数可以不做具体的工作&#xff0c;你在应用层通过系统调用 open 打开设备…

2024年电化学、可再生能源与绿色发展国际会议(ICERGD2024)

2024年电化学、可再生能源与绿色发展国际会议(ICERGD2024) 会议简介 2024国际电化学、可再生能源与绿色发展大会&#xff08;ICERGD2024&#xff09;将在青岛隆重举行。本次会议聚焦电化学、可再生能源和绿色发展领域的最新研究成果和技术趋势&#xff0c;旨在促进相关领域…

OpenNJet:下一代云原生应用引擎

OpenNJet&#xff1a;下一代云原生应用引擎 前言一、技术架构二、新增特性1. 透明流量劫持2. 熔断机制3. 遥测与故障注入 三、Ubuntu 发行版安装 OpentNJet1. 添加gpg 文件2. 添加APT 源3. 安装及启动4. 验证 总结 前言 OpenNJet&#xff0c;是一款基于强大的 NGINX 技术栈构建…

typescript类型基础

typescript类型基础 枚举类型 enum Season {Spring,Summer,Fall,Winter }数值型枚举 enum Direction {Up,Down,Left,Right } const direction:Direction Direction.up每个数值型枚举成员都表示一个具体的数字&#xff0c;如果在定义一个枚举的时候没有设置枚举成员的值&…

Excel利用数据透视表将二维数据转换为一维数据(便于后面的可视化分析)

一维数据&#xff1a;属性值都不可合并&#xff0c;属性值一般在第一列或第一行。 二维数据&#xff1a;行属性或列属性是可以继续合并的&#xff0c;如下数据中行属性可以合并为【月份】 下面利用数据透视表将二维数据转换为一维数据&#xff1a; 1、在原来的数据上插入数据透…

MySQL 社区经理:MySQL 8.4 InnoDB 参数默认值为什么要这么改?

MySQL 8.4 LTS 版本&#xff0c;我们一共修改了 20 个 InnoDB 变量的默认值。 作者&#xff1a;Frederic Descamps&#xff0c;EMEA 和亚太地区的 MySQL 社区经理。于 2016 年 5 月加入 MySQL 社区团队。担任开源和 MySQL 顾问已超过 15 年。最喜欢的主题是高可用和高性能。 本…

解决一个朋友的nbcio-boot的mysql数据库问题

1、原先安装mysql5.7数据库&#xff0c;导入我的项目里的带数据有报错信息 原因不明 2、只能建议用docker进行msyql5.7的安装 如下&#xff0c;可以修改成自己需要的信息 docker run -p 3306:3306 --name mastermysql -v /home/mydata/mysql/data:/var/lib/mysql -e MYSQL_R…

python绘图(pandas)

matplotlib绘图 import pandas as pd abs_path rF:\Python\learn\python附件\pythonCsv\data.csv df pd.read_csv(abs_path, encodinggbk) # apply根据多列生成新的一个列的操作&#xff0c;用apply df[new_score] df.apply(lambda x : x.数学 x.语文, axis1)# 最后几行 …

c#word文档:1.创建空白Word文档及保存/2.添加页内容...

---创建空白Word文档 --- &#xff08;1&#xff09;创建一个名为OfficeOperator的类库项目。引用操作Word的.NET类库 &#xff08;2&#xff09;定义用于操作Word的类WordOperator1。添加引用Microsoft.Office.Interop.Word命名空间。 &#xff08;3&#xff09;为WordOper…

Unity | Shader基础知识(第十三集:编写内置着色器阶段总结和表面着色器的补充介绍)

目录 前言 一、表面着色器的补充介绍 二、案例viewDir详解 1.viewDir是什么 2.viewDir的作用 3.使用viewDir写shader 前言 注意观察的小伙伴会发现&#xff0c;这组教程前半部分我们在编写着色器的时候&#xff0c;用的是顶点着色器和片元着色器的组合。 SubShader{CGPRO…

个股期权是什么,个股期权使用方法?

今天期权懂带你了解个股期权是什么,个股期权使用方法&#xff1f;个股期权作为金融市场的重要工具之一&#xff0c;是指投资者在约定时间内有权而非义务以约定价格买卖特定数量的个股的金融衍生品。 个股期权是什么&#xff1f; 个股期权合约是一种由交易所统一设定的标准化合…

git-新增业务代码分支

需求 使用git作为项目管理工具管理项目&#xff0c;我需要有两个分支&#xff0c;一个分支是日常的主分支&#xff0c;会频繁的推送和修改代码并推送另外一个是新的业务代码分支&#xff0c;是一个长期开发的功能&#xff0c;同时这个业务分支需要频繁的拉取主分支的代码&#…