Simple Usage of the BeautifulSoup Module

Posted by y3c5 10 years ago | 957 views | Python

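Use dir(BeautifulSoup.BeautifulSoup) to list the methods the class provides. To see what a particular method does, use help(), for example help(BeautifulSoup.BeautifulSoup.find), which displays that method's official documentation.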

pprint can be used to pretty-print the output. Remember to import BeautifulSoup before calling dir or help on it.
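The dir/help/pprint workflow above can be sketched in Python 3 as follows. The helper name `public_methods` is hypothetical, and `html.parser.HTMLParser` only stands in for the class being inspected so that the sketch runs with the standard library alone; with BeautifulSoup 4 installed you could pass `bs4.BeautifulSoup` instead. `inspect.getdoc` is used as a non-interactive way to read the docstring that help() also displays.

```python
import inspect
from pprint import pprint
from html.parser import HTMLParser  # stand-in class; swap in the class you want to inspect


def public_methods(cls):
    """Return the sorted names of the public, callable attributes of cls.

    dir(cls) lists every attribute name, including _private and __dunder__
    ones; this keeps only the names you would normally call.
    """
    return sorted(
        name for name in dir(cls)
        if not name.startswith("_") and callable(getattr(cls, name))
    )


if __name__ == "__main__":
    # pprint puts one name per line once the list is too long for a single line
    pprint(public_methods(HTMLParser))
    # inspect.getdoc returns the method's docstring without opening a pager
    print(inspect.getdoc(HTMLParser.feed))
```

Filtering out names that start with an underscore keeps the listing short enough to scan, which is usually what you want before reaching for help() on a single method.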

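    # -*- coding: utf-8 -*-
    # Python 2 with BeautifulSoup 3 (the module is imported as "BeautifulSoup")
    import re
    import urllib2

    import BeautifulSoup

    # Download the full page and parse it
    htmlSource = urllib2.urlopen("http://www.taobao.com/").read()
    soup = BeautifulSoup.BeautifulSoup(htmlSource)

    # Print <head>...</head>
    print soup.head

    # Print <title>...</title>
    print soup.head.title

    # Returns a list; each element is an <a>...</a> tag
    tags = soup.findAll('a')
    print tags

    print 'JD.com free-range crawler'

    # fetch() is a BeautifulSoup 3 alias of findAll();
    # href=True keeps only <a> tags that have an href attribute
    for item in soup.fetch('a', href=True):
        print item['href']

    # Find all <a> tags with an href and print the href if it contains "taobao"
    for a in soup.findAll('a', href=True):
        if re.search('taobao', a['href']):
            print "Found the URL:", a['href']

    # Print the first <div> whose class attribute is exactly "J_Tanx mod"
    # (find returns only the first match, or None if nothing matches)
    print str(soup.find("div", {"class": "J_Tanx mod"}))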
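For readers on Python 3 without BeautifulSoup installed, here is a rough standard-library sketch of three lookups from the script above: collecting the href of each `<a>` tag, filtering those that contain "taobao", and finding the first `<div>` whose class is exactly "J_Tanx mod". It swaps in `html.parser.HTMLParser` for BeautifulSoup, the class and function names are hypothetical, and it parses an inline HTML string instead of fetching taobao.com. Unlike `find`, it returns only the matched `<div>`'s attributes, not the element with its contents.

```python
import re
from html.parser import HTMLParser


class LinkAndDivScanner(HTMLParser):
    """One pass over an HTML string that records (hypothetical helper):

    - hrefs: the href of every <a> tag that has one, roughly like
      soup.findAll('a', href=True)
    - first_div: the attributes of the first <div> whose class attribute
      equals target_class exactly, roughly like
      soup.find("div", {"class": target_class}) in BeautifulSoup 3
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.hrefs = []
        self.first_div = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser passes a list of (name, value) pairs
        if tag == "a" and "href" in attrs:
            # a bare <a href> arrives with value None; store it as ""
            self.hrefs.append(attrs["href"] or "")
        elif (tag == "div" and self.first_div is None
              and attrs.get("class") == self.target_class):
            self.first_div = attrs


def scan(html, target_class):
    """Parse html and return (hrefs, first_div); first_div is None if no match."""
    scanner = LinkAndDivScanner(target_class)
    scanner.feed(html)
    scanner.close()
    return scanner.hrefs, scanner.first_div


def hrefs_matching(hrefs, pattern):
    """Keep the hrefs in which the regex pattern occurs anywhere."""
    return [h for h in hrefs if re.search(pattern, h)]


if __name__ == "__main__":
    sample = """
    <html><head><title>demo</title></head><body>
      <div class="J_Tanx mod" id="ad">...</div>
      <a href="http://www.taobao.com/market">market</a>
      <a name="no-href">skipped: no href</a>
      <a href="http://example.com/">other site</a>
    </body></html>
    """
    hrefs, first_div = scan(sample, "J_Tanx mod")
    print(hrefs)
    print(hrefs_matching(hrefs, "taobao"))
    print(first_div)
```

A single HTMLParser subclass scans the page once and keeps only what it was told to look for, whereas BeautifulSoup builds a full tree that can be queried repeatedly; for more than a few lookups the tree is usually the more convenient choice.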

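This article was uploaded and shared by user y3c5 for study and discussion only. Copyright belongs to the original author; if your rights have been infringed, please contact the administrator.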

Article URL: http://www.baiduhome.net/code/view/1431334028247
