Using a proxy for web scraping with Python Scrapy
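1. In your Scrapy project, create a new file named "middlewares.py" (recent versions of `scrapy startproject` already generate this file, so you can add the class to it).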
    # Import base64: it is needed ONLY if the proxy you are going to use requires authentication
    import base64

    # Start your middleware class
    class ProxyMiddleware:
        # Override process_request
        def process_request(self, request, spider):
            # Set the location of the proxy
            request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

            # Use the following lines if your proxy requires authentication
            proxy_user_pass = "USERNAME:PASSWORD"
            # Set up basic authentication for the proxy.
            # base64.encodestring() was removed in Python 3.9 (it also appended a
            # trailing newline); b64encode() works on bytes, so encode first, decode after.
            encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
            request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass

This code snippet is from: http://www.sharejs.com/codes/python/8309

2. In the project settings file (./project_name/settings.py), add:
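    DOWNLOADER_MIDDLEWARES = {
        # The old scrapy.contrib path has been removed; the built-in
        # middleware now lives in scrapy.downloadermiddlewares.
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
        # Lower numbers run first in process_request, so ProxyMiddleware sets
        # request.meta['proxy'] before HttpProxyMiddleware handles the request.
        'project_name.middlewares.ProxyMiddleware': 100,
    }

Replace project_name with your project's package name. That's all: just two steps, and requests now go through the proxy. Let's test it ^_^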
    import scrapy

    # The Scrapy docs advise against overriding parse() in a CrawlSpider, and no
    # crawl rules are needed here, so a plain scrapy.Spider is used.
    class TestSpider(scrapy.Spider):
        name = "test"
        # The original also set domain_name = "whatismyip.com", an obsolete
        # attribute that current Scrapy ignores.

        # The following url is subject to change, you can get the last updated one from here:
        # http://www.whatismyip.com/faq/automation.asp
        start_urls = ["http://xujian.info"]

        def parse(self, response):
            # Save the raw response body to test.html for inspection
            with open('test.html', 'wb') as f:
                f.write(response.body)
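The middleware in step 1 hardcodes the proxy address and credentials. Below is a minimal sketch, not the original author's code, of a variant that reads them from settings.py instead. PROXY_URL, PROXY_USER and PROXY_PASS are hypothetical setting names chosen for this example; `from_crawler` and `crawler.settings` are Scrapy's standard hook for giving a component access to project settings. The class itself does not import Scrapy, so it can be dropped into middlewares.py and registered like ProxyMiddleware.

```python
import base64


def basic_proxy_auth(user, password):
    """Return a Proxy-Authorization header value using the HTTP Basic scheme."""
    raw = f"{user}:{password}".encode("utf-8")
    # b64encode adds no trailing newline, unlike the removed encodestring()
    return "Basic " + base64.b64encode(raw).decode("ascii")


class SettingsProxyMiddleware:
    """Downloader middleware sketch: proxy and credentials come from settings.

    Assumes hypothetical settings PROXY_URL, PROXY_USER and PROXY_PASS.
    """

    def __init__(self, proxy_url, user=None, password=None):
        self.proxy_url = proxy_url
        self.user = user
        self.password = password

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware with access to settings
        s = crawler.settings
        return cls(s.get("PROXY_URL"), s.get("PROXY_USER"), s.get("PROXY_PASS"))

    def process_request(self, request, spider):
        if not self.proxy_url:
            return None  # no proxy configured: leave the request unchanged
        request.meta["proxy"] = self.proxy_url
        # Only add the auth header when both credentials are configured
        if self.user is not None and self.password is not None:
            request.headers["Proxy-Authorization"] = basic_proxy_auth(
                self.user, self.password
            )
        return None  # None tells Scrapy to keep processing the request
```

To use it, set PROXY_URL = "http://YOUR_PROXY_IP:PORT", PROXY_USER = "USERNAME" and PROXY_PASS = "PASSWORD" in settings.py, and register 'project_name.middlewares.SettingsProxyMiddleware': 100 in DOWNLOADER_MIDDLEWARES in place of ProxyMiddleware. Alternatively, Scrapy's HttpProxyMiddleware also accepts credentials embedded in the proxy URL (http://USERNAME:PASSWORD@YOUR_PROXY_IP:PORT) and sets the Proxy-Authorization header itself.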
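This article was uploaded and shared by user mn6e for learning and exchange only. All rights belong to the original author; if your rights have been infringed, please contact the administrator.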