Using a proxy for web scraping with Scrapy (Python)

Posted by mn6e 9 years ago | 2K reads | Python

1. Create a new file "middlewares.py" in your Scrapy project directory.

Import the base64 library, because we will need it only if the proxy requires authentication:

import base64

Then write the middleware class:

class ProxyMiddleware(object):

    # Overwrite process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy; b64encode is used
        # instead of the deprecated encodestring, which appends a trailing
        # newline that corrupts the header value
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
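You can check the header value this produces in a plain Python shell (the credentials below are placeholders, as in the middleware):

```python
import base64

# Placeholder credentials, as in the middleware above
proxy_user_pass = "USERNAME:PASSWORD"

# b64encode returns the encoded value with no trailing newline;
# the deprecated encodestring would append "\n" and break the header
encoded = base64.b64encode(proxy_user_pass.encode("ascii")).decode("ascii")
header_value = "Basic " + encoded
print(header_value)  # Basic VVNFUk5BTUU6UEFTU1dPUkQ=
```

If you see a line break in the middle of the `Proxy-Authorization` header, that is the trailing-newline bug described above.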


This code snippet comes from: http://www.sharejs.com/codes/python/8309

2. Add the following to the project settings file (./project_name/settings.py):

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}
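The numbers are priorities: Scrapy sorts downloader middlewares by value, and middlewares with lower numbers run `process_request` earlier. A minimal sketch of that ordering (using the same dict as above):

```python
# Sketch: Scrapy orders DOWNLOADER_MIDDLEWARES by their numeric value.
# ProxyMiddleware (100) therefore sets request.meta['proxy'] before
# HttpProxyMiddleware (110) sees the request.
middlewares = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}

order = sorted(middlewares, key=middlewares.get)
print(order[0])  # project_name.middlewares.ProxyMiddleware
```

This is why ProxyMiddleware is given a priority below 110 here.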

With just these two steps, requests now go through the proxy. Let's test it ^_^

from scrapy.contrib.spiders import CrawlSpider

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["whatismyip.com"]

    # The following URL is subject to change; you can get the latest one here:
    # http://www.whatismyip.com/faq/automation.asp
    start_urls = ["http://xujian.info"]

    def parse(self, response):
        # Save the fetched page so you can inspect which IP the site saw
        with open('test.html', 'wb') as f:
            f.write(response.body)
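If you prefer to keep the whole proxy URL (including credentials) in a single setting, a small stdlib helper can split it into the two values the middleware above expects. The URL below is a placeholder:

```python
from urllib.parse import urlparse

# Sketch: split a proxy URL of the form http://user:pass@host:port into
# the two pieces used by the middleware (placeholder values throughout)
proxy_url = "http://USERNAME:PASSWORD@1.2.3.4:8080"
parsed = urlparse(proxy_url)

proxy = "http://{}:{}".format(parsed.hostname, parsed.port)
user_pass = "{}:{}".format(parsed.username, parsed.password)

print(proxy)      # http://1.2.3.4:8080
print(user_pass)  # USERNAME:PASSWORD
```

`proxy` would go into `request.meta['proxy']` and `user_pass` into the base64-encoded `Proxy-Authorization` header.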

