python爬蟲之Scrapy 使用代理配置

nf456 11年前發布 | 32K 次閱讀 Scrapy 網絡爬蟲

在爬取網站內容的時候，最常遇到的問題是：網站對IP有限制，會有防抓取功能，最好的辦法就是IP輪換抓取（加代理）下面來說一下Scrapy如何配

在爬取網站內容的時候，最常遇到的問題是：網站對IP有限制，會有防抓取功能，最好的辦法就是IP輪換抓取（加代理）

下面來說一下Scrapy如何配置代理，進行抓取

1.在Scrapy工程下新建“middlewares.py”

# Importing base64 library because we'll need it ONLY in case if the proxy we are going to use requires authentication import base64

Start your middleware class

class ProxyMiddleware(object): # overwrite process request def process_request(self, request, spider): # Set the location of the proxy request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT" # Use the following lines if your proxy requires authentication proxy_user_pass = "USERNAME:PASSWORD" # setup basic authentication for the proxy encoded_user_pass = base64.encodestring(proxy_user_pass) request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass</pre>

2.在項目配置文件里(./pythontab/settings.py)添加

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
    'pythontab.middlewares.ProxyMiddleware': 100,
}

</div>

本文由用戶 nf456 自行上傳分享，僅供網友學習交流。所有權歸原作者，若您的權利被侵害，請聯系管理員。

轉載本站原創文章，請注明出處，并保留原始鏈接、圖片水印。

本站是一個以用戶分享為主的開源技術平臺，歡迎各類分享！

本文地址：http://www.baiduhome.net/lib/view/open1420379713250.html

Scrapy 網絡爬蟲

python爬蟲之Scrapy 使用代理配置

相關經驗

相關資訊

相關文檔

目錄