Spidr: A Web Crawler Written in Ruby

Posted by jopen 13 years ago | 44K views | Tags: crawler, web crawler

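Spidr is a versatile Ruby web crawling library. It can crawl a single site, multiple domains, or a chosen set of links. Spidr is designed to be fast and easy to use.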
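
As a rough sketch of what crawling "a site, multiple domains, or certain links" can look like, the snippet below wraps three of Spidr's entry points (`Spidr.site`, `Spidr.host`, `Spidr.start_at`) in a small helper. The helper name `crawl`, the `CRAWL_MODES` constant, and the `mode:` argument are assumptions made for illustration and are not part of Spidr. Option names and signatures can differ between Spidr versions, so treat this as a starting point and check the documentation of the installed gem.

```ruby
# Hypothetical helper around Spidr's entry points; not part of Spidr itself.
CRAWL_MODES = %i[site host start_at].freeze

# Crawls `target` and returns every visited URL as a string.
#   :site     - `target` is a URL; stay on that URL's host
#   :host     - `target` is a host name, e.g. "example.com"
#   :start_at - `target` is a URL; no host restriction is applied by this helper
def crawl(target, mode: :site)
  unless CRAWL_MODES.include?(mode)
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end

  # Loaded lazily so the file can be read without the gem installed.
  # Requires `gem install spidr`.
  require 'spidr'

  visited = []
  Spidr.public_send(mode, target) do |agent|
    # every_url is one of the call-backs Spidr provides for visited URLs.
    agent.every_url { |url| visited << url.to_s }
  end
  visited
end
```

Calling `crawl('http://example.com/')` would then walk that one site and return the list of URLs it visited, while `crawl('example.com', mode: :host)` starts from a bare host name.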

Follows:
- a tags.
- iframe tags.
- frame tags.
- Cookie protected links.
- HTTP 300, 301, 302, 303 and 307 Redirects.
- HTTP Basic Auth protected links.
Black-list or white-list URLs based upon:
- URL scheme
- Host name
- Port number
- Full link
- URL extension
Provides call-backs for:
- Every visited Page.
- Every visited URL.
- Every visited URL that matches a specified pattern.
- Every URL that failed to be visited.
Provides action methods to:
- Pause spidering.
- Skip processing of pages.
- Skip processing of links.
Other features:
- Restore the spidering queue and history from a previous session.
- Custom User-Agent strings.
- Custom proxy settings.
- HTTPS support.
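
The black-list and white-list rules in the feature list (scheme, host name, port number, full link, extension) can be pictured with a small standard-library sketch. This is an illustration of the idea only, not Spidr's own implementation: the class name `UrlFilter` and its `allow:` / `deny:` arguments are hypothetical, and the example URLs are made up.

```ruby
require 'uri'

# Illustrative allow/deny filter; a stand-in for the idea, not Spidr's code.
class UrlFilter
  # `allow` and `deny` map a component (:scheme, :host, :port, :link, :ext)
  # to a list of rules. A rule may be a String, Integer, or Regexp.
  def initialize(allow: {}, deny: {})
    @allow = allow
    @deny  = deny
  end

  # True when every component passes its allow list (if one is given)
  # and matches nothing in its deny list.
  def accept?(url)
    components(URI(url)).all? do |key, value|
      allowed = @allow.fetch(key, [])
      denied  = @deny.fetch(key, [])
      (allowed.empty? || matches?(allowed, value)) && !matches?(denied, value)
    end
  end

  private

  # Splits a URI into the components the rules are checked against.
  def components(uri)
    {
      scheme: uri.scheme,
      host:   uri.host,
      port:   uri.port,
      link:   uri.to_s,
      ext:    File.extname(uri.path.to_s).delete_prefix('.')
    }
  end

  # `===` gives exact matching for Strings/Integers and pattern matching
  # for Regexps.
  def matches?(rules, value)
    rules.any? { |rule| rule === value }
  end
end

filter = UrlFilter.new(
  allow: { scheme: %w[http https], host: [/(\A|\.)example\.com\z/] },
  deny:  { ext: %w[pdf zip], port: [8080] }
)

filter.accept?('https://www.example.com/index.html') # => true
filter.accept?('https://www.example.com/report.pdf') # => false (denied extension)
filter.accept?('ftp://example.com/file')             # => false (scheme not allowed)
```

Checking each component separately keeps the rules independent: a host white-list and an extension black-list can be combined without either needing to know about the other.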
