PHP-Spider: A Configurable, Extensible PHP Web Spider

Posted by jopen 10 years ago | 15K reads | Tags: web crawler, PHP-Spider

PHP-Spider is a configurable, extensible web spider written in PHP.

PHP-Spider Features

  • supports two traversal algorithms: breadth-first and depth-first
  • supports depth limiting and queue size limiting
  • supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
  • comes with a useful set of URI filters, such as Domain limiting
  • supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
  • supports custom request handling logic
  • comes with a useful set of persistence handlers (memory and file; Redis soon to follow)
  • supports custom persistence handlers
  • collects statistics about the crawl for reporting
  • dispatches useful events, allowing developers to add even more custom behavior
  • supports a politeness policy
  • will soon come with many default discoverers: RSS, Atom, RDF, etc.
  • will soon support multiple queueing mechanisms (file, memcache, redis)
  • will eventually support distributed spidering with a central queue
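
To make the first few features concrete, here is a minimal, language-agnostic sketch in Python of how breadth-first and depth-first traversal, depth limiting, queue size limiting, and a prefetch URI filter can fit together. It is not PHP-Spider's API: the names `crawl`, `discover`, `same_host_filter`, and `LINKS`, as well as the `example.com` URIs, are hypothetical and exist only for illustration. The "web" is an in-memory dictionary, so nothing is fetched over the network.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": each URI maps to the URIs found on that page.
LINKS = {
    "http://example.com/": [
        "http://example.com/a",
        "http://example.com/b",
        "http://other.test/x",
    ],
    "http://example.com/a": ["http://example.com/a1"],
    "http://example.com/b": ["http://example.com/b1"],
    "http://example.com/a1": [],
    "http://example.com/b1": [],
}


def discover(uri):
    """Stand-in for URI discovery (XPath, CSS selectors, ...): a dict lookup."""
    return LINKS.get(uri, [])


def same_host_filter(seed):
    """Build a prefetch filter that keeps only URIs on the seed's host."""
    host = urlparse(seed).netloc
    return lambda uri: urlparse(uri).netloc == host


def crawl(seed, discover, algorithm="breadth-first",
          max_depth=1, max_queue_size=100, prefetch_filters=()):
    """Return URIs in visiting order under the given limits and filters."""
    queue = deque([(seed, 0)])
    seen = {seed}
    visited = []
    while queue:
        # FIFO order gives breadth-first traversal, LIFO gives depth-first.
        if algorithm == "breadth-first":
            uri, depth = queue.popleft()
        else:
            uri, depth = queue.pop()
        visited.append(uri)
        if depth >= max_depth:
            continue  # depth limit: do not expand this page further
        found = list(discover(uri))
        if algorithm != "breadth-first":
            # Reverse so the first discovered link is popped (visited) first.
            found.reverse()
        for link in found:
            if link in seen:
                continue
            if not all(accept(link) for accept in prefetch_filters):
                continue  # rejected before any fetch would happen
            if len(queue) >= max_queue_size:
                break  # queue size limit: drop the remaining links
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    seed = "http://example.com/"
    filters = [same_host_filter(seed)]
    print(crawl(seed, discover, "breadth-first", 2, 100, filters))
    print(crawl(seed, discover, "depth-first", 2, 100, filters))
```

The only difference between the two algorithms in this sketch is which end of the queue the next URI is taken from. The filter runs before a URI is queued, which mirrors the idea of a prefetch filter; a postfetch filter would instead inspect the fetched resource's content.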

Project homepage: http://www.baiduhome.net/lib/view/home/1399025018796
