一個可配置的,可擴展的PHP網頁蜘蛛:PHP-Spider
PHP-Spider是一個可配置的,可擴展的PHP網頁蜘蛛。
PHP-Spider Features
- supports two traversal algorithms: breadth-first and depth-first
- supports depth limiting and queue size limiting
- supports adding custom URI discovery logic, based on XPath, CSS selectors, or plain old PHP
- comes with a useful set of URI filters, such as Domain limiting
- supports custom URI filters, both prefetch (URI) and postfetch (Resource content)
- supports custom request handling logic
- comes with a useful set of persistence handlers (memory, file. Redis soon to follow)
- supports custom persistence handlers
- collects statistics about the crawl for reporting
- dispatches useful events, allowing developers to add even more custom behavior
- supports a politeness policy
- will soon come with many default discoverers: RSS, Atom, RDF, etc.
- will soon support multiple queueing mechanisms (file, memcache, redis)
- will eventually support distributed spidering with a central queue </ul>
本文由用戶 jopen 自行上傳分享,僅供網友學習交流。所有權歸原作者,若您的權利被侵害,請聯系管理員。
轉載本站原創文章,請注明出處,并保留原始鏈接、圖片水印。
本站是一個以用戶分享為主的開源技術平臺,歡迎各類分享!