PHP Scraping Library: Goutte
Goutte is a PHP library for scraping data from websites. It provides an elegant API that makes it easy to select specific elements from remote pages.
Require the Goutte phar file to use Goutte in a script:
require_once '/path/to/goutte.phar';
Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):
use Goutte\Client;
$client = new Client();
Make requests with the request() method:
$crawler = $client->request('GET', 'http://www.symfony-project.org/');
The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).
Click on links:
$link = $crawler->selectLink('Plugins')->link();
$crawler = $client->click($link);
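As an offline illustration of what selectLink('Plugins') looks for, the sketch below uses PHP's built-in DOM extension rather than Goutte to find <a> elements whose whitespace-normalised text contains the given phrase as a whole word sequence. The XPath is only modelled loosely on the kind of expression DomCrawler builds; findLinkHrefs() and the sample HTML are hypothetical, image alt text is ignored, and hrefs are returned unresolved (Goutte's Link object resolves them against the current page URI).

```php
<?php
// Offline sketch (not Goutte's code): find <a> elements by visible text,
// roughly the way selectLink() matches links. findLinkHrefs() and the
// sample HTML are hypothetical.

/**
 * Return the href of every <a> whose whitespace-normalised text contains
 * $text as a space-delimited phrase. Image alt text is not handled.
 */
function findLinkHrefs(string $html, string $text): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Pad both sides with spaces so "Plugins" does not match "MyPlugins".
    // Note: this naive quoting breaks if $text contains a double quote.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(string(.)), " "), " %s ")]',
        $text
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$html = '<ul><li><a href="/plugins">Plugins</a></li>'
      . '<li><a href="/my">MyPlugins</a></li></ul>';

print_r(findLinkHrefs($html, 'Plugins')); // only "/plugins" matches
```

Padding with spaces is what keeps the match word-based rather than a raw substring search.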
Submit forms:
$form = $crawler->selectButton('sign in')->form();
$crawler = $client->submit($form, array('signin[username]' => 'fabien', 'signin[password]' => 'xxxxxx'));
Extract data:
// Look for a login error list on the returned page
$nodes = $crawler->filter('.error_list');
if ($nodes->count()) {
    die(sprintf("Authentication error: %s\n", $nodes->text()));
}
// Otherwise read the task counter element
printf("Nb tasks: %d\n", $crawler->filter('#nb_tasks')->text());
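The same extraction logic can be tried offline with PHP's built-in DOM extension. DomCrawler's filter() converts CSS selectors to XPath through the Symfony CssSelector component; this sketch does that translation by hand instead. extractTaskCount() and the sample HTML are hypothetical, and the function throws an exception rather than calling die() so the caller can decide how to handle a failed login.

```php
<?php
// Offline sketch (not Goutte's code): the extraction step using DOMXPath,
// with the CSS selectors hand-translated into XPath. extractTaskCount()
// and the sample HTML are hypothetical.

function extractTaskCount(string $html): int
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // ignore warnings from imperfect HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // ".error_list": any element whose class list contains "error_list".
    $errors = $xpath->query(
        '//*[contains(concat(" ", normalize-space(@class), " "), " error_list ")]'
    );
    if ($errors->length > 0) {
        // Mirror $nodes->text(): use the text of the first matching node.
        throw new RuntimeException(
            'Authentication error: ' . trim($errors->item(0)->textContent)
        );
    }

    // "#nb_tasks": the element whose id attribute is "nb_tasks".
    $node = $xpath->query('//*[@id="nb_tasks"]')->item(0);
    if ($node === null) {
        throw new RuntimeException('No #nb_tasks element found');
    }
    return (int) trim($node->textContent);
}

$page = '<html><body><span id="nb_tasks"> 7 </span></body></html>';
printf("Nb tasks: %d\n", extractTaskCount($page)); // prints "Nb tasks: 7"
```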
This article was uploaded and shared by user jopen. All rights belong to the original author.