The 5 most commonly used PHP cURL examples
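In PHP, I mainly use cURL to fetch data. There are other ways to fetch pages, such as fsockopen and file_get_contents, but they only handle pages that can be accessed directly. Fetching pages behind access control, or pages that are only visible after logging in, is much harder with them.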
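All five examples share the same skeleton: curl_init(), a few curl_setopt() calls, curl_exec(), curl_close(). None of them checks whether curl_exec() failed, though. With CURLOPT_RETURNTRANSFER set, it returns false on a transport error, and curl_error() then describes what went wrong. As a minimal sketch (the helper name `fetchUrl` and the timeout values are assumptions, not taken from the original examples), the skeleton can be wrapped together with that check:

```php
<?php
// Hypothetical helper: fetch a URL with cURL and return the body as a string.
// Throws RuntimeException when the transfer fails (curl_exec returned false).
function fetchUrl(string $url, array $extraOptions = []): string
{
    $ch = curl_init();
    // Defaults shared by every request (timeout values are assumptions).
    $options = [
        CURLOPT_URL => $url,
        CURLOPT_HEADER => false,        // do not include response headers in the result
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT => 30,
    ];
    // Array union keeps the left operand's value for duplicate keys,
    // so caller-supplied options override the defaults.
    curl_setopt_array($ch, $extraOptions + $options);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $body;
}
```

Example 1 then becomes `$result = fetchUrl('http://localhost/mytest/phpinfo.php');`, and settings such as CURLOPT_PROXY can be passed through `$extraOptions`.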
1. Fetching a file with no access control
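<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);     // do not include response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string; if you comment this line out, curl_exec() prints it directly
$result = curl_exec($ch);
curl_close($ch);
?>
2. Fetching through a proxy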
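Why fetch through a proxy? Take Google as an example: if you scrape Google too frequently in a short period, your requests stop getting through, because Google restricts your IP address. At that point you can switch to another proxy and fetch again.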
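Switching to another proxy once one gets blocked can be sketched as a loop over a proxy pool. This is a minimal sketch under stated assumptions: the helper names `pickProxy` and `fetchViaProxies` are hypothetical, and treating a transport error or any HTTP status of 400 or above (for example 403 or 429) as "this proxy is blocked" is a simplification.

```php
<?php
// Hypothetical helper: choose a proxy from the pool in round-robin order.
function pickProxy(array $proxies, int $attempt): string
{
    return $proxies[$attempt % count($proxies)];
}

// Hypothetical helper: try $url through successive proxies until one succeeds.
// Returns the response body, or false if every attempt failed.
function fetchViaProxies(string $url, array $proxies, int $maxAttempts = 3)
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_PROXY => pickProxy($proxies, $attempt), // "host:port" string
            CURLOPT_CONNECTTIMEOUT => 5,
            CURLOPT_TIMEOUT => 20,
        ]);
        $body = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if no response arrived
        curl_close($ch);
        // Assumption: a transport error or a 4xx/5xx status means this proxy is blocked.
        if ($body !== false && $status > 0 && $status < 400) {
            return $body;
        }
    }
    return false;
}
```

Round-robin is the simplest rotation policy; a random pick or dropping proxies that failed repeatedly are equally valid choices.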
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080'); // the proxy address must be a quoted string
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // add this if the proxy requires a password
$result = curl_exec($ch);
curl_close($ch);
?>
3. Fetching data after POSTing
Submitting data deserves its own section: when you use curl, you very often need to exchange data with the server, so this matters a lot.
<?php
$ch = curl_init();
/* Note: the data to submit cannot be a two-dimensional (or deeper) array.
 * array('name'=>serialize(array('tank','zhang')),'sex'=>1,'birth'=>'20101010') is fine, since serialize() returns a string,
 * but array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010') will cause an error. */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
curl_close($ch);
?>
If upload.php contains print_r($_POST);, curl fetches what upload.php outputs:
Array
(
    [name] => test
    [sex] => 1
    [birth] => 20101010
)
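The restriction on nested arrays comes from passing a PHP array to CURLOPT_POSTFIELDS, which sends each value as a multipart/form-data field. A common workaround, swapped in here rather than taken from the original, is to encode the data yourself with PHP's built-in http_build_query(), which flattens nested arrays into bracketed keys and produces a string that libcurl sends as application/x-www-form-urlencoded. A minimal sketch; the helper name `postForm` is hypothetical:

```php
<?php
// Hypothetical helper: POST an already-encoded form body and return the response.
function postForm(string $url, string $body)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $body,    // a string body is sent as application/x-www-form-urlencoded
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

$data = [
    'name' => ['tank', 'zhang'], // nested array: not accepted by CURLOPT_POSTFIELDS as-is
    'sex' => 1,
    'birth' => '20101010',
];
// Flatten into bracketed keys: name[0]=tank&name[1]=zhang&sex=1&birth=20101010 (URL-encoded).
$body = http_build_query($data);
```

When `postForm('http://localhost/mytest/curl/upload.php', $body)` hits the upload.php above, PHP parses the bracketed keys back into an array, so print_r($_POST) shows [name] as an array holding tank and zhang.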
4. Fetching pages with access control
I wrote an earlier post on three ways to implement page access control, if you are interested.
If you try to fetch such a page with the methods above, you get the following error:
You are not authorized to view this page You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.
In this case, we need CURLOPT_USERPWD to authenticate.
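For HTTP Basic authentication, the scheme htpasswd-protected directories use, CURLOPT_USERPWD results in an Authorization header carrying "user:password" in base64. The sketch below shows both routes: letting cURL build the header from CURLOPT_USERPWD with CURLOPT_HTTPAUTH set to CURLAUTH_BASIC, or building it by hand. The helper names `basicAuthHeader` and `fetchProtected` are hypothetical.

```php
<?php
// Build the header HTTP Basic authentication sends: "Basic " + base64("user:password").
function basicAuthHeader(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Hypothetical helper: fetch a protected page; returns false if the server answers 401.
function fetchProtected(string $url, string $user, string $password)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPAUTH => CURLAUTH_BASIC,
        CURLOPT_USERPWD => $user . ':' . $password,
        // Manual equivalent of the two options above:
        // CURLOPT_HTTPHEADER => [basicAuthHeader($user, $password)],
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // 401 means the credentials or the authentication scheme were rejected.
    return $status === 401 ? false : $body;
}
```

If the server expects a scheme other than Basic (the error message above comes from a server that may be configured for Windows authentication), CURLAUTH_ANY lets cURL negotiate the scheme instead. Base64 is an encoding, not encryption, so the password is readable by anyone who sees the request over plain HTTP.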
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD is mainly used to get past page access control,
 * e.g. the kind we usually set up with htpasswd. */
curl_setopt($ch, CURLOPT_USERPWD, 'username:password'); // replace with your own credentials
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
?>
5. Simulating a login to Sina
The data we want may only be visible after logging in; in that case we need curl to simulate the login.
<?php
function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com.cn/index.html");
    curl_setopt($ch, CURLOPT_HEADER, true);         // keep response headers so the Location header can be checked
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR); // save the session cookies to this file
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com.cn/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "logintype=uid&u=" . urlencode($user) . "&psw=" . urlencode($password));
    $contents = curl_exec($ch);
    curl_close($ch);
    // A successful login redirects to .../cgi/index.php?check_time=...
    if (!preg_match('/Location: (.*)\/cgi\/index\.php\?check_time=(.*)\n/', $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}
define("USERAGENT", $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0'); // HTTP_USER_AGENT is not set when run from the CLI
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);
echo checklogin("your_username", "your_password");
?>
Open the cookie file under /tmp to take a look:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
mail.sina.com.cn FALSE / FALSE 0 SINAMAIL-WEBFACE-SESSID 65223c4bd8900284ed463d2a3e1ac182
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SUE es%3D8d96db0820c6c79922ad57d422f575e8%26ev%3Dv0%26es2%3Dcddfb8400dc5ca95902367ddcd7f57dd
.sina.com.cn TRUE / FALSE 0 SUP cv%3D1%26bt%3D1286900433%26et%3D1286986833%26lt%3D1%26uid%3D1445632344%26user%3D%25E5%25BC%25A0%25E6%2598%25A02001%26ag%3D2%26name%3Dzhangying20015%2540sina.com%26nick%3D%25E5%25BC%25A0%25E6%2598%25A02001%26sex%3D1%26ps%3D0%26email%3Dzhangying20015%2540sina.com%26dob%3D1982-07-18
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SID BihcallomxMx-QZxzGrOlcSQx%2F0B%2F0cmr.NyQ%2F0B%2FcmGGalmarlmcHrcGlSmrmxmfxal_CBZ%2F_afugCmmGirBYHm0Bc%40fr5ciZiGG5i
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SPRIAL bfb4102951fd5892a3fd5b42d442cd26
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SINA_USER %D5%C5%D2001
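The cookie file above is in the Netscape cookie format that libcurl writes for CURLOPT_COOKIEJAR: one cookie per line, with tab-separated fields for domain, include-subdomains flag, path, secure flag, expiry time (0 for a session cookie), name and value, and a `#HttpOnly_` prefix marking HttpOnly cookies. To stay logged in on later requests, the same file is passed to CURLOPT_COOKIEFILE. A minimal sketch; the helper names `readCookieJar` and `fetchWithCookies` are hypothetical:

```php
<?php
// Hypothetical helper: read a Netscape-format cookie jar into [name => value].
function readCookieJar(string $path): array
{
    $cookies = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookie: strip the marker, the rest is a normal record.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }
        // Fields: domain, include-subdomains, path, secure, expiry, name, value.
        $fields = explode("\t", $line);
        if (count($fields) >= 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Hypothetical helper: request a page while sending the cookies saved by the login step.
function fetchWithCookies(string $url, string $jarPath)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE => $jarPath, // send cookies read from the jar
        CURLOPT_COOKIEJAR => $jarPath,  // write updated cookies back when the handle is released
    ]);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

After checklogin() succeeds, `fetchWithCookies($someMailboxUrl, COOKIEJAR)` would reuse the session; `$someMailboxUrl` is a placeholder for whatever logged-in page you want to fetch.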