5 common examples of using curl in PHP

Posted by jopen, 9 years ago | PHP

1. Fetching a page without access control


<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
// If this line is commented out, curl_exec() prints the response directly instead of returning it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
?>

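curl_exec() returns false when the transfer itself fails, and the snippet above never checks $result. The following is a minimal sketch of one way to check it. The helpers check_curl_result() and fetch_page() are hypothetical names, not part of the original article, and the 10-second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: turn the raw outcome of a transfer into a body or an exception.
function check_curl_result($result, int $errno, string $error, int $status): string
{
    if ($result === false) {
        // Transport-level failure: DNS error, refused connection, timeout, ...
        throw new RuntimeException("curl error $errno: $error");
    }
    if ($status >= 400) {
        // The transfer worked, but the server answered with an HTTP error status.
        throw new RuntimeException("HTTP status $status");
    }
    return $result;
}

// Hypothetical wrapper around the snippet above; the timeout value is an assumption.
function fetch_page(string $url, int $timeout = 10): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $result = curl_exec($ch);
    // Collect diagnostics before the handle is closed.
    $errno  = curl_errno($ch);
    $error  = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return check_curl_result($result, $errno, $error, $status);
}
```

With these in place, fetch_page("http://localhost/mytest/phpinfo.php") would either return the page body or throw a RuntimeException describing what went wrong.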

2. Fetching through a proxy

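Why fetch through a proxy? Take Google as an example: if you scrape Google too frequently within a short period, your requests stop succeeding, because Google restricts your IP address. When that happens, you can switch to a proxy and fetch again.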


<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
// The proxy address must be passed as a quoted string.
curl_setopt($ch, CURLOPT_PROXY, "125.21.23.6:8080");
// If the proxy requires a password, add this line:
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$result = curl_exec($ch);
curl_close($ch);
?>

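Switching proxies by hand gets tedious, so one option is to keep a list and move to the next entry when a request fails. The sketch below is only an illustration of that idea: pick_proxy() and fetch_via_proxies() are hypothetical helpers, and the 10-second timeout is an assumption. It retries only on transport failures; a rate limit usually arrives as an HTTP status with a body, which would need its own check.

```php
<?php
// Hypothetical helper: pick a proxy for the given attempt, cycling through the list.
function pick_proxy(array $proxies, int $attempt): string
{
    $proxies = array_values($proxies); // make sure the keys are 0..n-1
    if (count($proxies) === 0) {
        throw new InvalidArgumentException('proxy list is empty');
    }
    return $proxies[$attempt % count($proxies)];
}

// Try each proxy once, in order; return the first response, or null if all fail.
function fetch_via_proxies(string $url, array $proxies): ?string
{
    for ($attempt = 0; $attempt < count($proxies); $attempt++) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PROXY, pick_proxy($proxies, $attempt));
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $result = curl_exec($ch);
        curl_close($ch);
        if ($result !== false) {
            return $result;
        }
    }
    return null;
}
```

A call might look like fetch_via_proxies("http://blog.51yip.com", array("125.21.23.6:8080", "10.0.0.2:3128")), where the second address is a made-up placeholder.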

3. POSTing data and fetching the response

Submitting data deserves its own section: curl is very often used to exchange data with a server, so this case is an important one.


<?php
$ch = curl_init();
/* Note: the data to submit must not be a two-dimensional (or deeper) array.
   A nested value can be serialized first, for example
   array('name' => serialize(array('tank', 'zhang')), 'sex' => 1, 'birth' => '20101010')
   whereas passing the nested array directly raises an error, for example
   array('name' => array('tank', 'zhang'), 'sex' => 1, 'birth' => '20101010') */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch);
curl_close($ch);
?>


In upload.php, call print_r($_POST); curl then fetches whatever upload.php outputs: Array ( [name] => test [sex] => 1 [birth] => 20101010 )
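For nested data, one alternative to serialize() is http_build_query(), which flattens nested arrays into a URL-encoded string that PHP on the receiving side turns back into arrays in $_POST. The sketch below assumes two hypothetical helpers, encode_form() and post_form(). Note that passing a string to CURLOPT_POSTFIELDS sends the body as application/x-www-form-urlencoded, while passing an array sends multipart/form-data.

```php
<?php
// Hypothetical helper: encode (possibly nested) form data as a URL-encoded body.
function encode_form(array $data): string
{
    // The separator is passed explicitly so the result does not depend on php.ini.
    return http_build_query($data, '', '&');
}

// Hypothetical helper: POST the encoded data and return the response body,
// or false if the transfer fails.
function post_form(string $url, array $data)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, encode_form($data));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
```

With this encoding, array('name' => array('tank', 'zhang'), 'sex' => 1) is sent as name%5B0%5D=tank&name%5B1%5D=zhang&sex=1, and upload.php would see $_POST['name'] as an array again.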

4. Fetching pages protected by access control

I previously wrote an article on three methods of page access control; take a look if you are interested.

Fetching such a page with the methods above produces the following error:

You are not authorized to view this page

You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.

In this case, use CURLOPT_USERPWD to authenticate.


<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD is mainly used to get past page access control,
   for example the protection that is usually set up with htpasswd. */
curl_setopt($ch, CURLOPT_USERPWD, '231144:2091XTAjmd=');
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, 0);
$result = curl_exec($ch);
curl_close($ch);
?>

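With curl's default authentication method (HTTP Basic), CURLOPT_USERPWD makes curl send an Authorization header carrying the base64-encoded user:password pair. The sketch below builds that header by hand, only to show what goes over the wire. The helpers basic_auth_header() and fetch_with_basic_auth() are hypothetical, and any credentials passed to them are placeholders.

```php
<?php
// Hypothetical helper: build the header that HTTP Basic authentication sends.
function basic_auth_header(string $user, string $password): string
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Hypothetical helper: same effect as CURLOPT_USERPWD with Basic auth,
// but with the header set explicitly. Returns the body, or false on failure.
function fetch_with_basic_auth(string $url, string $user, string $password)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(basic_auth_header($user, $password)));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
```

Since base64 is an encoding and not encryption, the credentials are readable by anyone who can see the request, which is a reason to prefer HTTPS for protected pages.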

5. Simulating a login to sina

The data we want to fetch may only be available after logging in; in that case, use curl to simulate the login.

<?php

function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com.cn/index.html");
    curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR);  // file in which received cookies are saved
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com.cn/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "&logintype=uid&u=" . urlencode($user) . "&psw=" . urlencode($password));
    $contents = curl_exec($ch);
    curl_close($ch);
    // A redirect to /cgi/index.php?check_time=... is treated as a successful login.
    if (!preg_match("/Location: (.*)\\/cgi\\/index\\.php\\?check_time=(.*)\n/", $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}

define("USERAGENT", $_SERVER['HTTP_USER_AGENT']);
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);

echo checklogin("zhangying215", "xtaj227");
?>
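checklogin() decides whether the login succeeded by looking for a redirect in the raw response, which is possible because CURLOPT_HEADER is enabled. The sketch below isolates that header check so it can be tried without a network connection. find_location() is a hypothetical helper, and the sample response in the usage note is made up.

```php
<?php
// Hypothetical helper: return the value of the Location header, or null if absent.
function find_location(string $rawResponse): ?string
{
    // Look only at the header part, i.e. everything before the first blank line,
    // so that text in the body cannot be mistaken for a header.
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    // Header names are case-insensitive; "m" lets ^ and $ match at line boundaries.
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $parts[0], $matches)) {
        return $matches[1];
    }
    return null;
}
```

For a response such as "HTTP/1.1 302 Found\r\nLocation: http://mail.sina.com.cn/cgi/index.php?check_time=123\r\n\r\n", the helper returns the redirect URL, which could then be compared against the expected /cgi/index.php path.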

Open the cookie file under /tmp and take a look:

# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

mail.sina.com.cn FALSE / FALSE 0 SINAMAIL-WEBFACE-SESSID 65223c4bd8900284ed463d2a3e1ac182
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SUE es%3D8d96db0820c6c79922ad57d422f575e8%26ev%3Dv0%26es2%3Dcddfb8400dc5ca95902367ddcd7f57dd
.sina.com.cn TRUE / FALSE 0 SUP cv%3D1%26bt%3D1286900433%26et%3D1286986833%26lt%3D1%26uid%3D1445632344%26user%3D%25E5%25BC%25A0%25E6%2598%25A02001%26ag%3D2%26name%3Dzhangying20015%2540sina.com%26nick%3D%25E5%25BC%25A0%25E6%2598%25A02001%26sex%3D1%26ps%3D0%26email%3Dzhangying20015%2540sina.com%26dob%3D1982-07-18
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SID BihcallomxMx-QZxzGrOlcSQx%2F0B%2F0cmr.NyQ%2F0B%2FcmGGalmarlmcHrcGlSmrmxmfxal_CBZ%2F_afugCmmGirBYHm0Bc%40fr5ciZiGG5i
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SPRIAL bfb4102951fd5892a3fd5b42d442cd26
#HttpOnly_.sina.com.cn TRUE / FALSE 0 SINA_USER %D5%C5%D2001
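In the file that libcurl writes, each cookie line has seven tab-separated fields: domain, subdomain flag, path, secure flag, expiry time, name, and value. A #HttpOnly_ prefix on the domain marks an HttpOnly cookie, and other lines starting with # are comments. The sketch below reads such a file into an array. parse_cookie_jar() is a hypothetical helper, not something curl provides.

```php
<?php
// Hypothetical helper: parse the text of a Netscape-format cookie file.
function parse_cookie_jar(string $contents): array
{
    $cookies = array();
    foreach (preg_split('/\R/', $contents) as $line) {
        $httpOnly = false;
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            // HttpOnly cookies are stored with this prefix in front of the domain.
            $httpOnly = true;
            $line = substr($line, 10);
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a well-formed cookie line
        }
        list($domain, $subdomains, $path, $secure, $expires, $name, $value) = $fields;
        $cookies[] = array(
            'domain'     => $domain,
            'subdomains' => $subdomains === 'TRUE',
            'path'       => $path,
            'secure'     => $secure === 'TRUE',
            'expires'    => (int) $expires, // 0 means a session cookie
            'name'       => $name,
            'value'      => $value,
            'http_only'  => $httpOnly,
        );
    }
    return $cookies;
}
```

It could be called as parse_cookie_jar(file_get_contents(COOKIEJAR)) after checklogin() has run. To send the saved cookies on a later request, point CURLOPT_COOKIEFILE at the same file.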

