php - Get specific content with CURL on all links in a page (like a spider)


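I'm coding a little app that starts from a URL and collects all the links on that specific page. Next, it should go to each of those links, scrape their contents, and show only specific content (numbers with 10 or more characters). My code retrieves a blank page. What is wrong?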

// Fetch the start page and collect the href of every <a> tag
$url = 'http://xxx.xxx';
$original_file = file_get_contents($url);
$stripped_file = strip_tags($original_file, "<a>");
preg_match_all("/<a(?:[^>]*)href=\"([^\"]*)\"(?:[^>]*)>(?:[^<]*)<\/a>/is", $stripped_file, $matches);
$links = $matches[1];
//print_r($links);

// Request each link and print the long numbers found in its content
$count = count($links);
for ($i = 0; $i < $count; $i++) {
  $curl_handle = curl_init();
  curl_setopt($curl_handle, CURLOPT_URL, $links[$i]);
  curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
  curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1');
  $query = curl_exec($curl_handle);
  curl_close($curl_handle);

  // Numbers starting with 3; keep only those with 10 or more digits
  preg_match_all('/\b3\d+/', $query, $matches2);
  $numbers = $matches2[0];
  $found = 0; // separate counter so the loop bound $count is not overwritten
  foreach ($numbers as $value) {
    if (strlen((string)$value) >= 10) echo '<br><br>[' . $found++ . "]" . $value;
  }
}

Issue #1: The HTML can contain relative URLs like the one below, so you may be picking up links such as /home/test.php without the base http://www.example.com/. Before requesting each link with cURL, print it on screen or in the browser and check what it is.

<a href="/home/test.php">link</a> 
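To illustrate Issue #1, here is a minimal sketch of turning a relative link into an absolute one before passing it to cURL. The helper name resolve_url is hypothetical (it is not part of PHP or of the original code), and the logic is deliberately simplified: it does not collapse "../" segments and it treats "#fragment" or "?query" links as plain relative paths.

```php
<?php
// Hypothetical helper: resolve $href against the URL of the page it was found on.
// Simplified sketch, not a full RFC 3986 resolver.
function resolve_url(string $base, string $href): string
{
    // Already absolute: starts with a scheme such as "http:" or "https:".
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Protocol-relative link: //other.example.com/page
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }

    // Root-relative link: /home/test.php
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }

    // Document-relative link: resolve against the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// prints http://www.example.com/home/test.php
echo resolve_url('http://www.example.com/', '/home/test.php'), "\n";
```

In the loop from the question, each $links[$i] could be passed through such a helper (with $url as the base) before it is handed to CURLOPT_URL.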

Issue #2: A CURLOPT_CONNECTTIMEOUT of 2 seconds may prove too short for you. Try increasing the value.

curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 10);
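When a request still comes back blank, it can help to see why cURL failed rather than only getting an empty result. The following is a rough sketch, not the original poster's code: fetch_page is a hypothetical helper, and the CURLOPT_TIMEOUT setting (a limit on the whole request) is an addition that the answer above does not mention. It relies on curl_exec() returning false on failure and curl_error() describing the failure.

```php
<?php
// Hypothetical helper: fetch one URL and report why it failed, if it did.
function fetch_page(string $url, int $connectTimeout = 10, int $totalTimeout = 30): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed for the whole request (assumed addition)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);            // return the body instead of printing it

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on failure; curl_error() gives the reason.
        return ['body' => null, 'error' => curl_error($ch)];
    }
    return ['body' => $body, 'error' => null];
}
```

Calling fetch_page() on each link and echoing the 'error' entry whenever 'body' is null should show whether a blank page comes from a timeout, a malformed (for example relative) URL, or something else.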

If the problem still persists, please show a sample page link, and a sample internal link that gives the blank response.

