PHP Web Crawler - CodingForum

PHP Web Crawler

  • PHP Web Crawler

    Hi, I have a script that parses the links out of a page, and now I want to figure out how to follow those links. Here is the script:

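    PHP Code:
    <?php
    // Open the page for reading (needs allow_url_fopen enabled in php.ini).
    $f = fopen("http://www.theotaku.com", "r");
    if ($f === false) {
        die("Could not open the page");
    }
    // Read the page in chunks of up to 1023 bytes, one line at a time.
    while (($buf = fgets($f, 1024)) !== false)
    {
       // $words[0] = whole <a> tags, $words[1] = href values, $words[2] = link text.
       preg_match_all("/<a.*? href=\"(.*?)\".*?>(.*?)<\/a>/i", $buf, $words);

       for ($i = 0; $i < count($words); $i++)
       {
          for ($j = 0; $j < count($words[$i]); $j++)
          {
             $cur_word = strtolower($words[$i][$j]);
             print "Indexing: $cur_word<br>";
          }
       }
    }
    fclose($f);
    ?>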
    Some of the results come out as links I can click on and some are just text. How do I separate the ones I can click from the ones that are just text? Also, how do I follow the links?
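
    One way to read the mixed output: the loop prints every capture group, so the whole `<a>` tags from `$words[0]` render as clickable links while the href values and link text from `$words[1]` and `$words[2]` show up as plain text. Below is a minimal sketch, not a definitive crawler, that keeps only the href and the text of each link and then follows the links breadth-first. The helper names `extract_links` and `crawl`, the `$fetch` callback, and the `$maxPages` limit are all assumptions made for illustration, and relative links are simply skipped to keep it short.

```php
<?php
// Hypothetical helper: return the href and visible text of each link in an HTML string.
function extract_links(string $html): array
{
    $links = [];
    // Group 1 is the href value, group 2 is the text between <a ...> and </a>.
    preg_match_all('/<a\b[^>]*?\shref="([^"]*)"[^>]*>(.*?)<\/a>/is', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $links[] = ['url' => $m[1], 'text' => trim(strip_tags($m[2]))];
    }
    return $links;
}

// Hypothetical helper: breadth-first crawl starting from one URL.
// $fetch is any callable that takes a URL and returns the page HTML,
// or false on failure (for example 'file_get_contents').
function crawl(string $startUrl, callable $fetch, int $maxPages = 10): array
{
    $queue = [$startUrl];   // URLs waiting to be fetched
    $visited = [];          // URLs already fetched, used as a set

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;       // already crawled, skip it
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if ($html === false) {
            continue;       // fetch failed, move on
        }

        foreach (extract_links($html) as $link) {
            // Simplification (assumption): only absolute http(s) links are followed.
            if (preg_match('#^https?://#i', $link['url']) && !isset($visited[$link['url']])) {
                $queue[] = $link['url'];
            }
        }
    }
    return array_keys($visited);
}
```

    Calling `crawl("http://www.theotaku.com", 'file_get_contents', 10)` would fetch at most ten pages. Passing the fetcher in as a callback also makes it possible to try the logic on canned HTML without touching the network.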

  • #2
    You could try inserting the links into a database along with the date you last went through each one. Then select the links from the database sorted by that date, so the link that most needs updating always comes first, and just keep refreshing the page.
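
    The idea above can be sketched roughly as follows. To keep the example self-contained, a plain PHP array stands in for the database table, mapping each URL to the time it was last crawled. The function names, the use of Unix timestamps, and the `links` table mentioned in the comment are assumptions for illustration, not part of the original suggestion.

```php
<?php
// Stand-in for a database table: url => last crawl time (Unix timestamp, 0 = never).
// With a real database the pick below would be a query along the lines of:
//   SELECT url FROM links ORDER BY last_crawled ASC LIMIT 1

// Add a newly discovered link; links already known keep their timestamp.
function add_link(array &$links, string $url): void
{
    if (!isset($links[$url])) {
        $links[$url] = 0;
    }
}

// Return the link that was crawled longest ago, or null if there are none.
function next_link(array $links): ?string
{
    if (!$links) {
        return null;
    }
    asort($links);                  // sorts the local copy, oldest timestamp first
    return array_key_first($links);
}

// Record that a link was just crawled.
function mark_crawled(array &$links, string $url, int $now): void
{
    $links[$url] = $now;
}
```

    Each page refresh would then call `next_link`, fetch that URL, add any new links it finds with `add_link`, and finish with `mark_crawled($links, $url, time())`.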
