How to show a crawler's agent name on the forums' end, etc.? - CodingForum

  • How to show a crawler's agent name on the forums' end, etc.?

    Hi all.

    I have a small website, and I wrote some sample crawler code (for testing URLs).

    I have one small problem.

    My crawler works well, but its agent name is not displayed on the forums' end, etc. It only shows up as "Unnamed Spider", "Unknown Spider", "Unknown Crawler", and so on.

    I would like your opinion on this. What do you think?

    Thanks.


    Some of the code is below.

    I wrote:

    my $ua = LWP::UserAgent->new;
    $ua->agent('GoggalaTurkishBot');
    my $req = HTTP::Request->new(GET => $hedef);
    my $res = $ua->request($req);

    This is meant to make the crawler's name show up, but it does not (it still appears as an unknown crawler).
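A note on the snippet above: many forum packages label a visitor as a named spider only when its User-Agent header matches an entry in the forum's own list of known bots, so a custom name may keep appearing as unknown until the forum's administrator adds it. That is an assumption about the forums involved, not something confirmed in this thread. As a minimal sketch, a crawler can still send a conventional, descriptive User-Agent of the form Name/version (+info URL) and check what will actually go out on the wire. The version number, the example.com URLs, and the contact address below are hypothetical placeholders.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical version number and info URL; replace with real values.
my $agent_string = 'GoggalaTurkishBot/1.0 (+http://www.example.com/bot.html)';

my $ua = LWP::UserAgent->new;
$ua->agent($agent_string);
$ua->from('webmaster@example.com');   # optional From header (hypothetical address)

# Build a request and let the user agent apply its default headers
# without sending anything over the network.
my $req = HTTP::Request->new(GET => 'http://www.example.com/');
$ua->prepare_request($req);

# Print the User-Agent header exactly as a server would receive it.
print $req->header('User-Agent'), "\n";
```

Comparing the printed value with the User-Agent recorded in the site's own access log is a quick way to confirm the header is being sent; whether a forum then displays the name still depends on that forum's spider list.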

    Code:
    
    
    sub check_page {
       my ($hedef) = shift @_;
       my ($visited) = shift @_;
       my ($broken) = shift @_;
       my ($hosts) = shift @_;
       my ($rlz) = shift @_;
       my ($BASE) = $hedef;
       my ($total_links)  = 0;
       my ($error_links) = 0;
       my ($forbidden_links) = 0;
       my (%errors);
       my (@doc_links, @exp_links);
    
       if (robocheck($hedef, $hosts, $rlz) == 1) {
           print "\n$hedef\n\tNOT CHECKED: Robot exclusion in force\n\n";
           return;
       };
    
       # Fetch the page, identifying the crawler through the User-Agent header
       my ($ua) = LWP::UserAgent->new;
       $ua->agent('GoggalaTurkishBot');
       my ($req) = HTTP::Request->new(GET => $hedef);
       my ($res) = $ua->request($req);
        
       # Check the outcome of the response
       if ($res->is_success) {
          $visited->{$hedef}++;
          print "$hedef\n";
          if ($res->content_type !~ /html/i) {
              print "\t-- not an HTML document\n\n";
              return;
          }
          my($p) = parse_html($res->content);
          for (@{ $p->extract_links(qw(a)) }) {
              my ($link) = $_->[0];   # first element of each entry is the link URL
              push(@doc_links, $link);
              $total_links++;
          }
          print "\tInitial observation data fetched ";
          for (@{ $p->extract_links(qw(a)) }) {
              # Resolve each link against the page URL to get an absolute URL
              my ($link) = url($_->[0], $BASE)->abs->as_string;
              push(@exp_links, $link);
              if (robocheck($link, $hosts, $rlz) == 1 ) {
                  print "!";
                  $forbidden_links++;
              } else {
                  # Send a HEAD request to test the link without downloading its body
                  my ($head) = LWP::UserAgent->new;
                  $head->agent('GoggalaTurkishBot');
                  my ($head_req) = HTTP::Request->new(HEAD => $link);
                  my ($head_res) = $head->request($head_req);
                  if ($head_res->is_success) {
                      print "+"; 
                      # $visited->{$link}++;
                  } else {
                      print  "-";  
                      push(@{$errors{$head_res->code}}, $link);
                      $broken->{$link}++;
                      $error_links++;
                  }
              }
          }
          print "\n";
          my ($cnt) = 0;
          foreach (@exp_links) {
                 $expansion{$_} = $doc_links[$cnt++];
          }

    Best Regards.

  • #2
    Please explain more clearly what you want from us.
    The posting guidelines I use to decide whether I will spend time answering your question: http://www.catb.org/~esr/faqs/smart-questions.html


    • #3
      That doesn't look much like PHP...
      My thoughts on some things: http://codemeetsmusic.com
      And my scrapbook of cool things: http://gjones.tumblr.com
