PHP link scraping - CodingForum

  • PHP link scraping

    Hi guys! I'm trying to turn a page filled with a giant table of links into an array that I can use to check the links for validity. I realize there are better ways to do this; it's more of a learning exercise than anything. However, the code below, which I've been trying to edit, returns no results. Are there any noticeable reasons why it's not giving me the desired result?

    Thanks in advance!!

    Matt

    Here is a sample of a row from the table I am trying to scrape.
    Code:
    <tr> 
    <td>1</td><td>The Hangover 2</td><td>http://www.novamov.com/video/kcyzc7aoduw12</td><td>http://www.putlocker.com/file/F72561F9414120CA</td><td>http://www.putlocker.com/file/24D2A737D555C0D9</td><td>http://www.putlocker.com/file/98592CE881B32D29</td><td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>
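    Since the post mentions that there are better ways than a regex, here is a minimal sketch that uses PHP's DOM extension (`DOMDocument`, bundled with most PHP builds) to turn rows shaped like the sample above into an array. The function name `parseLinkTable` and the array keys are hypothetical, and it assumes the first cell is an ID, the second a title, and the rest links. The sample string is a shortened copy of the row above.

```php
<?php
// Sketch: parse table rows with the DOM extension instead of a regex.
// Assumption: each <tr> holds an ID cell, a title cell, then link cells.

function parseLinkTable(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them
    // (the input is an HTML fragment, not a full page).
    libxml_use_internal_errors(true);
    $doc->loadHTML('<table>' . $html . '</table>');
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        if (count($cells) < 3) {
            continue; // not a data row (e.g. a header row)
        }
        $rows[] = [
            'id'    => $cells[0],
            'title' => $cells[1],
            'links' => array_slice($cells, 2), // every cell after the title
        ];
    }
    return $rows;
}

// Shortened copy of the sample row: the line break after <tr> is harmless here.
$sample = "<tr>\n<td>1</td><td>The Hangover 2</td>"
        . "<td>http://www.novamov.com/video/kcyzc7aoduw12</td>"
        . "<td>http://www.putlocker.com/file/F72561F9414120CA</td></tr>";

print_r(parseLinkTable($sample));
```

    Because the parser works on elements rather than raw text, it does not care whether the cells are on one line or spread over several.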

    And here is my code that is returning 0 results:
    PHP Code:

    <?php
    // get the HTML
    $html = file_get_contents("choosing to hide url here");

    preg_match_all(
        '/<tr> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <\/tr>/s',
        $html,
        $posts,
        PREG_SET_ORDER // formats data into an array of posts
    );
    $num_records = @mysql_num_rows($posts);

    foreach ($posts as $post) {
        $movie_id = $post[1];
        $title = $post[2];
        $version1 = $post[3];
        $version2 = $post[4];
        $version3 = $post[5];
        $version4 = $post[6];
        $version5 = $post[7];
    }

    if ($num_records < 1) {
        print "No results";
    } else {
        echo $posts;
    };
    ?>
    Last edited by MattClark; Sep 13, 2011, 03:13 AM.
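    One way to see why this returns nothing is to run the pattern from the post against the sample row. The multi-line pattern requires a space and a newline after every `</td>`, but in the sample the cells sit on one line, so it cannot match; a `\s*` between tags accepts both layouts. Separately, `mysql_num_rows()` expects a MySQL result, not an array, so with the `@` silencing the error (on the PHP 5 versions of that era), `$num_records` ends up empty and the `No results` branch would run even if the pattern matched. A minimal sketch; the tolerant pattern is an illustrative alternative, not the only fix.

```php
<?php
// Sketch: compare the post's multi-line pattern with a whitespace-tolerant one.

// The sample row: a newline after <tr>, then all seven cells on one line
// with nothing between </td> and <td>.
$row = "<tr> \n<td>1</td><td>The Hangover 2</td>"
     . "<td>http://www.novamov.com/video/kcyzc7aoduw12</td>"
     . "<td>http://www.putlocker.com/file/F72561F9414120CA</td>"
     . "<td>http://www.putlocker.com/file/24D2A737D555C0D9</td>"
     . "<td>http://www.putlocker.com/file/98592CE881B32D29</td>"
     . "<td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>";

// The post's pattern, rebuilt: it demands " \n" after <tr> and after every </td>.
$strict = "/<tr> \n" . str_repeat("<td>(.*?)<\\/td> \n", 7) . "<\\/tr>/s";

// Tolerant pattern: \s* accepts any amount of whitespace (or none) between tags.
$tolerant = '#<tr>\s*' . str_repeat('<td>(.*?)</td>\s*', 7) . '</tr>#si';

// preg_match_all() returns the number of full matches (false on error).
$strictCount = preg_match_all($strict, $row, $strictMatches, PREG_SET_ORDER);
$tolerantCount = preg_match_all($tolerant, $row, $posts, PREG_SET_ORDER);

// Count array elements with count(); mysql_num_rows() only takes a MySQL result
// (and the old mysql_* extension was removed in PHP 7).
echo "strict: $strictCount, tolerant: $tolerantCount, rows: " . count($posts) . "\n";
// prints "strict: 0, tolerant: 1, rows: 1"

print_r($posts[0]); // use print_r (not echo) to inspect an array
```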

  • #2
    You should use count(), not mysql_num_rows(), when you need to count array elements.

    Try the following code:

    PHP Code:
    $html = '<tr> 
    <td>1</td><td>The Hangover 2</td><td>http://www.novamov.com/video/kcyzc7aoduw12</td><td>http://www.putlocker.com/file/F72561F9414120CA</td><td>http://www.putlocker.com/file/24D2A737D555C0D9</td><td>http://www.putlocker.com/file/98592CE881B32D29</td><td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>';

    $pattern = '#<tr>\s*<td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>#si';
    if (preg_match_all($pattern, $html, $posts, PREG_SET_ORDER))
    {
        foreach ($posts as $post)
        {
            print_r($post);
            $movie_id = $post[1];
            $title = $post[2];
            $version1 = $post[3];
            $version2 = $post[4];
            $version3 = $post[5];
            $version4 = $post[6];
            $version5 = $post[7];
        }
    }
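    Building on the reply above, here is a sketch that collects the five link columns from every matched row into one flat list (the input the link check ultimately needs) and counts it with `count()`. `collectLinks` is a hypothetical helper, and it assumes the seven-column layout of the sample row. `filter_var(..., FILTER_VALIDATE_URL)` only checks that a cell looks like a URL, not that the page behind it still works.

```php
<?php
// Sketch: flatten PREG_SET_ORDER matches into one list of links to check.
// Assumes the sample row's layout: ID, title, then five link columns.

function collectLinks(array $posts): array
{
    $links = [];
    foreach ($posts as $post) {
        // $post[0] is the whole <tr>...</tr> match; groups 3 to 7 hold the links.
        foreach (array_slice($post, 3, 5) as $url) {
            $url = trim($url);
            // Skip empty or malformed cells (syntax check only).
            if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
                $links[] = ['title' => $post[2], 'url' => $url];
            }
        }
    }
    return $links;
}

$html = "<tr> \n<td>1</td><td>The Hangover 2</td>"
      . "<td>http://www.novamov.com/video/kcyzc7aoduw12</td>"
      . "<td>http://www.putlocker.com/file/F72561F9414120CA</td>"
      . "<td>http://www.putlocker.com/file/24D2A737D555C0D9</td>"
      . "<td>http://www.putlocker.com/file/98592CE881B32D29</td>"
      . "<td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>";
$pattern = '#<tr>\s*<td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>#si';

$links = [];
if (preg_match_all($pattern, $html, $posts, PREG_SET_ORDER)) {
    $links = collectLinks($posts);
}
echo count($links) . " link(s) to check\n"; // prints "5 link(s) to check"
```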



    • #3
      I'm slightly confused. Both code samples do the same thing: they pull every link from a text file that contains every link on my site. What I'm ultimately trying to do is fetch the page content of each individual link and make sure that the movie player is still embedded on the page each link points to.

      When I do it, it fetches the page content of every link, but it puts it all onto the same page, so I can't HTML-scrape each one individually. I'm guessing I'm supposed to remove them from the array? But I'm not entirely sure how.
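      One way to keep each page separate is to fetch and inspect it inside the loop, storing one result per URL instead of appending every page's HTML to the same output. A minimal sketch: `checkLinks` and the marker string are hypothetical (`<embed` is only a guess at what the player markup looks like on these hosts), and the fetcher is passed in as a closure so the demo can run offline with canned pages. For real use, pass a closure around `file_get_contents()`, which needs `allow_url_fopen` enabled.

```php
<?php
// Sketch: fetch each link on its own and record, per URL, whether the page
// still appears to contain the video player.
// Assumption (hypothetical): $playerMarker is a string that only appears when
// the player is embedded, e.g. "<embed" or "<iframe"; adjust it per host.

function checkLinks(array $urls, callable $fetch, string $playerMarker): array
{
    $results = [];
    foreach ($urls as $url) {
        // Each page lands in its own variable, so one link's HTML is never
        // mixed with another's.
        $page = $fetch($url);
        if ($page === false) {
            $results[$url] = 'unreachable';
        } elseif (stripos($page, $playerMarker) !== false) {
            $results[$url] = 'player found';
        } else {
            $results[$url] = 'player missing';
        }
    }
    return $results;
}

// Real use: fetch over HTTP (returns false on failure; @ hides the warning).
$httpFetch = function (string $url) {
    return @file_get_contents($url);
};
// e.g. checkLinks($listOfUrls, $httpFetch, '<embed');

// Offline demo with canned pages instead of live requests.
$fakePages = [
    'http://example.com/a' => '<html><embed src="player.swf"></html>',
    'http://example.com/b' => '<html><p>File removed</p></html>',
];
$fakeFetch = function (string $url) use ($fakePages) {
    return $fakePages[$url] ?? false;
};

print_r(checkLinks(
    ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'],
    $fakeFetch,
    '<embed'
));
```

      Keying the results by URL makes it easy to print only the broken links afterwards, for example with `array_filter()`.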



      • #4
        Bump. If anyone knows how to fix this problem, I would greatly appreciate it!
