scraping - CodingForum

  • scraping

    Hi Guys,

    ok this is a big problem for me and i need to get this done asap

    Can someone show me ( or explain ) how i would create a script
    that would read the contents of a directory ( all html files )
    then display the results on another page with links

    eg
    mypage title ( hyperlinked to /article/mypage1.htm )

    any ideas greatly appreciated

  • #2
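    PHP Code:
    // Collect every .html file in the current directory.
    $files = glob('*.html');
    $html = '';
    foreach ($files as $key => $file) {
        // Append the full contents of each file.
        $html .= file_get_contents($file);
    }
    echo $html;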



    • #3
      hi,

      where would i put that (in the main dir or the article dir), and would that read the file name and hyperlink it to the item?

      thanks for the help



      • #4
        don't think i explained it that well

        what i'm looking for is something that will read a directory,
        copy all the file names, then on another page link each title to the corresponding page

        eg
        title 1 linked to page1.html
        title 2 linked to page2.html
        etc etc

        cheers



        • #5
          what that does is get all the html files from a directory into an array (that's the glob() call), then the loop appends each file's contents to $html. you can edit it for what you need.
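
As a rough sketch of one such edit, assuming the goal is a list of page titles linked to their files: the names page_title(), build_links() and $urlPrefix are made up for illustration, and pulling the title out of each page's <title> tag with a regex is an assumption, since the thread never says where the title should come from.

```php
<?php
// Hypothetical helper: returns the <title> text of an HTML string,
// or $fallback (e.g. the file name) when no usable title is found.
function page_title(string $html, string $fallback): string
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $title = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
        if ($title !== '') {
            return $title;
        }
    }
    return $fallback;
}

// Hypothetical helper: builds one link per .html file found in $dir.
// $urlPrefix is whatever should appear in front of the file name in the URL.
function build_links(string $dir, string $urlPrefix): string
{
    $out = '';
    // glob() returns false on error, so fall back to an empty array.
    $files = glob(rtrim($dir, '/') . '/*.html') ?: [];
    foreach ($files as $path) {
        $name  = basename($path);
        $title = page_title((string) file_get_contents($path), $name);
        $out  .= '<a href="' . htmlspecialchars($urlPrefix . rawurlencode($name)) . '">'
               . htmlspecialchars($title) . "</a><br>\n";
    }
    return $out;
}
```

Something like `echo build_links(__DIR__ . '/article', '/article/');` on the listing page would then print one link per article, falling back to the file name when a page has no title.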



          • #6
            hi,

            i put it in a blank php file and uploaded it to the dir with the articles

            all im getting is the html content of the pages

            ** edited **

            its getting all the html of the pages, i only need the file name & the file url



            • #7
              done it

              many thanks , works perfectly



              • #8
                i dont suppose you could tell me how to "exclude" files starting with index from being displayed ?



                • #9
                  Assuming you meant "index"...but maybe i'm missing something....

                  If you mean "index.html", then you could just say if($file != 'index.html') instead of the substr().

                  The below does not assume "index.html" but rather, "index<anything here.....>"

                  PHP Code:
                  $files = glob('*.html');
                  $html = '';
                  foreach ($files as $key => $file) {
                      // Skip any file whose name starts with "index".
                      if (substr($file, 0, 5) != 'index') {
                          $html .= file_get_contents($file);
                      }
                  }
                  echo $html;
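
If only the file names are wanted rather than the page contents, the same prefix check can be pulled into a small filter. This is just a sketch: without_prefix() is a made-up helper name, and using strncmp() on the base name instead of substr() is a substitution, not what the code above does.

```php
<?php
// Hypothetical helper: drops every entry whose file name starts with $prefix,
// so "index.html" and "index2.html" are removed but "reindex.html" is kept.
function without_prefix(array $files, string $prefix): array
{
    $kept = array_filter($files, function ($file) use ($prefix) {
        // strncmp() returns 0 when the first strlen($prefix) bytes match.
        return strncmp(basename($file), $prefix, strlen($prefix)) !== 0;
    });
    return array_values($kept); // re-index the array after filtering
}

// glob() returns false on error, so fall back to an empty array.
$files = without_prefix(glob('*.html') ?: [], 'index');
foreach ($files as $file) {
    echo htmlspecialchars($file), "<br>\n";
}
```

Comparing against basename() means a path such as "dir/index.html" is excluded as well, which the plain substr($file, 0, 5) check would not catch.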



                  • #10
                    perfect
                    was trying variations of the not equal but couldnt get it quite right

                    many thanks both of you

