Parsing parts of an HTML file? - CodingForum



Parsing parts of an HTML file?


  • Parsing parts of an HTML file?

    I have a huge webpage (over 300 KB of just links), and I want to parse just pieces of it into a template or another page. Basically, I want to put comments, anchors, or some other markers in the big HTML file to tell a CGI script where to start parsing and where to stop. I also want it to work with variables, so that I can pull out Part A or Part B rather than the entire page. I have found many scripts that use CGI and SSI to parse entire webpages, but nothing that will parse custom-defined parts of a page. Is this possible? If so, could somebody please point me to a script that already does this, or to some code I could use to start writing one?

    To help you visualize what I want to do: I want to use a CGI script to parse out different parts of this lyrics page (www.smasonline.com/lyrics/list.html), so I can divide it into sections for each letter of the alphabet.

    If you could help me I would be forever grateful.

    Thanks in advance,

  • #2
    What you could do is something like this:
    (not guaranteed to work, and you'd definitely have to test it)

    use strict;
    use warnings;
    use LWP::Simple;                        # exports get()

    my $addr = "http://www.somewhere.com/";
    my $html = get($addr);
    die "Could not fetch $addr\n" unless defined $html;

    print "Content-type: text/html\n\n";    # a CGI script must send a header first

    my $printing = 0;                       # true while between the two markers
    foreach my $line (split /\n/, $html) {
        if ($line =~ /<!--(.*?)-->/) {
            if    ($1 eq "LIST START") { $printing = 1; }
            elsif ($1 eq "LIST END")   { $printing = 0; }
        }
        print "$line\n" if $printing;
    }

    Note: you have to put a comment (e.g. <!--LIST START--> and <!--LIST END-->) where the content or links start and end. The text inside each comment must match exactly, with no extra spaces.
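
    The comment markers described above can also carry a section name, which is what makes "Part A or Part B" selectable. The following is only a sketch in core Perl (no modules): the helper name extract_section and the per-letter marker names such as <!--A START--> are assumptions for illustration, not something taken from the original page.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between <!--NAME START--> and
# <!--NAME END-->, or nothing (undef) if that pair of markers is missing.
sub extract_section {
    my ($html, $name) = @_;
    # \Q...\E makes the section name match literally inside the pattern.
    if ($html =~ /<!--\s*\Q$name\E START\s*-->(.*?)<!--\s*\Q$name\E END\s*-->/s) {
        return $1;
    }
    return;
}

# Made-up sample data standing in for the big page of links.
my $page = join "\n",
    '<!--A START-->',
    '<a href="a1.html">Song A1</a>',
    '<!--A END-->',
    '<!--B START-->',
    '<a href="b1.html">Song B1</a>',
    '<!--B END-->';

my $part = extract_section($page, 'B');
print defined $part ? $part : "no such section\n";
```

    The /s flag lets the section span several lines, and the non-greedy (.*?) stops at the first matching END marker.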
    Last edited by mr_ego; Jul 13, 2002, 11:25 PM.
    Programming since
    3 years old.


    • #3
      What exactly are you trying to do here?

      Do you want to split the whole page into a group of pages or just print out the content within the <!-- LIST ... --> comments?

      By the way, it is LWP::Simple that you want here.

      If you want to parse HTML documents, there are a few modules out there which can help you.


      • #4

        I want to be able to split the page into lots of smaller pages, but I want a script that will do it for me. I want to keep maintaining the one big webpage full of links and have a script split it into smaller pages using comments, with one page for each letter of the alphabet.

        That way, when I get new lyrics I can just update the big page, and all the other pages will include the new lyrics as well, because they just pull in whatever is between the comments. The idea is to use the big HTML file the same way I would use a database, except that I only want to pull things out of it, not search it or anything like that.
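
        The "big file as a database" idea could be sketched roughly as follows, assuming each letter's links are wrapped in a named pair of comments such as <!--A START--> and <!--A END-->. The helper name split_sections and the sample text are hypothetical; this is one possible approach in core Perl, not a tested solution for the actual lyrics page.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every <!--X START--> ... <!--X END--> block
# into a hash keyed by the marker name X.
sub split_sections {
    my ($html) = @_;
    my %sections;
    # \1 requires the END marker to carry the same name as the START marker.
    while ($html =~ /<!--\s*(\w+) START\s*-->(.*?)<!--\s*\1 END\s*-->/gs) {
        $sections{$1} = $2;
    }
    return \%sections;
}

# Made-up sample data standing in for the big page.
my $page = "<!--A START-->links for A<!--A END-->\n"
         . "<!--B START-->links for B<!--B END-->\n";

my $sections = split_sections($page);
for my $name (sort keys %$sections) {
    print "$name: $sections->{$name}\n";
}
```

        Each hash value could then be printed into a template or written to its own file, so the smaller pages always reflect whatever is currently in the big one.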

        I know this all sounds confusing, sorry. Hopefully you will understand what I mean.

        As far as modules go, I can't use them. Thanks for the idea though. The site is hosted by a crappy web hosting company, so I can't change anything like that, and I can't use PHP or anything else useful besides Perl and SSI.
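
        With modules off the table, the section to show can still be chosen using plain Perl, since a CGI script receives its query string in the QUERY_STRING environment variable. The sketch below assumes a hypothetical parameter named letter (for example ?letter=b) and a hypothetical helper letter_from_query; it accepts a single letter only, so arbitrary input never reaches the rest of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a single letter out of a query string such as
# "letter=b". Returns the uppercase letter, or nothing (undef) otherwise.
sub letter_from_query {
    my ($query) = @_;
    return unless defined $query;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && defined $value && $key eq 'letter';
        return uc $value if $value =~ /\A[A-Za-z]\z/;   # exactly one letter
    }
    return;
}

my $letter = letter_from_query($ENV{QUERY_STRING});

print "Content-type: text/html\n\n";   # header first, then a blank line
print defined $letter
    ? "Showing section $letter\n"
    : "Pick a letter from A to Z\n";
```

        The returned letter could then be used as the section name when looking up the matching pair of comments in the big page.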

        I have tried the script you posted, mr_ego. Thanks for pointing me in the right direction. But I know very little about Perl. I've always just used other people's scripts and never took the time to learn any language. Anyway, I set up my own web server to test it on temporarily. I always get a 500 error, and when I check the Apache error log I get "Syntax error on line 23 of EOF". Does anybody have any ideas how to fix this, or what I'm doing wrong?

        I have posted this same question in multiple forums, and you guys are the first people who even responded. Thanks a lot.