regex help - CodingForum

regex help

  • regex help

    Hi. I currently have some regex that finds text between comments and saves each instance to an array. For example, if the HTML being searched is file1.htm:
    Code:
    This is regular page code
    <!-- OPEN -->This is code between comments<!-- CLOSE -->
    the PHP code below will strip out "This is code between comments" and save it to the array $split:
    PHP Code:
    $searchThis = file_get_contents('file1.htm');
    $split = preg_split('/(<!-- OPEN -->.*?<!-- CLOSE -->)/i', $searchThis, -1, PREG_SPLIT_DELIM_CAPTURE);
    Now, the problem I run into is that this only works when the comments are on the same line as the text in between them. For example, this works:
    Code:
    <!-- OPEN -->This is code between comments<!-- CLOSE -->
    but this doesn't:
    Code:
    <!-- OPEN -->
    This is code between comments
    <!-- CLOSE -->
    Anyone have any ideas on how to fix this? Will I have to somehow include a newline reference in the Regex? Thanks for any help!
    Last edited by mtd; Jul 30, 2005, 11:52 AM.

  • #2
    I'm not totally sure if this is what you're trying to do, so sorry if it's not.

    PHP Code:
    <?php
    preg_match_all('/<!-- OPEN -->(.*?)<!-- CLOSE -->/si', $searchThis, $split, PREG_PATTERN_ORDER);
    // with <!-- OPEN --> and <!-- CLOSE -->
    print_r($split[0]);
    // without <!-- OPEN --> and <!-- CLOSE -->
    print_r($split[1]);
    ?>
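The difference in the pattern above is the `s` modifier, which lets `.` match newlines. A minimal sketch of that effect, assuming a made-up input string in which the markers sit on their own lines (the variable names `$withoutS` and `$withS` are only for illustration):

```php
<?php
// Sample input: the markers sit on their own lines.
$searchThis = "Regular page code\n<!-- OPEN -->\nThis is code between comments\n<!-- CLOSE -->\n";

// Without "s", "." stops at every newline, so the block is not found.
$withoutS = preg_match_all('/<!-- OPEN -->(.*?)<!-- CLOSE -->/i', $searchThis, $a);

// With "s" (PCRE_DOTALL), "." also matches newlines.
$withS = preg_match_all('/<!-- OPEN -->(.*?)<!-- CLOSE -->/si', $searchThis, $b);

var_dump($withoutS);      // number of matches without "s"
var_dump($withS);         // number of matches with "s"
var_dump(trim($b[1][0])); // captured text with the surrounding newlines trimmed
```

The captured text keeps the newlines that follow the opening marker and precede the closing one, so a trim() is probably wanted before using it.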


    • #3
      You need to add the multiline modifier to your pattern modifiers.
      I'm not validating your code at the moment, as I assume it's working fine for you. Simply add an 'm' after the 'i' in your pattern's modifiers. I'm guessing you also want to keep using PREG_SPLIT_DELIM_CAPTURE, to store this in a nice little array.
      PHP Code:
      header('HTTP/1.1 420 Enhance Your Calm'); 
      Been gone for a few months, and haven't programmed in that long of a time. Meh, I'll wing it ;)


      • #4
        Hi, and thanks for the help so far. Not quite there yet, though. I may have over-simplified before to make it easier for you, so here is the full picture of how it works now (you're right, Fou-Lu, the code below works without errors):

        file1.htm:
        Code:
        This is the page code
        <br>
        <br>
        <!-- OPEN id="para1" type="text" -->This is some code between comments<!-- CLOSE -->
        <br>
        <br>
        <!-- OPEN id="para2" type="text" -->This is some more code between comments<!-- CLOSE -->
        Reading page file1.htm,
        PHP Code:
        $searchThis = file_get_contents('file1.htm');
        $split = preg_split('/(<!-- OPEN .*? CLOSE -->)/i', $searchThis, -1, PREG_SPLIT_DELIM_CAPTURE);
        returns (print_r):
        Code:
        Array
        (
            [0] => This is the page code
        <br>
        <br>
            [1] => <!-- OPEN id="para1" type="text" -->This is some code between comments<!-- CLOSE -->
            [2] => <br>
        <br>
            [3] => <!-- OPEN id="para2" type="text" -->This is some more code between comments<!-- CLOSE -->
        )
        Then (this is something I didn't mention before), determine which has comments and which doesn't:
        PHP Code:
        for ($i = 0; $i < count($split); $i++) {
            // Only the pieces that start with an OPEN marker are split further.
            if (substr($split[$i], 0, 9) == '<!-- OPEN') {
                $subSplit[] = preg_split('/<!-- OPEN id=\"(.*?)\" type=\"(.*?)\" -->(.*?)<!-- CLOSE -->/i', $split[$i], -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
            }
        }
        That last part gives each commented section an array with type, id, and content.

        Now the problem, which I mentioned before, is that the comments need to be on the same line as the content it encompasses. This presents a problem for large chunks of text. I tried adding the m, but no luck. Maybe this information will help?
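For what it is worth, the `m` modifier only changes what `^` and `$` match; it is the `s` modifier that lets `.` cross line breaks. A small sketch of the two side by side, assuming a shortened version of file1.htm held in a string (the variable names `$withM` and `$withS` are made up for the comparison):

```php
<?php
$searchThis = "This is the page code\n<br>\n"
    . "<!-- OPEN id=\"para1\" type=\"text\" -->\nThis is some code\nbetween comments\n<!-- CLOSE -->\n<br>\n";

// "m" only changes what ^ and $ match; "." still stops at a newline.
$withM = preg_split('/(<!-- OPEN .*? CLOSE -->)/im', $searchThis, -1, PREG_SPLIT_DELIM_CAPTURE);

// "s" lets "." cross newlines, so the whole block becomes one delimiter.
$withS = preg_split('/(<!-- OPEN .*? CLOSE -->)/is', $searchThis, -1, PREG_SPLIT_DELIM_CAPTURE);

echo count($withM), "\n"; // pieces when the delimiter is never found
echo count($withS), "\n"; // text before, the commented block, text after
```

The second preg_split() on each piece would need the same `s` modifier for the id, type and content to be picked up across lines.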
        Last edited by mtd; Jul 30, 2005, 06:30 PM.


        • #5
          Hey mtd,
          can you tell me what you would like this array to be? I mean, it's a simple task and all, but I need you to tell me how you want it.
          Currently, I have this, how do you want it changed?
          Code:
          Array
          (
              [0] => This is page code
          <br />
          <br />
          
              [1] =>  id="para1" type="text" 
              [2] => This is some code 
          between comments
              [3] => 
          <br />
          <br />
          
              [4] =>  id="para2" type="text" 
              [5] => This is some more code between comments
          )
          I assume as well, you will want me to drop the line breaks but that depends on the usage.
          Tell me what you would like this array structured as, and I'll show you how to do it.

          Edit:
          May I make a suggestion on your desired output?
          Code:
          Array
          (
              [para1] => Array('type' => 'text', 'comment' => 'This is code\n between comments')
              [para2] => Array('type' => 'text', 'comment' => 'This is some more  code between comments')
          // For generic, no id or type specified.:
              [generic] => Array('This is code\n between comments', 'This is some more code between comments')
          )
          Or something of the sort? It would be fairly easy to set up like that; a simple loop will take care of it. However, it depends completely on what you need to do with it.
          Last edited by Fou-Lu; Jul 31, 2005, 06:15 AM.
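The "simple loop" for the id-keyed layout suggested above could look roughly like this. It is only a sketch: it assumes every marker carries both id and type in that order, and the names `$matches` and `$byId` are invented for the example.

```php
<?php
$subject = "<!-- OPEN id=\"para1\" type=\"text\" -->This is code\n between comments<!-- CLOSE -->\n"
    . "<!-- OPEN id=\"para2\" type=\"text\" -->This is some more code between comments<!-- CLOSE -->";

preg_match_all('/<!-- OPEN id="(.*?)" type="(.*?)" -->(.*?)<!-- CLOSE -->/si', $subject, $matches, PREG_SET_ORDER);

$byId = array();
foreach ($matches as $m) {
    // $m[1] = id, $m[2] = type, $m[3] = text between the markers
    $byId[$m[1]] = array('type' => $m[2], 'comment' => $m[3]);
}
print_r($byId);
```

Keying by id means a repeated id silently overwrites the earlier entry, which may or may not be what is wanted.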

          • #6
            Hi Fou-Lu, thanks again. I think what I need (and what I have now - feel free to correct me if I am making things more difficult than they need to be) are two arrays. The first reads through the page code:
            Code:
            Array
            (
                [0] => This is page code
            <br />
            <br />
            
                [1] => <!-- OPEN id="para1" type="text" -->This is some code between comments<!-- CLOSE -->
                [2] => 
            <br />
            <br />
            This is code w/o comments
                [3] => <!-- OPEN id="para2" type="text" -->This is more commented code<!-- CLOSE -->
            )
            giving us array "split". Then, read through that array and remove any value that does not have a commented section, leaving us with a new array "subSplit" of:
            Code:
            Array
            (
                [0] => [0] =>para1
                          [1] => text
                          [2] => This is some code between comments
                [1] => [0] =>para2
                          [1] => text
                          [2] => This is more commented code
            )
            That is all I need, and how I have it now. From there, I read through each value and extract the id, text and content into array "returnCode":
            PHP Code:
            for ($i = 0; $i < count($subSplit); $i++) {
                $returnCode[$i]['id'] = $subSplit[$i][0];
                $returnCode[$i]['type'] = $subSplit[$i][1];
                $returnCode[$i]['code'] = $subSplit[$i][2];
            }
            I do a number of things with that information. That all works fine. All I need to do is make the code work even if the comments are on a different line than the text between them (see first post). I apologize if I was unclear... the array structure is fine as I have it (or at least it is functional, and works with the rest of my code) - I just need to fix the problem of when the comments are on a new line.

            Thanks so much for your help thus far.
            mtd


            • #7
              PHP Code:
              <?php
              $subject = 'This is the page code
              <br>
              <br>
              <!-- OPEN id="para1" type="text" -->This is some code between comments<!-- CLOSE -->
              <br>
              <br>
              <!-- OPEN id="para2" type="text" -->This is some more code between comments<!-- CLOSE -->';

              // "s" lets "." match newlines, so the markers may sit on their own lines.
              preg_match_all('/<!-- OPEN id="(.*?)" type="(.*?)" -->(.*?)<!-- CLOSE -->/sim', $subject, $result, PREG_SET_ORDER);
              $myArray = array();
              for ($i = 0, $n = count($result); $i < $n; $i++)
              {
                  $myArray[] = array('id' => $result[$i][1], 'type' => $result[$i][2], 'value' => $result[$i][3]);
              }
              var_dump($myArray);
              ?>
              This way?
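The snippet above uses PREG_SET_ORDER where the earlier reply used PREG_PATTERN_ORDER. A short sketch of how the two flags arrange the same matches, using a made-up one-line subject and illustrative variable names (`$byGroup`, `$byMatch`):

```php
<?php
$subject = '<!-- OPEN id="a" type="text" -->one<!-- CLOSE --><!-- OPEN id="b" type="text" -->two<!-- CLOSE -->';
$pattern = '/<!-- OPEN id="(.*?)" type="(.*?)" -->(.*?)<!-- CLOSE -->/si';

preg_match_all($pattern, $subject, $byGroup, PREG_PATTERN_ORDER);
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);

// PREG_PATTERN_ORDER: first index is the capture group, second is the match.
echo $byGroup[1][1], "\n"; // id of the second match

// PREG_SET_ORDER: first index is the match, second is the capture group.
echo $byMatch[1][1], "\n"; // id of the second match
```

PREG_SET_ORDER tends to read more naturally when each match is turned into its own record, as in the loop above.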
              I'm not sure if this was any help, but I hope it didn't make you stupider.

              Experience is something you get just after you really need it.
              PHP Installation Guide Feedback welcome.


              • #8
                Looking good marek, beat me to it :P
                I'm going to alter this a bit though, just because of this line here:
                Then (this is something I didn't mention before), determine which has comments and which doesn't:
                I'd assume then that this could be empty. So, your options are as follows. Either you can create these with the possibility of having `id` and `type`, or you can create your array elements dynamically. I'll show you how to do it dynamically:
                PHP Code:
                <?php
                // Freshly stolen from Marek's last post :)
                error_reporting(E_ALL);
                $subject = 'This is the page code
                <br>
                <br>
                <!-- OPEN id="para1" type="text" -->This is some code between comments<!-- CLOSE -->
                <br>
                <br>
                <!-- OPEN id="para2" type="text" -->This is some more code between comments<!-- CLOSE -->
                <!-- OPEN -->This is the final set, no id or type<!-- CLOSE -->
                ';

                // Note: [^(-->)] is a character class, so the attribute text
                // cannot contain ( ) * + , - or >.
                preg_match_all('/<!-- OPEN ([^(-->)]*)-->(.*?)<!-- CLOSE -->/sim', $subject, $result, PREG_SET_ORDER);
                $myArray = array();
                for ($i = 0, $n = count($result); $i < $n; $i++)
                {
                    // 1 = attributes
                    // 2 = value:
                    $attrArray = array();
                    if (!empty($result[$i][1]))
                    {
                        $attributes = explode(' ', trim($result[$i][1]));
                        foreach ($attributes AS $attr)
                        {
                            $attr_pair = explode('=', $attr);
                            // Strip single and double quotes from the value.
                            $attr_pair[1] = str_replace('\'', '', $attr_pair[1]);
                            $attr_pair[1] = str_replace('"', '', $attr_pair[1]);
                            $attrArray[$attr_pair[0]] = $attr_pair[1];
                        }
                    }
                    $myArray[] = array_merge($attrArray, array('value' => $result[$i][2]));
                }

                echo '<pre>';
                print_r($myArray);
                echo '</pre>';
                ?>
                print_r output:
                Code:
                Array
                (
                    [0] => Array
                        (
                            [id] => para1
                            [type] => text
                            [value] => This is some code between comments
                        )
                
                    [1] => Array
                        (
                            [id] => para2
                            [type] => text
                            [value] => This is some more code between comments
                        )
                
                    [2] => Array
                        (
                            [value] => This is the final set, no id or type
                        )
                
                )
                Downside: You always need to know what your elements will be.
                Upside: You needn't fill in id and type when you don't need them.
                Note that for either method, mine or marek's, I suggest you use htmlspecialchars or a similar function on the comments section. This is to keep problems from occurring within the comments part:
                Code:
                <!-- OPEN -->This is an <!-- OPEN --> AND <!-- CLOSE --> comment.<!-- CLOSE -->
                Would become (at least):
                Code:
                <!-- OPEN --> This is an &lt;!-- OPEN --&gt; AND &lt;!-- CLOSE --&gt; comment.<!-- CLOSE -->
                I'm just hoping that the example isn't changed into the html here, otherwise it defeats the point of doing it, lol.
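A rough sketch of that escaping idea follows. The helper `wrapSection()` is hypothetical (it is not part of either version in this thread); it simply escapes the content before it is placed between the markers, so stray markers inside the content no longer end the match early.

```php
<?php
// Hypothetical helper, not from the thread: escape the content, then wrap it.
function wrapSection($content)
{
    return '<!-- OPEN -->' . htmlspecialchars($content) . '<!-- CLOSE -->';
}

$page = wrapSection('This is an <!-- OPEN --> AND <!-- CLOSE --> comment.');
echo $page, "\n";

preg_match_all('/<!-- OPEN -->(.*?)<!-- CLOSE -->/s', $page, $result, PREG_SET_ORDER);

// The escaped markers are not seen as delimiters; decoding restores the text.
echo count($result), "\n";
echo htmlspecialchars_decode($result[0][1]), "\n";
```

Whether the content should be decoded again after extraction depends on where it ends up; for direct HTML output the escaped form is usually the one to keep.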

                • #9
                  You two are awesome! Works like a charm, even with the comments on a different line. And to be honest, it looks much more efficient/versatile than the code I was originally using.

                  I'll play around with both versions and let you know how I make out.
                  THANKS AGAIN!

                  -- mtd


                  • #10
                    Originally posted by Fou-Lu
                    Looking good marek, beat me to it :P
                    I'm going to alter this a bit though, just because of this line here:

                    I'd assume then that this could be empty. So, your options are as follows. Either you can create these with the possibility of having `id` and `type`, or you can create your array elements dynamically. I'll show you how to do it dynamically...
                    Hehe, I tried to do such a thing too http://www.codingforum.net/showthread.php?t=61603... but your attempt is kind of simpler.

                    • #11
                      Just a quick follow-up...

                      for any tag="value", I'm having trouble if the value has a space in it. For example:

                      <!-- OPEN id="paragraph one" type="text"-->This is content<!-- CLOSE -->

                      print_r reveals:
                      Code:
                      [0] => Array
                              (
                                  [id] => paragraph
                                  [one"] => 
                                  [type] => text
                                  [value] => This is content
                              )
                      So the issue is, it is seeing the space and creating another array value. Likely it is just some small tweak in the regex? But since I'm horrible with regex.....

                      :-) Thanks again!


                      • #12
                        Hi MTD,
                        The reason is quite simple: my explode splits on every space, including the ones inside a quoted value, so "paragraph one" gets cut in two.
                        Now... fixing this will be easy, but I need to know a couple of things from you. I'll also try altering it to cover all fields; I should have done this in the first place, but meh.
                        1. Are your attributes always going to be wrapped in either single or double quotes (e.g. type="text")?
                        2. Are you ever going to have spaces between the attribute value and these quotations (eg, type=" text ")?

                        I'll take a look see after, shouldn't be a tough one to fix up. Preg will most likely need to be used for this situation instead of a simple explode, as we need to identify what is a part of what.

                        Hey Marek, does that link you posted happen to handle white-space within the attribute values? I noticed it doesn't use a separation code for its initial values...
                        maybe I should be looking at the initial code.
                        Last edited by Fou-Lu; Aug 4, 2005, 01:36 AM.
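One possible shape for the preg-based approach mentioned above, as a sketch only: it assumes values are always wrapped in double quotes, and the helper name `parseAttributes()` is invented for the example.

```php
<?php
// Hypothetical helper: parse name="value" pairs, allowing spaces inside the quotes.
function parseAttributes($attrString)
{
    $attrs = array();
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $attrString, $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attrs[$pair[1]] = $pair[2]; // $pair[1] = name, $pair[2] = value
    }
    return $attrs;
}

$subject = '<!-- OPEN id="paragraph one" type="text"-->This is content<!-- CLOSE -->';
preg_match_all('/<!-- OPEN(.*?)-->(.*?)<!-- CLOSE -->/si', $subject, $result, PREG_SET_ORDER);

$myArray = array();
foreach ($result as $match) {
    // $match[1] = raw attribute text, $match[2] = content between the markers
    $myArray[] = array_merge(parseAttributes($match[1]), array('value' => $match[2]));
}
print_r($myArray);
```

Because the value is matched as "anything up to the next double quote", spaces are kept, but a value containing `-->` would still end the opening marker early.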

                        • #13
                          You know, I think it doesn't. Anyhow, this is what I've come up with now. It allows spaces inside quoted values ($chars holds the additional characters that are allowed in argument values), so an argument can have any value as long as it's in quotes, and all valid arguments (listed in the $arguments array) can appear in any order. If the same argument is used more than once, the last one will be used.
                          PHP Code:
                          <?php
                          $input = 'This is the page code
                          <br>
                          <br>
                          <!-- OPEN type="text I don\'t like!" id="bar" -->This is some code between comments<!-- CLOSE -->
                          <br>
                          <br>
                          <!-- OPEN id="[email protected]" type="text I like" -->This is some more code between comments<!-- CLOSE -->
                          <!-- OPEN id="foo" -->This doesn\'t have a type set<!-- CLOSE -->';
                          $arguments = array('id', 'type');
                          $chars = '[email protected]\'';

                          // ----------
                          $n = count($arguments);
                          $regex = '/(?><!-- OPEN )';
                          // One optional name="value" group per allowed argument.
                          $regex .= str_repeat('(?:(' . implode('|', $arguments) . ') *?= *?"([\w ' . $chars . ']*?)")? *?', $n);
                          $regex .= '-->(.*?)<!-- CLOSE -->/sim';
                          //print $regex;
                          preg_match_all($regex, $input, $result, PREG_PATTERN_ORDER);
                          array_shift($result); // drop the full-match entries
                          $myArray = array();
                          $n = $n * 2; // two capture groups (name, value) per argument
                          for ($i = 0, $m = count($result[0]); $i < $m; $i++)
                          {
                              $myArray[$i] = array();
                              for ($j = 0; $j < $n; $j += 2)
                              {
                                  if (!empty($result[$j][$i]))
                                  {
                                      $myArray[$i][$result[$j][$i]] = $result[$j + 1][$i]; // argh.
                                  }
                              }
                              $myArray[$i]['value'] = $result[$n][$i];
                          }
                          print '<pre>';
                          var_dump($myArray);

                          ?>
                          I hope it doesn't have errors.
                          Last edited by marek_mar; Aug 5, 2005, 02:48 AM.
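To make the str_repeat() step easier to follow, here is a sketch that builds the same kind of pattern and prints it. The value `$chars = '@.'` and the one-line `$input` are assumptions chosen for the demonstration, not the values from the post.

```php
<?php
$arguments = array('id', 'type');
$chars = '@.'; // assumed extra characters, for illustration only

// Each repetition is an optional group: (name) = "(value)".
$piece = '(?:(' . implode('|', $arguments) . ') *?= *?"([\w ' . $chars . ']*?)")? *?';
$regex = '/(?><!-- OPEN )' . str_repeat($piece, count($arguments)) . '-->(.*?)<!-- CLOSE -->/sim';
echo $regex, "\n";

// The arguments may appear in either order.
$input = '<!-- OPEN type="text I like" id="bar" -->x<!-- CLOSE -->';
preg_match($regex, $input, $m);

// Groups come in (name, value) pairs; the last group is the content.
echo $m[1], ' => ', $m[2], "\n";
echo $m[3], ' => ', $m[4], "\n";
echo 'value => ', $m[5], "\n";
```

Since every repetition accepts any of the listed names, the order in the tag does not matter, which is why the names are read from the match rather than assumed by position.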

                          • #14
                            Thanks for the replies! To answer Fou-Lu's questions,

                            1) Yes, always! They will always be double quotes.
                            2) I can't foresee such a situation, but it may occur - I'm not the only one who'll be implementing these tags. So, if it's not too hard, I'd love at least the option. But, if it is a lot of extra work, you definitely don't have to worry about it.

                            Marek,
                            Thanks for the code... I'll check it out and see how it works. Now, since I will always be using double quotes, do the single quotes/apostrophes need to be escaped? I'd rather use some kind of quotemeta/htmlspecialchars at some point (where, I'm not quite sure - what point would be best?) than have to deal with escaping all my entries. Especially since, like I said, I won't be the only one handling the implementation of the tags.
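One way both points could be handled, offered as a sketch rather than a recommendation: trim() each attribute value to cope with padding such as type=" text ", and apply htmlspecialchars() only when the content is printed. The sample `$tag` and the `$section` array are made up for the example.

```php
<?php
$tag = '<!-- OPEN id=" para one " type=" text " -->It\'s "quoted" content<!-- CLOSE -->';

preg_match('/<!-- OPEN(.*?)-->(.*?)<!-- CLOSE -->/si', $tag, $match);
preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $match[1], $pairs, PREG_SET_ORDER);

$section = array('value' => $match[2]);
foreach ($pairs as $pair) {
    // trim() drops the padding inside the quotes, e.g. " text " becomes "text".
    $section[$pair[1]] = trim($pair[2]);
}

// Escape only at output time; the stored value stays untouched.
echo htmlspecialchars($section['value'], ENT_QUOTES), "\n";
print_r($section);
```

Escaping at output keeps the stored data in its original form, so the people writing the tags would not need to escape apostrophes by hand.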


                            • #15
                              Quotes go around argument values (e.g. type="text"). This way you can have spaces or any special character you want (except the double quote). I had to escape single quotes because I had the string in single quotes.
                              I have simplified a part of it. I edited my previous post.