Get child element via html dom parser - CodingForum


Get child element via html dom parser


  • Get child element via html dom parser

    This is the first time I am using the DOMDocument and DOMXPath classes. I am trying to scan through an HTML document to get child and parent nodes: first of all their tag names, but I may also need attributes such as class and id.

    The following expression checks the tag name of an element's parent node.

    Code:
    $item->parentNode->tagName == "li"
    How can I change this now that it gets me the child node?


    Code:
    <?php
      $doc = new \DOMDocument();
      $doc->loadHTML($htmlinput);
    
      $xpath = new \DOMXPath($doc);
      $articles = $xpath->query('//div[@class="blogArticle"]');
    
      // all links in h2's in .blogArticle
      $links = array();
      foreach($articles as $container) {
        $arr = $container->getElementsByTagName("a");
        foreach($arr as $item) {
          if($item->parentNode->tagName == "h2") {
            $href =  $item->getAttribute("href");
            $text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue));
            $links[] = array(
              'href' => $href,
              'text' => $text
            );
          }
        }
      }
    ?>

    Thank you for your help !
    Last edited by clausrei; Sep 21, 2016, 07:06 AM.

  • #2
    How can I change this now that it gets me the child node?
    The same way; you only need to specify which child node you want. Cf. PHP: DOMNode - Manual

    Edit: be aware, though, that a child node can also be of text or comment type.
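
As a rough sketch of what "the same way" could look like: `firstChild` and `childNodes` are the downward counterparts of `parentNode` on DOMNode. The markup and the variable names below are made up for illustration and are not taken from the thread; the `nodeType` check is there because text and comment children have no tag name.

```php
<?php
// Hypothetical sample markup (an assumption, not from the original post).
// It contains no whitespace between tags, so every child here is an element.
$html = '<ul id="menu"><li class="current"><a href="index.html">Home</a></li><li>Plain</li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$ul = $doc->getElementsByTagName('ul')->item(0);

// firstChild looks downwards, just as parentNode looks upwards.
$firstLi = $ul->firstChild;

// Collect the tag names of the element children of the first <li>.
$childTags = [];
foreach ($firstLi->childNodes as $child) {
    // Only element nodes carry a tagName; skip text and comment nodes.
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $childTags[] = $child->tagName;
    }
}

// Same shape as the parentNode test from the question, but for a child.
$hasLinkChild = in_array('a', $childTags, true);
echo $hasLinkChild ? "first <li> has an <a> child\n" : "no <a> child\n";
```

If a specific child is needed, `$node->childNodes->item($i)` works as well, subject to the same element check.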
    Last edited by Dormilich; Sep 21, 2016, 08:21 AM.
    The computer is always right. The computer is always right. The computer is always right. Take it from someone who has programmed for over ten years: not once has the computational mechanism of the machine malfunctioned.
    André Behrens, NY Times Software Developer



    • #3
      Thanks for that! I had looked at the DOMDocument documentation first and could not find what I was looking for there.

      I created an example and it works fine (see the code below). I just wonder why, in this example, I get 10 childNodes for the <ul> node instead of 5.

      There is no text and no comment inside the last <li> tag.

      At first I thought it was counting the <a> tags (nodes) inside the <li> tags as well, but then I deleted one of the <a> tags in the HTML document and it still counts 10 childNodes.
      They don't seem to be the <a> tags; they are empty.

      Code:
      <?php 
       $htmlinput = <<<EOT
      
                <ul id="menu">
                  <li class="current"><a href="index.html">Home</a></li>
                  <li><a href="ourwork.html">Our Work</a></li>
                  <li></li>
                  <li><a href="projects.html">Projects</a></li>
                  <li></li>
                </ul>
            
      EOT;
      
        $doc = new DOMDocument();
        $doc->loadHTML($htmlinput);
      
      $tag=$doc->getElementsByTagName('ul')->item(0);
      
      $tag_att = $tag->hasAttributes();
      $tag_class = $tag->getAttribute("id");
      echo "The tag you selected has the class:  ".$tag_class."<br>";
      $v=$tag->hasChildNodes()?" has Child Nodes :":" has no Child Nodes.";
      
      echo "The tag is a &lt;".$tag->tagName."&gt; and it ".$v." (".$tag->childNodes->length.") <br>";
      $nodelistlength = $tag->childNodes->length;
      
      
       foreach($tag->childNodes as $item) { // DOMElement Object
          //$href =  $item->getAttribute("href");
          $text = trim(preg_replace("/[\r\n]+/", " ", $item->tagName));
          echo "  =>  ".$text."<br>";
          
        }
      
      
      if($tag->childNodes->item(0)->childNodes->item(0)->tagName == "a") {
      echo "<br>The first &lt;ul&gt; node has a child  &lt;li&gt; with a child &lt;a&gt;";
      }
      
      ?>
      What are the empty nodes that have no tag name?

      Result after executing the script above:

      The tag you selected has the class: menu
      The tag is a <ul> and it has Child Nodes : (10)
      => li
      =>
      => li
      =>
      => li
      =>
      => li
      =>
      => li
      =>

      The first <ul> node has a child <li> with a child <a>



      Every second node is empty, why?
      Last edited by clausrei; Sep 21, 2016, 01:49 PM.



      • #4
        Originally posted by clausrei
        Every second node is empty, why?
        Because:
        Originally posted by Dormilich
        be aware, though, that a child node can also be of text or comment type.
        Not everything in the DOM is an element!
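
To make that concrete, here is a minimal sketch, assuming a shortened version of the list from post #3. The line breaks and indentation between the `<li>` tags are parsed as whitespace-only text nodes, which is one plausible reading of the "empty" entries in the output above. Filtering on `nodeType`, or asking XPath for `li` children only, skips them. The variable names are illustrative, and the exact number of whitespace nodes may depend on the libxml version, so the sketch does not rely on it.

```php
<?php
// Shortened variant of the markup from post #3 (three items instead of five).
$html = <<<EOT
<ul id="menu">
  <li class="current"><a href="index.html">Home</a></li>
  <li><a href="ourwork.html">Our Work</a></li>
  <li></li>
</ul>
EOT;

$doc = new DOMDocument();
$doc->loadHTML($html);
$ul = $doc->getElementsByTagName('ul')->item(0);

$elementTags = [];   // tag names of element children
$blankTextNodes = 0; // whitespace-only text nodes between the tags

foreach ($ul->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $elementTags[] = $node->tagName;
    } elseif ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) === '') {
        $blankTextNodes++;
    }
}

echo "all child nodes: " . $ul->childNodes->length . "\n";
echo "element children: " . implode(', ', $elementTags) . "\n";
echo "whitespace-only text nodes: " . $blankTextNodes . "\n";

// Alternative: let XPath select only the <li> element children of the <ul>.
$xpath = new DOMXPath($doc);
$liCount = $xpath->query('./li', $ul)->length;
echo "li children via XPath: " . $liCount . "\n";
```

Either approach leaves the document untouched; which one reads better probably depends on whether the rest of the script already uses DOMXPath.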