How to parse an HTML document response? - CodingForum

  • Resolved How to parse an HTML document response?

    I made a script that downloads a page (I am not the owner of this page) with a GET request. I know that the HTML response is available through responseText, but I have no idea how to parse this string (which includes <html>, <head> and <body> tags) in order to turn it into an HTML document (not an XML document!). Do you have any ideas? I need at least one method that works in Firefox.
    Last edited by Hamza7; Aug 30, 2011, 11:48 AM.

  • #2
    Perhaps I'm missing some complexity, but it seems to me that if all you want to do is parse the 'responseText' into an HTML document, all you need to do is create a DIV or section with a specific id within your HTML and use:

    Code:
    document.getElementById('whateverYouCalledIt').innerHTML = responseText;
    Since that seems too simple an answer, I assume that you want to strip some of the tags. If that is the case, I would either iterate over the string to strip the unwanted tags manually (tricky) or try something like this (untested):

    Code:
    var outsideDocArea = document.getElementById('whateverYouCalledIt');
    outsideDocArea.innerHTML = responseText;
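    // getElementsByTagName returns a live collection, so remove its elements one at a time
    var htmlElements = outsideDocArea.getElementsByTagName("html");
    while (htmlElements.length) htmlElements[0].parentNode.removeChild(htmlElements[0]);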
    Like I said, I've not tested that, but that is where I would start.
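
A minimal sketch of the "strip the unwanted tags from the string" idea, done before the text is assigned to innerHTML. The helper name `extractBodyHtml` is hypothetical and not from this thread, and the approach is regex-based, so it can be fooled by a `>` inside an attribute value or by a `<body>` tag that appears inside a comment or script. Treat it as an illustration rather than a robust parser.

```javascript
// Hypothetical helper (not from the thread): return the markup between
// <body ...> and </body>. If no <body> tag is found, the input is
// returned unchanged; if </body> is missing, everything after <body> is kept.
function extractBodyHtml(responseText) {
  // Case-insensitive match for the opening tag, with optional attributes.
  var open = /<body(\s[^>]*)?>/i.exec(responseText);
  if (!open) {
    return responseText;
  }
  var rest = responseText.slice(open.index + open[0].length);
  var close = rest.search(/<\/body\s*>/i);
  return close === -1 ? rest : rest.slice(0, close);
}

var sample = '<html><head><title>t</title></head>' +
             '<BODY class="x"><p>Hello</p></body></html>';
console.log(extractBodyHtml(sample));
```

The result could then be assigned to the DIV as in the snippet above, for example `outsideDocArea.innerHTML = extractBodyHtml(responseText);`.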



    • #3
      Originally posted by Hamza7
      I have no idea how to parse this string (which includes <html>, <head> and <body> tags) in order to turn it into an HTML document
      Well, it seems to me that a string that contains
      <html>, <head> and <body> tags is already HTML. Maybe be more specific about
      exactly what you are trying to do.

      Or maybe it's this you want ?

      document.write(xmlHTTP.responseText);

      Or maybe this ...

      document.getElementById('myDiv').innerHTML = xmlHTTP.responseText;

      Please, help us help you.



      • #4
        You can use this function to convert a full HTML source text into an HTML document. You can use the returned document wherever you would normally use window.document.

        Code:
        function htmltocontext(responseText) {
           // create documentType
           var dt = document.implementation.createDocumentType("html", "-//W3C//DTD HTML 4.01 Transitional//EN", "http://www.w3.org/TR/html4/loose.dtd");
           // create new HTML document
           var doc = document.implementation.createDocument("", "", dt);
           // create new documentElement = <html> element
           var newDocumentElement = doc.createElement("html");
        
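           // strip off everything before and after the innerHTML of the <html> element in responseText
           var lowerText = responseText.toLowerCase();
           var beginPos = lowerText.indexOf('<html');
           // without an <html> tag, keep the text from the start
           beginPos = (beginPos === -1) ? 0 : lowerText.indexOf('>', beginPos) + 1;
           var endPos = lowerText.indexOf('</html');
           // without a closing </html> tag, keep the text to the end
           if (endPos === -1) endPos = responseText.length;
           
           responseText = responseText.substring(beginPos, endPos);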
        
           // assign innerHTML to new documentElement
           newDocumentElement.innerHTML = responseText;
           
           // append documentElement to HTML document
           doc.appendChild(newDocumentElement);
        
           return doc;
        }
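
For comparison, here is a minimal sketch of another way to get a separate HTML document from the source text. Current browsers, including Firefox, accept the "text/html" type in `DOMParser.parseFromString`. The wrapper name `parseHtmlDocument` and its `ParserCtor` parameter are assumptions made for illustration; the parameter only exists so the function can be tried with a stand-in where no browser `DOMParser` is available.

```javascript
// Sketch only: parse a full HTML source string into a separate Document.
// ParserCtor is a hypothetical parameter; it defaults to the browser's
// DOMParser when one exists.
function parseHtmlDocument(responseText, ParserCtor) {
  ParserCtor = ParserCtor ||
    (typeof DOMParser !== 'undefined' ? DOMParser : null);
  if (!ParserCtor) {
    throw new Error('No DOMParser available in this environment');
  }
  // "text/html" selects the HTML parser, so the result is an HTML
  // document rather than an XML document.
  return new ParserCtor().parseFromString(responseText, 'text/html');
}

// In a browser, usage might look like this:
//   var doc = parseHtmlDocument(xmlhttp.responseText);
//   var firstDiv = doc.getElementsByTagName('div')[0];
```

As with the function above, the returned document is detached from the page, so DOM methods called on it do not touch what the user sees.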



        • #5
          Hello devnull69,
          I really like the code you posted
          (nicely commented). I wonder what advantages
          it has over this approach, or whether the two are even comparable?

          Code:
          <html><head>
          <script>
          function init() {
            var xmlhttp = new XMLHttpRequest();
            xmlhttp.open("GET", "content1.html");
            xmlhttp.onreadystatechange = function() {
              if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
                // load the response into the hidden DIV, then read from it with DOM methods
                var myDiv = document.getElementById("myDiv");
                myDiv.innerHTML = xmlhttp.responseText;
                alert(myDiv.getElementsByTagName('div')[0].innerHTML);
              }
            };
            xmlhttp.send(null);
          }
          </script>
          </head><body onload="init()">
          <div id="myDiv" style="display:none;"></div>
          </body></html>
          This is a discussion forum (I believe)
          so discuss away anyone who has
          some insight on this.

          Thank you for reading this.



          • #6
            I started using this function in Greasemonkey scripts because I realized that sometimes it was not feasible to use a DIV and "pump" the whole HTML including DOCTYPE, head and meta tags into it. Sometimes this resulted in empty pages that just loaded endlessly.

            After using this function I was able to create a whole new HTML document and use whatever DOM method on it (including getElementById()) without affecting the underlying page.
