Why is DTD so complicated? - CodingForum



Why is DTD so complicated?


  • Why is DTD so complicated?

    After churning out a couple of custom DTDs, I'm becoming more familiar with the language, and I'm wondering: why is it so complicated? Why can't we just have an XML-based language that looks a lot simpler, is easier for a human to parse and ultimately contains fewer of those damned exclamation marks? (OK, so that last one was a joke, but this is a serious question.)
    David House - Perfect is achieved, not when there is nothing left to add, but when there is nothing left to take away. (Antoine de St. Exupery).

  • #2
    Because SGML DTDs were intended to be parsable by the same parser as the SGML documents, and the structure carried on to the XML DTD format. (No, they are not quite the same.)

    W3C created XML Schemas as a more powerful but infinitely more complex alternative, but they're still intended to go through the same parser as the document. (A bad decision if you ask me.)

    Both of these are W3C Recommendations: the first is part of the XML Recommendation, the second is a standalone Recommendation.

    Recently another XML structure definition language, RelaxNG, became an ISO standard. RelaxNG exists in both an XML-parsable form and a compact form oriented toward human readability. RelaxNG is more powerful than DTD, but not quite as powerful as XML Schemas. And it is more easily readable than the other two.

    That said, if you have a look at the XHTML and HTML DTDs for a while you will soon learn the syntax. They are quite simple, linear constructs, nothing hierarchical like the other two validation methods.
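To make the "linear" point concrete, here is a small Python sketch (not from the thread) that feeds a made-up DTD to the standard library's expat binding and records each declaration in the order it appears. The `note` vocabulary and the helper name `list_declarations` are invented for illustration; only the `xml.parsers.expat` handlers are real.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset; the vocabulary is made up.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority CDATA "normal">
]>
<note><to>David</to><body>Hello</body></note>"""


def list_declarations(document):
    """Return the DTD declarations in the order the parser reports them."""
    declarations = []
    parser = xml.parsers.expat.ParserCreate()

    def on_element_decl(name, model):
        # One callback per <!ELEMENT ...> declaration.
        declarations.append(("ELEMENT", name))

    def on_attlist_decl(elname, attname, type_, default, required):
        # One callback per attribute declared in an <!ATTLIST ...>.
        declarations.append(("ATTLIST", elname, attname, default))

    parser.ElementDeclHandler = on_element_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.Parse(document, True)
    return declarations


if __name__ == "__main__":
    for declaration in list_declarations(DOC):
        print(declaration)
```

Each declaration arrives as its own flat record, one after another. Nothing in the DTD nests the way elements do in an XML Schema or a RelaxNG pattern.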
    liorean


    • #3
      You think DTDs are hard to read now? Wait until you hit Modularization of XHTML... it takes up to seven parameter entities (%foo;) to define an element...

      Oh, and even Mozilla has trouble with DTDs. Mozilla's expat is not a validating XML parser.
      "The first step to confirming there is a bug in someone else's work is confirming there are no bugs in your own."
      June 30, 2001
      author, ES-Membrane project (Github Pages site)
      author, Verbosio prototype XML Editor
      author, JavaScript Developer's Dictionary


      • #4
        The reason Mozilla doesn't use a validating parser doesn't have very much to do with them having problems with it. None of the browsers of today are validating, not even those with validating parsers. (libxml2 and MSXML can validate if that option is turned on.) No, they are nonvalidating for a number of reasons:
        - They are primarily HTML browsers. They make no great distinction between XHTML and HTML, or between the different versions of HTML. They support one set of elements for all variations regardless.
        - They are tagsoup browsers, and may accept XML as 'text/html', but they parse it as tagsoup HTML using the same element set.
        - If they get the XHTML or XML types, on the other hand, they handle it as XML and don't throw it to the tagsoup handler.
        - The tagsoup handler has error correction facilities, and pretty much ignores anything it doesn't understand, and tries to fix what it understands but finds erroneous. This behavior runs rather in the opposite direction from DTD validation.
        - Their XML handling doesn't need validation, because their main purpose when it comes to XML is either data retrieval or rendering, and those uses don't benefit from DTD parsing/validation. Only the default values for attributes and the external entities would provide anything useful for them, and those can be locked to the namespace recognition system instead.
        - They use namespace recognition instead of DTD recognition for default rendering/behavior, because that means they don't have to support separate data sets for many DTDs that are all subsets of one element/attribute/entity set.
        - Validation is not cost free. A validating parser, even one that stores the validation patterns from a DTD in a compiled form, takes processing, and we want as little overhead as possible when it's rendering or data retrieval we're talking about.
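A rough illustration of what "nonvalidating" means in practice, using the expat binding in Python's standard library rather than any browser's actual code. The sample documents and the helper name `parse_elements` are made up for this sketch. It assumes the DTD sits in the internal subset, which expat reads without being asked to fetch anything external.

```python
import xml.parsers.expat

# Hypothetical document: <note> is declared as (to, body), but the instance breaks that rule.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority CDATA "normal">
]>
<note><body>no "to" element, wrong content</body><undeclared/></note>"""

# Mis-nested tags: a well-formedness error, which even a nonvalidating parser rejects.
NOT_WELL_FORMED = "<note><to>mis-nested</note></to>"


def parse_elements(document):
    """Parse with expat and return (element name, attributes) pairs in document order."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append((name, dict(attrs)))
    parser.Parse(document, True)
    return seen


if __name__ == "__main__":
    # The content model is violated, yet no error is raised.
    for name, attrs in parse_elements(INVALID_BUT_WELL_FORMED):
        print(name, attrs)

    try:
        parse_elements(NOT_WELL_FORMED)
    except xml.parsers.expat.ExpatError as error:
        print("rejected:", error)
```

The parser still picks up the declared default for `priority`, which is the one part of the DTD the post above describes as useful to a browser. The content model is never checked, so the invalid document passes, while the mis-nested one is still refused because well-formedness is not optional.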
        liorean