To be an XML document, a data object must be well-formed. A well-formed XML document may additionally be required to be valid.
Well-formed XML documents contain text and markup that conform to the syntax rules of the XML 1.0 or XML 1.1 specification. A valid XML document is a well-formed document that also satisfies the constraints defined in its Document Type Definition (DTD). A DTD is a set of rules that specify which tags are allowed, what content those tags may hold, and how the tags relate to each other.
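As a rough illustration of the well-formedness check, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for this example. This parser checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects text that breaks XML syntax rules,
        # such as mismatched or unclosed tags.
        return False
    return True

# Properly nested tags: well-formed.
good = "<student><name>Ana</name><major>Math</major></student>"
# <name> is closed by </major>: not well-formed.
bad = "<student><name>Ana</major></student>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that fails this check is not XML at all, so questions of validity do not arise for it.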
Checking an XML document against the constraints defined in a DTD or another schema is called validation. Validation is optional in XML.
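To see what "optional" means in practice, the following sketch feeds a non-validating parser a document that breaks its own DTD. It assumes Python's standard-library xml.etree.ElementTree, which reads the DOCTYPE but does not enforce it; the sample document is invented for this example.

```python
import xml.etree.ElementTree as ET

# The internal DTD requires at least one <major> inside <student>,
# but the document below has none: well-formed, yet not valid.
document = """<!DOCTYPE student [
<!ELEMENT student (name,major+,gender?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT major (#PCDATA)>
<!ELEMENT gender (#PCDATA)>
]>
<student><name>Ana</name></student>"""

# A non-validating parser accepts the document without complaint.
root = ET.fromstring(document)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```

Catching the missing major element would require a validating parser, which is a separate step that an application may or may not perform.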
Schema languages are used to write these lists of constraints, which are called schemas. The DTD is itself a schema language, and it is defined in the W3C XML specification. Most XML parsers have DTD support built in.
A DTD focuses on the element structure of a document. It specifies:
- which elements an XML document can contain
- what attributes and text an element can contain
- the order of the child elements within an element
Element Declarations
For an XML document to be valid, every element in the document must be declared.
<!ELEMENT country (#PCDATA)>
<!ELEMENT student (name,major+,gender?)>
In the first example above, the element is named country and it can contain any text data. In the second example, the element named student has three child elements, name, major, and gender, which must appear in the order defined. The + sign specifies that major must occur one or more times, so a student element can contain several major elements. The ? sign specifies that the gender element is optional. The keyword EMPTY, used in place of a content model (for example, <!ELEMENT photo EMPTY>), means that the element cannot contain any content.
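The content model (name,major+,gender?) behaves much like a regular expression over the sequence of child element names. The Python sketch below makes that concrete under some assumptions: the regular expression is translated by hand from the declaration, the helper matches_student_model is hypothetical, and only the child order of student is checked. It is not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from <!ELEMENT student (name,major+,gender?)>.
# Each child tag name is followed by one space in the string we match.
STUDENT_MODEL = re.compile(r"name (major )+(gender )?")

def matches_student_model(xml_text: str) -> bool:
    """Check only the child sequence of a <student> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Build a string such as "name major major gender " from the children.
    children = "".join(child.tag + " " for child in root)
    return root.tag == "student" and STUDENT_MODEL.fullmatch(children) is not None

two_majors = "<student><name>Ana</name><major>Math</major><major>Physics</major></student>"
with_gender = "<student><name>Ana</name><major>Math</major><gender>F</gender></student>"
no_major = "<student><name>Ana</name></student>"
wrong_order = "<student><name>Ana</name><gender>F</gender><major>Math</major></student>"

for sample in (two_majors, with_gender, no_major, wrong_order):
    print(matches_student_model(sample))
```

The first two samples satisfy the model; the third lacks the required major element, and the fourth places gender before major, which violates the declared order.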