XML Documents

To work with XML in PHP, you should first know what XML documents are and how they are structured.
Because this is a PHP course, this lesson gives only a brief introduction to XML and its structure.

What is XML?

XML stands for eXtensible Markup Language.
It is similar to HTML in that it uses tags to mark up content. However, instead of defining paragraphs, headings, and images, XML tags organize data in a predictable hierarchy. XML tags (which delimit elements) are not predefined: you define your own.
XML was designed to store and transport data. It is frequently used to share information between computers that might be running different operating systems.
XML is written in plain text, so a simple text editor (like Notepad) is all you need to create an XML document.

XML syntax

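• XML documents should begin with an XML declaration:
<?xml version="1.0" encoding="utf-8"?>
- It defines the XML version (1.0) and the encoding used, here UTF-8 (another encoding, such as ISO-8859-1, can be given instead). The declaration is optional, but if it is present it must be the very first thing in the document.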

• All XML elements must be closed:
          - with an opening and a closing tag: <tag_name> text </tag_name>
          - or, for an element with no content, with a single self-closing tag (slash at the end of the tag): <tag_name />
The first form is the one normally used, since most elements have content.

• XML tags are case sensitive. The tag <note> is different from the tag <Note>.

• All elements must be properly nested within each other: <tag><child> Text </child></tag>
- "Properly nested" means that since the <child> element is opened inside the <tag> element, it must be closed inside the <tag> element.

• XML elements can have attributes in name/value pairs just like in HTML. The attribute values must always be quoted: <tag attribute="value">Text</tag>

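• The characters < , > , & , ' , " have a special meaning in XML. To use them as data, write the corresponding entity references: &lt; , &gt; , &amp; , &apos; , &quot; (the characters < and & must always be escaped in element content).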
• The syntax for writing comments in XML is similar to that of HTML: <!-- This is a comment -->
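
The syntax rules above can be checked with any XML parser. The following is a minimal sketch, written in Python with only its standard library rather than in PHP, purely as an illustration. The helper name is_well_formed and the sample strings are assumptions made for this example, not part of the lesson.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if xml_text parses without a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly closed and nested elements, with a quoted attribute value
good = '<tag attribute="value"><child> Text </child></tag>'
# <child> is closed outside <tag>, so the nesting is broken
bad_nesting = '<tag><child> Text </tag></child>'
# Tag names are case sensitive: <note> cannot be closed by </Note>
bad_case = '<note>Text</Note>'
# A raw & inside element content is not allowed
bad_char = '<tag>Tom & Jerry</tag>'

for label, sample in [("good", good), ("bad_nesting", bad_nesting),
                      ("bad_case", bad_case), ("bad_char", bad_char)]:
    print(label, is_well_formed(sample))

# escape() replaces &, < and > with their entity references
safe = '<tag>' + escape('Tom & Jerry') + '</tag>'
print(safe, is_well_formed(safe))
```

Only the first sample and the escaped string are accepted by the parser. When the escaped string is parsed, the parser turns the entity reference back into the original & character.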

XML structure

The elements in an XML document form a document tree. An XML document must contain exactly one root element. The tree starts at the root, which is the parent (or ancestor) of all other elements. All elements can have attributes, text content, and sub-elements (child elements).
  - Example:
<?xml version="1.0" encoding="utf-8"?>
<books>
  <title name="Book title">
    <author>Author name</author>
    <chapter>Title of the chapter one</chapter>
    <chapter>Title of the chapter two</chapter>
  </title>
  <title name="Another Book">
    <author>Author name</author>
    <chapter>Chapter name</chapter>
    <chapter>Another chapter</chapter>
  </title>
</books>
The root element in this example is <books>.
All <title> elements in the document are contained within <books>.
Each <title> element has one attribute (name) and three children: one <author> and two <chapter> elements.
Data stored in this structure is relatively easy to read, both for humans and for a program written in a language such as PHP or ASP, which can load it from an ".xml" file. XML documents are generally saved in files with the ".xml" extension.
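
To show how a program walks this tree, here is a minimal sketch in Python (standard library only, used here as a neutral illustration instead of PHP). It parses the same books document and collects the name attribute, the author and the chapters of each <title>. The variable names xml_text and books are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0" encoding="utf-8"?>
<books>
  <title name="Book title">
    <author>Author name</author>
    <chapter>Title of the chapter one</chapter>
    <chapter>Title of the chapter two</chapter>
  </title>
  <title name="Another Book">
    <author>Author name</author>
    <chapter>Chapter name</chapter>
    <chapter>Another chapter</chapter>
  </title>
</books>"""

# Parse the document; the returned element is the root of the tree
root = ET.fromstring(xml_text.encode("utf-8"))

books = []
for title in root.findall("title"):        # direct <title> children of the root
    books.append({
        "name": title.get("name"),         # attribute value
        "author": title.find("author").text,
        "chapters": [chapter.text for chapter in title.findall("chapter")],
    })

print("Root element:", root.tag)
for book in books:
    print(book["name"], "|", book["author"], "|", ", ".join(book["chapters"]))
```

The loop follows the hierarchy described above: the root, then each <title>, then its attribute and child elements.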

XML documents and DTD

A valid XML document is well formed and also follows a set of structural rules, which can be defined in a DTD (Document Type Definition).
The purpose of a DTD is to define the structure of an XML document, with a list of legal elements and attributes.
The DTD can be written inside the XML document or in an external file with the ".dtd" extension. Storing it in an external file is usually better, because the same DTD can then be shared by several documents; it is linked to the XML document with a !DOCTYPE declaration, whose name must match the root element:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE books SYSTEM "Books.dtd">
<books>
  <title name="Book title">
    <author>Author name</author>
    <chapter>Title of the chapter one</chapter>
    <chapter>Title of the chapter two</chapter>
  </title>
  <title name="Another Book">
    <author>Author name</author>
    <chapter>Chapter name</chapter>
    <chapter>Another chapter</chapter>
  </title>
</books>
The !DOCTYPE declaration in the example above is a reference to an external DTD file (Books.dtd). The content of Books.dtd is shown below.
<!ELEMENT books (title+)>
<!ELEMENT title (author, chapter+)>
<!ATTLIST title name CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>
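This description defines the content allowed in each element and the attributes that are required. Any new record added to the XML document must follow this structure.
The + sign declares that a child element must occur one or more times: "title" inside the "books" element, and "chapter" inside the "title" element. #PCDATA means that the element contains text, and #REQUIRED means that the "name" attribute must always be present.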
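
To make these rules concrete, the sketch below checks them by hand in Python. It is not a DTD validator: Python's standard library parser does not validate against a DTD, so the function simply re-implements the three structural rules of Books.dtd. It does not check that <author> and <chapter> contain only text. The function name check_books and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def check_books(xml_text):
    """Return a list of rule violations; an empty list means the rules hold."""
    errors = []
    root = ET.fromstring(xml_text)

    # <!ELEMENT books (title+)>: the root is <books> with one or more <title>
    if root.tag != "books":
        errors.append("root element must be <books>")
    children = [child.tag for child in root]
    if not children or any(tag != "title" for tag in children):
        errors.append("<books> must contain one or more <title> elements only")

    for title in root.findall("title"):
        # <!ATTLIST title name CDATA #REQUIRED>
        if title.get("name") is None:
            errors.append("<title> is missing the required name attribute")
        # <!ELEMENT title (author, chapter+)>: one author, then 1+ chapters
        tags = [child.tag for child in title]
        if (len(tags) < 2 or tags[0] != "author"
                or any(tag != "chapter" for tag in tags[1:])):
            errors.append("<title> must contain one <author> "
                          "followed by one or more <chapter>")
    return errors


valid = ('<books><title name="Book title"><author>Author name</author>'
         '<chapter>Chapter one</chapter></title></books>')
# No name attribute and no <chapter>: two rules are broken
invalid = '<books><title><author>Author name</author></title></books>'

print(check_books(valid))
print(check_books(invalid))
```

A real validating parser performs these checks automatically from the DTD; the hand-written version only shows what "following the structure" means for this document.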

If the DTD is declared inside the XML file, it must be wrapped in the DOCTYPE declaration, within square brackets. The name after !DOCTYPE must again match the root element:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE books [
<!ELEMENT books (title+)>
<!ELEMENT title (author, chapter+)>
<!ATTLIST title name CDATA #REQUIRED>
<!ELEMENT author (#PCDATA)>
<!ELEMENT chapter (#PCDATA)>
]>
<books>
  <title name="Book title">
    <author>Author name</author>
    <chapter>Title of the chapter one</chapter>
    <chapter>Title of the chapter two</chapter>
  </title>
  <title name="Another Book">
    <author>Author name</author>
    <chapter>Chapter name</chapter>
    <chapter>Another chapter</chapter>
  </title>
</books>
With a DTD, each of your XML files can carry a description of its own format. A group of people can also agree to use a standard DTD for interchanging data.

However, what matters most for the lessons of this PHP course is the content of the XML document: its elements and attributes.