WHAT IS XML?

• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to carry data, not to display data
• XML tags are not predefined. You must define your own tags
• XML is designed to be self-descriptive

 The Difference Between XML and HTML

• XML is not a replacement for HTML.
• XML and HTML were designed with different goals:
• XML was designed to transport and store data, with focus on what data is.
• HTML was designed to display data, with focus on how data looks.
• HTML is about displaying information, while XML is about carrying information.
With XML You Invent Your Own Tags
• The XML language has no predefined tags.
• The tags used in HTML are predefined. HTML documents can only use tags defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows authors to define their own tags and their own document structure.
XML Separates Data from HTML
• If you need to display dynamic data in your HTML document, it will take a lot of work to edit the HTML each time the data changes.
• With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for layout and display, and be sure that changes in the underlying data will not require any changes to the HTML.
• With a few lines of JavaScript, you can read an external XML file and update the data content of your HTML.
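The notes describe doing this with JavaScript. The sketch below shows the same idea in Python, using the standard `xml.etree.ElementTree` module. The `TEMPLATE` string and the `note_to_html` function are hypothetical names chosen for this illustration. The HTML template stays fixed and only the XML data changes.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data lives in XML, separate from the HTML layout.
NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Hypothetical HTML template: layout only, no data.
TEMPLATE = "<h1>{heading}</h1><p>To: {to}<br>From: {sender}</p><p>{body}</p>"

def note_to_html(xml_text):
    """Parse the XML data and fill it into the fixed HTML template."""
    note = ET.fromstring(xml_text)
    return TEMPLATE.format(
        heading=escape(note.findtext("heading")),
        to=escape(note.findtext("to")),
        sender=escape(note.findtext("from")),
        body=escape(note.findtext("body")),
    )

print(note_to_html(NOTE_XML))
```

When the data changes, you edit only the XML. `note_to_html` and `TEMPLATE` stay the same.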
An Example XML Document
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
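Here is a minimal sketch of reading this document with Python's standard `xml.etree.ElementTree` module. The XML is written as bytes so that the parser honors the `encoding="ISO-8859-1"` declaration.

```python
import xml.etree.ElementTree as ET

# The example note, as bytes so the declared encoding is used by the parser.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(DOC)      # returns the root element, <note>
print(note.tag)                # note
for child in note:             # the child elements, in document order
    print(child.tag, "=", child.text)
```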
XML Documents Form a Tree Structure
• XML documents must contain a root element. This element is “the parent” of all other elements.
• The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree.
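• Any element can have sub-elements (child elements). In the note example above, <note> is the root element, and <to>, <from>, <heading> and <body> are its children.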
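To make the tree structure concrete, the sketch below walks the note document from the root down. It prints each element indented by its depth. The `outline` function is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, each parent before its children."""
    lines = ["  " * depth + element.tag]
    for child in element:                      # recurse into child elements
        lines.extend(outline(child, depth + 1))
    return lines

note = ET.fromstring(
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend!</body></note>"
)
for line in outline(note):
    print(line)
```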
All XML Elements Must Have a Closing Tag
In HTML, you will often see elements that don’t have a closing tag:
• <p>This is a paragraph
• <p>This is another paragraph
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
• <p>This is a paragraph</p>
• <p>This is another paragraph</p> 
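• XML tags are case sensitive: <Letter> and <letter> are different tags, so an opening tag and its closing tag must use the same case.
• XML elements must be properly nested: <b><i>text</i></b> is correct, <b><i>text</b></i> is not.
• XML documents must have a root element that is the parent of all other elements.
• XML attribute values must be quoted: <person id="100"> is correct, <person id=100> is not.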
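A conforming XML parser rejects a document that breaks any of these rules. The sketch below uses Python's `xml.etree.ElementTree`, which raises `ParseError` on such syntax errors. The sample strings and the `is_well_formed` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Each string breaks exactly one of the syntax rules listed above.
BROKEN = {
    "missing closing tag": "<note><p>This is a paragraph</note>",
    "case mismatch": "<Message>Hello</message>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "two root elements": "<to>Tove</to><from>Jani</from>",
    "unquoted attribute": "<person id=100>Anna</person>",
}

def is_well_formed(text):
    """True if the parser accepts the text, False if it reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

for rule, text in BROKEN.items():
    print(f"{rule}: well formed = {is_well_formed(text)}")
```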
Entity References
There are 5 predefined entity references in XML:
• &lt;      <     less than
• &gt;      >     greater than
• &amp;     &     ampersand
• &apos;    '     apostrophe
• &quot;    "     quotation mark
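In practice, these references are produced when text is written into XML and turned back into characters when the XML is parsed. The sketch below uses `xml.sax.saxutils.escape` from Python's standard library. That function always replaces &, < and >, and the extra mapping passed to it adds the two quote entities. The sample sentence is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if salary < 1000 & bonus > 0 then say "yes"'

# escape() replaces &, < and >; the dict adds &quot; and &apos;.
encoded = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(encoded)
# if salary &lt; 1000 &amp; bonus &gt; 0 then say &quot;yes&quot;

# Parsing expands the entity references back into the original characters.
element = ET.fromstring("<message>" + encoded + "</message>")
print(element.text == raw)   # True
```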

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

<!-- This is a comment -->
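Comments are meant for human readers, and many parsers drop them. The sketch below shows that Python's `xml.etree.ElementTree` skips comments by default. It can keep them as comment nodes when the tree builder is asked to; the `insert_comments` option assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

DOC = "<note><!-- This is a comment --><to>Tove</to></note>"

# Default parser: the comment is discarded, so <to> is the only child.
note = ET.fromstring(DOC)
print([child.tag for child in note])   # ['to']

# Python 3.8+: keep comments in the tree as ET.Comment nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(DOC, parser=parser)
print(kept[0].tag is ET.Comment, repr(kept[0].text))   # True ' This is a comment '
```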

With XML, White Space is Preserved

HTML reduces multiple white-space characters to a single space. In XML, the white space inside element content is kept as written.
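The difference is easy to see with a parser. In the sketch below, Python's `xml.etree.ElementTree` returns the run of spaces unchanged. The HTML-style collapse is then done by hand for comparison; it only imitates how a browser renders text and is not part of XML.

```python
import xml.etree.ElementTree as ET

TEXT = "Hello        Tove"   # several spaces between the words

greeting = ET.fromstring(f"<greeting>{TEXT}</greeting>")
print(repr(greeting.text))   # the run of spaces is kept

# HTML-style rendering, imitated by hand: runs of white space become one space.
print(" ".join(greeting.text.split()))   # Hello Tove
```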

XML Naming Rules

• Names can contain letters, digits, hyphens, underscores, and periods
• Names must start with a letter or an underscore; they cannot start with a digit or another punctuation character
• Names starting with the letters xml (or XML, or Xml, etc.) are reserved
• Names cannot contain spaces
• Apart from the xml prefix, any name can be used; no words are reserved.
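The rules above can be turned into a small checker. This is a simplified, ASCII-only sketch. The XML specification also allows many non-ASCII letters, plus colons (used by namespaces). `is_valid_name` is a hypothetical helper, not a library function.

```python
import re

# Simplified ASCII-only pattern: first character is a letter or underscore,
# then letters, digits, periods, underscores or hyphens.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_name(name):
    """Check a tag name against the simplified naming rules above."""
    if not NAME.fullmatch(name):
        return False      # empty, bad first character, space, other punctuation
    if name.lower().startswith("xml"):
        return False      # names beginning with xml are reserved
    return True

for candidate in ["firstname", "first_name", "first-name", "2ndname", "first name", "xmlNote"]:
    print(candidate, is_valid_name(candidate))
```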

XML Elements vs. Attributes

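The same data can be stored either in an attribute or in a child element. Here id is an attribute:

<person id="100">
    <firstname>Anna</firstname>
    <lastname>Smith</lastname>
</person>

Here id is a child element:

<person>
    <id>100</id>
    <firstname>Anna</firstname>
    <lastname>Smith</lastname>
</person>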
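The choice affects how a program reads the value. In Python's `xml.etree.ElementTree`, an attribute is read with `get()` and a child element's text with `findtext()`. The sketch below reads `id` from both forms of `<person>`.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person id="100"><firstname>Anna</firstname><lastname>Smith</lastname></person>')
as_element = ET.fromstring(
    "<person><id>100</id><firstname>Anna</firstname><lastname>Smith</lastname></person>")

print(as_attribute.get("id"))      # 100  (attribute lookup)
print(as_element.findtext("id"))   # 100  (text of the <id> child element)
print(as_element.get("id"))        # None (no attribute on this <person>)
```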

VALIDATION

XML with correct syntax is “Well Formed” XML.

XML that is also validated against a DTD is “Valid” XML.

Well Formed XML Documents

• A “Well Formed” XML document has correct XML syntax.
• XML documents must have a root element
• XML elements must have a closing tag
• XML tags are case sensitive
• XML elements must be properly nested
• XML attribute values must be quoted

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Valid XML Documents

A “Valid” XML document is a “Well Formed” XML document, which also conforms to the rules of a Document Type Definition (DTD):

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

XML DTD

<!DOCTYPE note
[
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
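Python's standard-library XML parsers check well-formedness but do not validate against a DTD; third-party libraries such as lxml can do that. The sketch below is a hand-written check that enforces only this particular DTD, to show what a validating parser verifies. `is_valid_note` is a hypothetical helper, not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Hand-coded equivalent of the DTD above (illustration only):
#   note -> exactly to, from, heading, body, in that order
#   each of those -> text only (#PCDATA), no child elements
NOTE_MODEL = ("to", "from", "heading", "body")

def is_valid_note(xml_text):
    root = ET.fromstring(xml_text)        # raises ParseError if not well formed
    if root.tag != "note":
        return False
    if tuple(child.tag for child in root) != NOTE_MODEL:
        return False                      # wrong order, missing or extra element
    # <note> holds only elements, so text between them must be white space.
    if (root.text or "").strip() or any((c.tail or "").strip() for c in root):
        return False
    return all(len(child) == 0 for child in root)   # #PCDATA: no nested elements

VALID = ("<note><to>Tove</to><from>Jani</from><heading>Reminder</heading>"
         "<body>Don't forget me this weekend!</body></note>")
print(is_valid_note(VALID))                                               # True
print(is_valid_note(VALID.replace("<heading>Reminder</heading>", "")))    # False
```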

Document Type Definition (DTD)

• A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes.
• A DTD can be declared inline inside an XML document, or as an external reference.
Internal DTD Declaration
• If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:
• <!DOCTYPE root-element [element-declarations]>
Example XML document with an internal DTD:

<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
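An internal DTD does not make every parser validate. The sketch below feeds Python's non-validating `xml.etree.ElementTree` a document with the same internal DTD but no `<heading>`. The parser accepts it because the document is well formed, even though it is not valid.

```python
import xml.etree.ElementTree as ET

# Same internal DTD, but <heading> is missing: well formed, NOT valid.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

note = ET.fromstring(DOC)              # no error: ElementTree does not validate
print([child.tag for child in note])   # ['to', 'from', 'body']
```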

Why Use a DTD?

• With a DTD, each of your XML files can carry a description of its own format.
• With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.
• Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
• You can also use a DTD to verify your own data.
PCDATA
• PCDATA means parsed character data.
• Think of character data as the text found between the start tag and the end tag of an XML element.
• PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup.
• Tags inside the text will be treated as markup and entities will be expanded.
• However, parsed character data must not contain a raw & or < character; write these as &amp; and &lt;. Writing > as &gt; is also good practice.
CDATA
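• CDATA means character data.
• In a DTD, CDATA is the attribute type for plain character data (see the ATTLIST examples below). In document content, a CDATA section, written <![CDATA[ ... ]]>, holds text that the parser does NOT interpret: tags inside it are not treated as markup and entity references are not expanded.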
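The difference between parsed text and a CDATA section shows up directly in a parser's output. In the sketch below, Python's `xml.etree.ElementTree` treats `<b>` as a child element and expands `&lt;` in ordinary content. Inside a CDATA section, the same characters stay plain text. The sample strings are made up.

```python
import xml.etree.ElementTree as ET

# Parsed character data: &lt; is expanded, <b> becomes a child element.
parsed = ET.fromstring("<body>1 &lt; 2 is <b>true</b></body>")
print(repr(parsed.text), [child.tag for child in parsed])

# CDATA section: tags and entity references are kept as literal text.
raw = ET.fromstring("<body><![CDATA[1 < 2 is <b>true</b>]]></body>")
print(repr(raw.text), len(raw))

literal = ET.fromstring("<body><![CDATA[&amp;]]></body>")
print(literal.text)   # &amp;  (not expanded to &)
```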
Declaring Elements

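  <!ELEMENT element-name category>

or

  <!ELEMENT element-name (element-content)>

The category is a keyword such as EMPTY or ANY. The element content lists the allowed child elements, as in (to,from,heading,body), or is (#PCDATA) for text-only elements.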

Empty Elements

<!ELEMENT element-name EMPTY>

DTD example:

<!ELEMENT br EMPTY>

XML example:

<br />
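In XML, `<br />` is shorthand for `<br></br>`. The sketch below uses Python's `xml.etree.ElementTree` to show that both forms parse to the same empty element. When serialized, that element is written in the short form.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<br />")
long_form = ET.fromstring("<br></br>")

# Both forms give an element with no children and no text.
for element in (short_form, long_form):
    print(element.tag, len(element), element.text)   # br 0 None

# Serializing an empty element writes the short form.
print(ET.tostring(long_form))   # b'<br />'
```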

DTD – Attributes

An attribute declaration has the following syntax:

<!ATTLIST element-name attribute-name attribute-type default-value>

DTD example:

<!ATTLIST payment type CDATA "check">

XML example:

<payment type="check" />
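The default value in an ATTLIST declaration is supplied by the parser when the attribute is left out. Python's `xml.parsers.expat` reports such defaulted attributes along with the ones written in the document, unless `specified_attributes` is set. The sketch below relies on that. `payment_attributes` is a hypothetical helper.

```python
import xml.parsers.expat

DTD = '<!DOCTYPE payment [<!ATTLIST payment type CDATA "check">]>'

def payment_attributes(element_xml):
    """Return the attributes expat reports for the <payment> element."""
    found = {}

    def start(name, attrs):
        if name == "payment":
            found.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    # specified_attributes is False by default, so attribute defaults declared
    # in the DTD are reported together with the attributes written in the tag.
    parser.Parse(DTD + element_xml, True)
    return found

print(payment_attributes("<payment />"))               # {'type': 'check'}
print(payment_attributes('<payment type="cash" />'))   # {'type': 'cash'}
```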

EXAMPLE

<!DOCTYPE NEWSPAPER [
<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>

<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>

<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
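The sketch below attaches the NEWSPAPER DTD to a document and parses it with Python's `xml.etree.ElementTree`. The entities declared in the internal DTD (&NEWSPAPER;, &COPYRIGHT;) are expanded in the parsed text. An #IMPLIED attribute that is not given is simply absent. The article content is hypothetical; only the DTD comes from the example. ElementTree does not enforce the DTD's element rules.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE NEWSPAPER [
<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
"""

# Hypothetical article content that follows the DTD's structure.
DOCUMENT = """<NEWSPAPER>
  <ARTICLE AUTHOR="A. Writer">
    <HEADLINE>&NEWSPAPER; launches</HEADLINE>
    <BYLINE>Staff reporter</BYLINE>
    <LEAD>A short lead paragraph.</LEAD>
    <BODY>The article text.</BODY>
    <NOTES>&COPYRIGHT;</NOTES>
  </ARTICLE>
</NEWSPAPER>"""

newspaper = ET.fromstring(DTD + DOCUMENT)
article = newspaper.find("ARTICLE")
print(article.get("AUTHOR"))          # A. Writer
print(article.findtext("HEADLINE"))   # Vervet Logic Times launches
print(article.findtext("NOTES"))      # Copyright 1998 Vervet Logic Press
print(article.get("EDITOR"))          # None (#IMPLIED and not given)
```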

EXERCISE

• CREATE AN XML FILE WITH YOUR OWN TAGS.
• CREATE AN XML FILE WITH A DTD AND TAGS IN A SINGLE FILE.