Introduction to XML
12-Apr-24
Topics
◼ Introduction
◼ XML versus HTML
◼ XML terminologies
◼ XML standards (XML namespace, XML schema)
Introduction
◼ XML stands for eXtensible Markup Language.
◼ XML is designed to transport and store data.
◼ XML is important to know, and very easy to learn.
◼ XML is a markup language much like HTML.
◼ XML was designed to carry data, not to display data.
◼ Tags are added to the document to provide the extra information.
Introduction
◼ XML is a simplified subset of SGML (Standard Generalized Markup Language).
◼ XML and HTML have a similar syntax; both are derived from SGML.
◼ Officially recommended by the W3C since 1998.
◼ Primarily created by Jon Bosak of Sun Microsystems.
Why is XML used?
◼ XML documents are used to transfer data from one place to another, often over the internet.
◼ XML is used in many aspects of web development.
◼ XML is often used to separate data from presentation.
◼ In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.
Difference between XML and HTML
◼ XML is not a replacement for HTML.
◼ Different goals:
◦ XML was designed to transport and store data, with focus on what data is
◦ HTML was designed to display data, with focus on how data looks
◼ HTML is about displaying information.
◼ XML is about carrying information.
HTML vs XML
⚫ HTML: Fixed set of tags. XML: Extensible set of tags.
⚫ HTML: Presentation oriented. XML: Content oriented.
⚫ HTML: No data validation capabilities. XML: Standard data infrastructure.
⚫ HTML: Tags are not case sensitive. XML: Tags are case sensitive.
⚫ HTML: Tags are used for display. XML: Tags are used to describe documents and data.
XML File
◼ How to write and store an XML file?
◼ As you did before with CSS and JavaScript files.
◼ By using a text file with a different extension:
◦ .css for CSS file
◦ .js for JavaScript
◼ Then, .xml for XML file.
The Basic Rules
◼ XML is case sensitive
◼ All start tags must have end tags
◼ Elements must be properly nested
◼ XML declaration is the first statement
◼ Every document must contain a root element
◼ Attribute values must have quotation marks
◼ Certain characters are reserved for parsing
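The last rule (reserved characters) can be illustrated with a minimal Python sketch. It assumes the standard-library helper `xml.sax.saxutils.escape`; the sample string is made up for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical text containing characters that are reserved in XML content.
text = "price < 10 & quantity > 2"

# escape() replaces &, < and > with the entities &amp;, &lt; and &gt;.
escaped = escape(text)
print(escaped)  # price &lt; 10 &amp; quantity &gt; 2
```

The escaped string can be placed between tags without confusing a parser.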
XML Declaration
◼ Placed at the start of an XML document
◼ Informs XML software of
– the version of XML the document conforms to
<?xml version="1.0" ?>
– the character encoding scheme used in the document
<?xml version="1.0" encoding="UTF-8" ?>
– whether or not a set of external declarations affect the interpretation of this document
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
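As a rough sketch of how software can emit the declaration, the snippet below uses Python's `xml.etree.ElementTree` (the `xml_declaration` argument of `tostring` assumes Python 3.8 or later). The `note` element is only a placeholder.

```python
import xml.etree.ElementTree as ET

# Placeholder element, used only to have something to serialize.
root = ET.Element("note")
root.text = "hello"

# Ask the serializer to put an XML declaration in front of the root element.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(data.decode("utf-8"))
```

The serializer writes the version and encoding; it uses single quotes, which XML accepts just like double quotes.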
XML structure
◼ Look at the following student ID.
Student Identification
ID Number: 1
Name: Rana Jawad
BOD: 9/9/1999
Issuing Date: 10/4/2011
◼ Think about the main items of this ID!
◼ Which items are constant and which are variable?
XML structure - Tags
◼ Let's put them inside suitable tags:
<studentID>
  <IdNumber>1</IdNumber>
  <Name>Rana Jawad</Name>
  <BOD>9/9/1999</BOD>
  <IssueDate>10/4/2011</IssueDate>
</studentID>
◼ What about the next student?
<studentID>
  <IdNumber>2</IdNumber>
  <Name>Ahmed Sameer</Name>
  <BOD>3/3/1998</BOD>
  <IssueDate>10/4/2011</IssueDate>
</studentID>
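A minimal sketch of reading the two student records with Python's `xml.etree.ElementTree`. Because a document needs a single root, the records are wrapped in a `<students>` element here; that wrapper name is an assumption and does not appear on the slide.

```python
import xml.etree.ElementTree as ET

# The <students> wrapper is hypothetical; it gives the document one root.
document = """<students>
  <studentID>
    <IdNumber>1</IdNumber>
    <Name>Rana Jawad</Name>
    <BOD>9/9/1999</BOD>
    <IssueDate>10/4/2011</IssueDate>
  </studentID>
  <studentID>
    <IdNumber>2</IdNumber>
    <Name>Ahmed Sameer</Name>
    <BOD>3/3/1998</BOD>
    <IssueDate>10/4/2011</IssueDate>
  </studentID>
</students>"""

root = ET.fromstring(document)

# The tag names are constant; the text inside them varies per student.
names = [student.findtext("Name") for student in root.findall("studentID")]
print(names)  # ['Rana Jawad', 'Ahmed Sameer']
```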
XML Does Not Do Anything
◼ Maybe it is a little hard to understand, but XML does not DO anything.
◼ XML was created to structure, store, and transport information.
◼ The previous example is a student ID, stored as XML.
◼ It is quite self descriptive.
◼ But still, this XML document does not DO anything.
◼ It is just information wrapped in tags.
◼ Someone must write a piece of software to send, receive or display it.
With XML You Invent Your Own Tags
◼ The tags in the example above (like <Name> and <BOD>) are not defined in any XML standard.
◼ These tags are "invented" by the author of the XML document.
◼ That is because the XML language has no predefined tags.
◼ The tags used in HTML are predefined.
◼ HTML documents can only use tags defined in the HTML standard (like <p>, <h1>, etc.).
◼ XML allows the author to define his/her own tags and his/her own document structure.
XML Documents Form a Tree Structure
◼ XML documents must contain a root element.
◼ The tree starts at the root and branches to the lowest level of the tree.
◼ All elements can have sub elements (child elements):
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
◼ The terms parent, child, and sibling are used to describe the relationships between elements.
◼ Children on the same level are called siblings.
◼ All elements can have text content and attributes (just like in HTML).
XML structure - Tree
◼ XML Files are Trees
◼ An XML document has a single root node.
◼ Preorder traversal is usually used.
[Figure: tree rooted at address with children name, email, phone, birthday; name has children first and last; birthday has children year, month, day]
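The preorder traversal of the address tree can be sketched in Python. This assumes `xml.etree.ElementTree`, whose `Element.iter()` visits elements in document order, which is the same as preorder; the empty elements stand in for real data.

```python
import xml.etree.ElementTree as ET

# The address tree from the figure, with empty leaves for brevity.
document = (
    "<address>"
    "<name><first/><last/></name>"
    "<email/><phone/>"
    "<birthday><year/><month/><day/></birthday>"
    "</address>"
)

root = ET.fromstring(document)

# iter() yields the root first, then each subtree from left to right.
order = [element.tag for element in root.iter()]
print(order)
```

Each parent is listed before its children, so `name` comes before `first` and `last`.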
XML Syntax Rules
◼ All XML Elements Must Have a Closing Tag
◼ XML Tags are Case Sensitive
◼ <address> is not the same as <Address>
◼ XML Elements Must be Properly Nested
◼ <name><email>…</name></email> is not allowed.
◼ <name><email>…</email></name> is valid.
◼ XML Documents Must Have a Root Element
◼ XML Attribute Values Must be Quoted
◼ Comments in XML: <!-- This is a comment -->
◼ White-space is Preserved in XML
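These rules can be checked mechanically. The sketch below assumes Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<name><email>a</email></name>"))  # properly nested
print(is_well_formed("<name><email>a</name></email>"))  # wrong nesting
print(is_well_formed("<address>x</Address>"))           # case mismatch
print(is_well_formed("<item id=1/>"))                   # unquoted attribute
```

Only the first call should report a well-formed document; the other three each break one of the rules listed above.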
The XML Tree Structure
An Example XML document
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
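A short sketch of how a program might read the bookstore document, assuming Python's `xml.etree.ElementTree`. The document is shortened to titles and prices to keep the example compact.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore example (author and year left out).
document = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(document)

# Attributes are read with get(); element text is read with findtext().
titles_by_category = {
    book.get("category"): book.findtext("title")
    for book in root.findall("book")
}
print(titles_by_category)

# Element text is always a string, so numbers must be converted explicitly.
prices = [float(book.findtext("price")) for book in root.findall("book")]
print(max(prices))
```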
XML Naming Rules
◼ Names can contain letters, numbers, and other characters
◼ Names cannot start with a number or punctuation character
◼ Names cannot start with the letters xml (or XML, or Xml, etc)
◼ Names cannot contain spaces
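The rules above can be approximated with a regular expression. This is a simplified sketch, not the full XML name grammar: it assumes ASCII letters only and ignores colons and non-English letters. The function name `is_valid_name` is made up for this example.

```python
import re

# Simplified pattern: a letter or underscore first, then letters, digits,
# underscores, dots or hyphens. No spaces are allowed anywhere.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_valid_name(name):
    """Check a candidate element name against the simplified rules."""
    if name.lower().startswith("xml"):
        return False  # names starting with xml, XML, Xml, ... are rejected
    return _NAME_RE.fullmatch(name) is not None


for candidate in ["first_name", "1name", "xmlData", "first name"]:
    print(candidate, is_valid_name(candidate))
```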
Best Naming Practices
◼ Make names descriptive but short: <first_name>, <book_title>, not <the_title_of_the_book>.
◼ Avoid "-" characters.
◼ Avoid "." characters.
◼ Avoid ":" characters. Colons are reserved to be used for something called namespaces.
◼ A good practice is to use the naming rules of your database for the elements in the XML documents.
◼ Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software vendor doesn't support them.
Building Blocks of XML
◼ Elements (Tags) are the primary components of XML documents.
<AUTHOR id="123">
  <FNAME>JAMES</FNAME>
  <LNAME>RUSSEL</LNAME>
</AUTHOR>
<!-- I am comment -->
Element AUTHOR has attribute id; element FNAME is nested inside element AUTHOR.
◼ Attributes provide additional information about Elements. Values of the Attributes are set inside the Elements.
◼ Comments start with <!-- and end with -->
What is a well-formed XML document?
◼ XML declaration, when present, must be on the first line (optional in XML 1.0, but recommended)
◼ At least one element
– Exactly one root element
◼ Empty elements are written in one of two ways:
– Closing tag (e.g. "<br></br>")
– Special start tag (e.g. "<br />")
◼ For non-empty elements, closing tags are required
◼ Start tag must match closing tag (name & case)
◼ Correct nesting of elements
<author> <firstname>Mark</firstname> <lastname>Twain</lastname> </author>
◼ Attribute values must always be quoted
<subject scheme="LCSH">Music</subject>
What is a well-formed XML document?
◼ Good example
<addressBook>
  <person>
    <name> <family>Wallace</family> <given>Bob</given> </name>
    <email>bwallace@[Link]</email>
    <address>Rue de Lausanne, Genève</address>
  </person>
</addressBook>
◼ Bad example
<addressBook>
<address>Rue de Lausanne, Genève <person></address>
<name>
<family>Schneider</family> <firstName>Nina</firstName>
</name>
<email>nina@[Link]</email>
</person>
<name><family> Muller </family> <name>
</addressBook>
Validity
◼ A well-formed document has a tree structure and obeys all the XML rules.
◼ "Validity" refers to whether an XML document conforms to the rules defined by a Document Type Definition (DTD) or an XML Schema Definition (XSD).
◼ These rules define the structure, content, and format that the XML document must adhere to in order to be considered valid.
◼ DTDs were developed first, so they are not as comprehensive as schemas.
Validity
◼ When an XML document is validated against the DTD or XSD:
◼ If the document conforms to the rules specified in the DTD or XSD, it is considered valid.
◼ If the document violates any of the rules specified in the DTD or XSD, it is considered invalid.
XML DTD
◼ A DTD is a set of rules that allow us to specify our own set of elements and attributes.
◼ A DTD is a grammar that indicates what tags are legal in XML documents.
◼ An XML document is valid if it has an attached DTD and the document is structured according to the rules defined in the DTD.
Document Type Definitions
◼ A DTD describes the tree structure of a document and something about its data.
◼ There are two data types, PCDATA and CDATA.
◼ PCDATA is parsed character data.
◼ CDATA is character data, not usually parsed.
◼ A DTD determines how many times a node may appear, and how child nodes are ordered.
Document Type Definitions
◼ XML document types can be specified using a DTD
◼ DTD constrains structure of XML data
◼ What elements can occur
◼ What attributes can/must an element have
◼ What subelements can/must occur inside each element, and how many times.
◼ DTD does not constrain data types
◼ All values represented as strings in XML
◼ DTD definition syntax
◼ <!ELEMENT element (subelements-specification) >
◼ <!ATTLIST element (attributes) >
◼ … more details later
◼ Valid XML documents refer to a DTD (or other Schema)
DTD for address example
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
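To show what these declarations constrain, here is a rough Python sketch. It is not a DTD validator: the `CONTENT_MODEL` table is a hand-written mirror of the declarations above, and the `conforms` helper and the sample data are made up for illustration. It only checks that each element has exactly the listed children in the listed order.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the DTD: element name -> required child sequence.
# An empty tuple stands for (#PCDATA), i.e. text only, no child elements.
CONTENT_MODEL = {
    "address": ("name", "email", "phone", "birthday"),
    "name": ("first", "last"),
    "birthday": ("year", "month", "day"),
    "first": (), "last": (), "email": (), "phone": (),
    "year": (), "month": (), "day": (),
}


def conforms(element):
    """Recursively compare each element's children with CONTENT_MODEL."""
    expected = CONTENT_MODEL.get(element.tag)
    if expected is None:
        return False  # element not declared at all
    if tuple(child.tag for child in element) != expected:
        return False  # wrong children or wrong order
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<address><name><first>A</first><last>B</last></name>"
    "<email>a@b</email><phone>1</phone>"
    "<birthday><year>1999</year><month>9</month><day>9</day></birthday>"
    "</address>"
)
bad = ET.fromstring("<address><email>a@b</email><name/></address>")

print(conforms(good), conforms(bad))
```

Both inputs are well-formed XML, but only the first follows the structure the DTD describes.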
DTD Example
XML document:
<BOOKLIST>
  <BOOK GENRE="Science" FORMAT="Hardcover">
    <AUTHOR>
      <FIRSTNAME>RICHRD</FIRSTNAME>
      <LASTNAME>KARTER</LASTNAME>
    </AUTHOR>
  </BOOK>
</BOOKLIST>
Corresponding DTD:
<!DOCTYPE BOOKLIST [
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR)>
<!ELEMENT AUTHOR (FIRSTNAME,LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
<!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
]>
XML Document and Corresponding DTD
DTD
◼ A DTD can be declared in two ways: internal and external.
◼ If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition.
◼ In the XML file, select "view source" to view the DTD.
◼ If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference to the DTD file.
XML document with an internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>
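A small sketch of the difference between parsing and validating, assuming Python's `xml.etree.ElementTree`. That parser reads a document with an internal DTD but does not validate against it, so a note that breaks the declared structure is still accepted as well-formed. The second, incomplete note is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Internal DTD shared by both sample documents.
DOCTYPE = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
"""

complete = DOCTYPE + (
    "<note><to>Tove</to><from>Jani</from>"
    "<heading>Reminder</heading><body>Don't forget me this weekend</body></note>"
)
# Hypothetical invalid document: three required children are missing.
incomplete = DOCTYPE + "<note><to>Tove</to></note>"

# Both parse, because ElementTree checks well-formedness only.
complete_root = ET.fromstring(complete)
incomplete_root = ET.fromstring(incomplete)

print(complete_root.findtext("to"), len(complete_root))
print(incomplete_root.findtext("to"), len(incomplete_root))
```

Checking validity requires a validating parser, which is outside what this sketch uses.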
An External DTD Declaration
◼ XML document with a reference to an external DTD
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "[Link]">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
◼ the file "[Link]", which contains the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
DTD Application
External Public DTD Declaration (the application should know the DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test PUBLIC "-//Webster//DTD test V1.0//EN"
<test> "test" is a document element </test>
(test = name of the root element)
External DTD Declaration referring to a file or a URL (DTD is defined in file [Link])
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test SYSTEM "[Link]">
<test> "test" is a document element </test>
Internal DTD Declaration (DTD is defined inside XML)
<!DOCTYPE test [
<!ELEMENT test EMPTY> ]>
<test/>
XML Schema
◼ Schemas are themselves XML documents.
◼ They were standardized after DTDs and provide more information about the document.
◼ They have a number of data types including string, decimal, integer, boolean, date, and time.
◼ They divide elements into simple and complex types.
◼ They also determine the tree structure and how many children a node may have.
What are XML Schemas?
◼ W3C Recommendation, 2 May 2001
– Part 0: Primer
– Part 1: Structures
– Part 2: Datatypes
◼ DTDs use a non-XML syntax and have a number of limitations
– no namespace support
– lack of data-types
◼ XML Schemas are an alternative to DTDs
◼ Used to formally specify a "class" of XML
documents
◼ Supports simple/complex data-types
37
Schema for First address Example
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="[Link]
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="birthday" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XML Schema
◼ RDBMS Schema (s_id integer, s_name string, s_status string)
XML document:
<Students>
  <Student id="p1">
    <Name>Allan</Name>
    <Age>62</Age>
    <Email>allan@[Link]</Email>
  </Student>
</Students>
XMLSchema:
<xs:schema>
  <xs:complexType name="StudentType">
    <xs:attribute name="id" type="xs:string" />
    <xs:element name="Name" type="xs:string" />
    <xs:element name="Age" type="xs:integer" />
    <xs:element name="Email" type="xs:string" />
  </xs:complexType>
  <xs:element name="Student" type="StudentType" />
</xs:schema>
XML Document and Schema
XML Namespaces
◼ Various XML languages can be mixed
◼ However, there can be a naming conflict: different vocabularies (DTDs) can use the same names for elements. How to avoid confusion?
◼ Namespaces:
◼ Qualify element and attribute names with a label (prefix): unique_prefix:element_name
◼ An XML namespace is a collection of names (elements and attributes of a markup vocabulary)
◼ identified by xmlns:prefix="URL reference"
xmlns:xlink="[Link]
XML Namespaces
◼ W3C recommendation (January 1999)
◼ Each XML vocabulary is considered to own a namespace in which all elements (and attributes) are unique
◼ A single document can use elements and attributes from multiple namespaces
– A prefix is declared for each namespace used within a document.
– The namespace is identified using a URI (Uniform Resource Identifier)
◼ An element or attribute can be associated with a namespace by placing the namespace prefix before its name (i.e. 'prefix:name')
– Elements belonging to the default namespace do not require a prefix (the default namespace does not apply to unprefixed attributes)
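A minimal sketch of how a parser keeps same-named elements apart, assuming Python's `xml.etree.ElementTree`. The `report` root and the two `http://example.com/...` URIs are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define an element called Name.
document = """<report xmlns:sjh="http://example.com/hospital"
        xmlns:dub="http://example.com/pharmacy">
  <sjh:Name>Mike</sjh:Name>
  <dub:Name>Nurofen</dub:Name>
</report>"""

root = ET.fromstring(document)

# The parser replaces each prefix with its URI: {uri}localname.
tags = [child.tag for child in root]
print(tags)

# Prefixes used in queries are local to this mapping; only the URIs matter.
ns = {"h": "http://example.com/hospital", "p": "http://example.com/pharmacy"}
patient = root.findtext("h:Name", namespaces=ns)
drug = root.findtext("p:Name", namespaces=ns)
print(patient, drug)
```

The query prefixes `h` and `p` differ from the document's `sjh` and `dub`, which shows that the URI, not the prefix, identifies the namespace.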
Example - XML Namespaces
St. James's Hospital
<!ELEMENT Patient (Name, DOB)>
<!ELEMENT Name (First, Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>
Airport Pharmacy
<!ELEMENT Drug ((Name|Substance), Code)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Substance (#PCDATA)>
<!ELEMENT Code (#PCDATA)>
Combined document
<?xml version='1.0'?>
<AccidentReport
xmlns:sjh="[Link]
xmlns:dub=[Link] >
  <sjh:Patient>
    <sjh:Name>
      <sjh:First>Mike</sjh:First>
      <sjh:Last>Murphy</sjh:Last>
    </sjh:Name>
    <sjh:DOB>12/12/1950</sjh:DOB>
  </sjh:Patient>
  <dub:Drug>
    <dub:Name>Nurofen</dub:Name>
    <dub:Code>IE-975-2</dub:Code>
  </dub:Drug>
  [...]
</AccidentReport>
Why Namespace?
◼ Important for creating XML documents containing different types of data
◼ An XML document can be assembled using elements (and attributes) from different XML vocabularies
◼ Must be able to
– avoid conflicts between names
– identify the vocabulary an element belongs to
Advantages of XML
◼ XML is text (Unicode) based.
◼ Takes up less space.
◼ Can be transmitted efficiently.
◼ One XML document can be displayed differently in different media.
◼ HTML, video, CD, DVD
◼ You only have to change the XML document in order to change all the rest.
◼ XML documents can be modularized. Parts can be reused.
Disadvantages of XML
◼ XML syntax is redundant or relatively large
◼ Supports only the text string data type
◼ More difficult, demanding and precise than HTML.
◼ Lack of browser support / end user applications.
◼ Still experimental / not solidified.