U 2XML

The document provides an introduction to XML (eXtensible Markup Language), highlighting its purpose for data transport and storage, as well as its differences from HTML. It covers XML syntax rules, structure, and the importance of well-formed documents, including the use of Document Type Definitions (DTD) for validation. Additionally, it explains XML's flexibility in allowing users to define their own tags and the tree structure of XML documents.

Introduction to XML

12-Apr-24
Topics
◼ Introduction
◼ XML versus HTML
◼ XML terminologies
◼ XML standards (XML namespace, XML schema)

Introduction
◼ XML stands for eXtensible Markup Language.
◼ XML is designed to transport and store data.
◼ XML is important to know, and very easy to learn.
◼ XML is a markup language much like HTML.
◼ XML was designed to carry data, not to display data.
◼ Tags are added to the document to provide the extra information.

Introduction
◼ XML is a simplified subset of SGML (Standard Generalized Markup Language).
◼ XML and HTML have a similar syntax; both are derived from SGML.
◼ Officially recommended by W3C since 1998.
◼ Its development was led by Jon Bosak of Sun Microsystems.

Why is XML used?
◼ XML documents are used to transfer data from one place to another, often over the internet.
◼ XML is used in many aspects of web development.
◼ XML is often used to separate data from presentation.
◼ In many HTML applications, XML is used to store or transport data, while HTML is used to format and display the same data.

Difference between XML and HTML
◼ XML is not a replacement for HTML.
◼ Different goals:
◦ XML was designed to transport and store data, with focus on what data is
◦ HTML was designed to display data, with focus on how data looks
◼ HTML is about displaying information.
◼ XML is about carrying information.

HTML vs XML
HTML | XML
⚫ Fixed set of tags | ⚫ Extensible set of tags
⚫ Presentation oriented | ⚫ Content oriented
⚫ No data validation capabilities | ⚫ Standard data infrastructure
⚫ Tags are not case sensitive | ⚫ Tags are case sensitive
⚫ Tags are used for display | ⚫ Tags are used to describe documents and data

XML File
◼ How to write and store an XML file?
◼ As you did before with CSS and JavaScript files: use a plain text file with the matching extension:
◦ .css for a CSS file
◦ .js for a JavaScript file
◼ Then, .xml for an XML file.

The Basic Rules
◼ XML is case sensitive
◼ All start tags must have end tags
◼ Elements must be properly nested
◼ The XML declaration, when present, is the first statement
◼ Every document must contain a root element
◼ Attribute values must have quotation marks
◼ Certain characters are reserved for parsing

XML Declaration
◼ Placed at the start of an XML document
◼ Informs XML software of
– the version of XML the document conforms to
<?xml version="1.0" ?>
– the character encoding scheme used in the document
<?xml version="1.0" encoding="UTF-8" ?>
– whether or not a set of external declarations affect the interpretation of this document
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

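As a small illustration of the declaration, the sketch below serializes an element together with an XML declaration using Python's standard xml.etree.ElementTree module. The element name note and its text are placeholders, and the xml_declaration argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(root: ET.Element) -> bytes:
    """Return the element as UTF-8 bytes, preceded by an XML declaration."""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


note = ET.Element("note")  # placeholder root element, not from the slides
note.text = "hello"

document = serialize_with_declaration(note)
first_line = document.split(b"\n", 1)[0]

# The first line is the declaration; the element follows on the next line.
print(first_line.decode("utf-8"))
```

The library writes the declaration with single quotes, which XML accepts just as it accepts double quotes.
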
XML structure
◼ Look at the following student ID.
Student Identification
ID Number: 1
Name: Rana Jawad
BOD: 9/9/1999
Issuing Date: 10/4/2011
◼ Think about the main items of this ID!
◼ Which items are constant and which are variable?

XML structure - Tags
◼ Let's put them inside suitable tags:
<studentID>
  <IdNumber>1</IdNumber>
  <Name>Rana Jawad</Name>
  <BOD>9/9/1999</BOD>
  <IssueDate>10/4/2011</IssueDate>
</studentID>
◼ What about the next student?
<studentID>
  <IdNumber>2</IdNumber>
  <Name>Ahmed Sameer</Name>
  <BOD>3/3/1998</BOD>
  <IssueDate>10/4/2011</IssueDate>
</studentID>

XML Does Not Do Anything
◼ Maybe it is a little hard to understand, but XML does not DO anything.
◼ XML was created to structure, store, and transport information.
◼ The previous example is a student ID, stored as XML.
◼ It is quite self-descriptive.
◼ But still, this XML document does not DO anything.
◼ It is just information wrapped in tags.
◼ Someone must write a piece of software to send, receive or display it.

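To make "someone must write a piece of software" concrete, here is a minimal sketch of such software using Python's standard xml.etree.ElementTree module. It reads the first student ID from the earlier slide and prints a one-line summary; the function name describe_student and the wording of the summary are illustrative choices, not part of the slides.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """
<studentID>
  <IdNumber>1</IdNumber>
  <Name>Rana Jawad</Name>
  <BOD>9/9/1999</BOD>
  <IssueDate>10/4/2011</IssueDate>
</studentID>
"""


def describe_student(xml_text: str) -> str:
    """Turn a <studentID> document into a one-line human-readable summary."""
    card = ET.fromstring(xml_text)
    # findtext returns the text of the first child element with that tag.
    return (
        f"Student #{card.findtext('IdNumber')}: {card.findtext('Name')}, "
        f"born {card.findtext('BOD')}, issued {card.findtext('IssueDate')}"
    )


print(describe_student(STUDENT_XML))
```

The XML itself stays passive; all the behaviour (what to read, how to show it) lives in the program.
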
With XML you invent your Own tags
◼ The tags in the example above (like <Name> and <BOD>) are not defined in any XML standard.
◼ These tags are "invented" by the author of the XML document.
◼ That is because the XML language has no predefined tags.
◼ The tags used in HTML are predefined.
◼ HTML documents can only use tags defined in the HTML standard (like <p>, <h1>, etc.).
◼ XML allows the author to define his/her own tags and his/her own document structure.

XML Documents Form a Tree Structure
◼ XML documents must contain a root element.
◼ The tree starts at the root and branches to the lowest level of the tree.
◼ All elements can have sub elements (child elements):
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
◼ The terms parent, child, and sibling are used to describe the relationships between elements.
◼ Children on the same level are called siblings.
◼ All elements can have text content and attributes (just like in HTML).

XML structure - Tree
◼ XML Files are Trees
◼ An XML document has a single root node.
◼ Preorder traversal is usually used.
address
  name
    first
    last
  email
  phone
  birthday
    year
    month
    day

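The preorder traversal mentioned above can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The empty elements below only reproduce the shape of the address tree; the helper name preorder is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# Same shape as the address tree on the slide; elements are left empty.
ADDRESS_XML = """
<address>
  <name><first/><last/></name>
  <email/>
  <phone/>
  <birthday><year/><month/><day/></birthday>
</address>
"""


def preorder(element, depth=0):
    """Yield (depth, tag) pairs: each parent before its children, left to right."""
    yield depth, element.tag
    for child in element:
        yield from preorder(child, depth + 1)


root = ET.fromstring(ADDRESS_XML)
for depth, tag in preorder(root):
    print("  " * depth + tag)
```

ElementTree's own root.iter() visits elements in the same document order, so the recursive helper is only needed when the depth is wanted as well.
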
XML Syntax Rules
◼ All XML Elements Must Have a Closing Tag
◼ XML Tags are Case Sensitive
◼ <address> is not the same as <Address>
◼ XML Elements Must be Properly Nested
◼ <name><email>…</name></email> is not allowed.
◼ <name><email>…</email></name> is valid.
◼ XML Documents Must Have a Root Element
◼ XML Attribute Values Must be Quoted
◼ Comments in XML: <!-- This is a comment -->
◼ White-space is Preserved in XML

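One way to try these rules out is to hand small fragments to a parser and see which ones it rejects. The sketch below does this with Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample fragments are illustrative and only cover some of the rules listed above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "properly nested": "<name><email>a@b</email></name>",
    "badly nested": "<name><email>a@b</name></email>",
    "case mismatch": "<address>x</Address>",
    "missing end tag": "<address><name>x</address>",
    "unquoted attribute": "<book category=cooking></book>",
    "two root elements": "<a></a><b></b>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first fragment is accepted; each of the others breaks one rule from the list.
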
The XML Tree Structure
[Figure: XML tree structure]

An Example XML document
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

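As a sketch of how a program might read this document, the Python code below uses the standard xml.etree.ElementTree module to pick out attributes and child elements. The XML declaration is left out of the embedded string for simplicity, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

bookstore = ET.fromstring(BOOKSTORE_XML)

# Attributes are read with .get(); the text of a child with .findtext().
titles_by_category = {
    book.get("category"): book.findtext("title")
    for book in bookstore.findall("book")
}

# Element text is always a string, so prices are converted explicitly.
total_price = sum(float(book.findtext("price")) for book in bookstore.iter("book"))

print(titles_by_category)
print(f"Total price: {total_price:.2f}")
```
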
XML Naming Rules
◼ Names can contain letters, numbers, and other characters
◼ Names cannot start with a number or punctuation character
◼ Names cannot start with the letters xml (or XML, or Xml, etc)
◼ Names cannot contain spaces

Best Naming Practices
◼ Make names descriptive but short: <first_name>, <book_title> not <the_title_of_the_book>.
◼ Avoid "-" characters.
◼ Avoid "." characters.
◼ Avoid ":" characters. Colons are reserved to be used for something called namespaces.
◼ A good practice is to use the naming rules of your database for the elements in the XML documents.
◼ Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your software vendor doesn't support them.

Building Blocks of XML
◼ Elements (tags) are the primary components of XML documents.
<AUTHOR id="123">
  <FNAME>JAMES</FNAME>
  <LNAME>RUSSEL</LNAME>
</AUTHOR>
<!-- I am comment -->
◦ Element FNAME is nested inside element AUTHOR.
◦ Element AUTHOR has the attribute id.
◼ Attributes provide additional information about elements. Values of the attributes are set inside the start tags of the elements.
◼ Comments start with <!-- and end with -->

What is a well-formed XML document?
◼ The XML declaration, when present, must be on the first line
◼ At least one element
– Exactly one root element
◼ Empty elements are written in one of two ways:
– Closing tag (e.g. "<br></br>")
– Special start tag (e.g. "<br />")
◼ For non-empty elements, closing tags are required
◼ Start tag must match closing tag (name & case)
◼ Correct nesting of elements
<author> <firstname>Mark</firstname> <lastname>Twain</lastname> </author>
◼ Attribute values must always be quoted
<subject scheme="LCSH">Music</subject>

What is a well-formed XML document?
◼ Good example
<addressBook>
  <person>
    <name> <family>Wallace</family> <given>Bob</given> </name>
    <email>bwallace@[Link]</email>
    <address>Rue de Lausanne, Genève</address>
  </person>
</addressBook>

◼ Bad example
<addressBook>
<address>Rue de Lausanne, Genève <person></address>
<name>
<family>Schneider</family> <firstName>Nina</firstName>
</name>
<email>nina@[Link]</email>
</person>
<name><family> Muller </family> <name>
</addressBook>

Validity
◼ A well-formed document has a tree structure and obeys all the XML rules.
◼ "Validity" refers to whether an XML document conforms to the rules defined by a Document Type Definition (DTD) or an XML Schema Definition (XSD).
◼ These rules define the structure, content, and format that the XML document must adhere to in order to be considered valid.
◼ DTDs were developed first, so they are not as comprehensive as schemas.

Validity
◼ When an XML document is validated against the DTD or XSD:
◼ If the document conforms to the rules specified in the DTD or XSD, it is considered valid.
◼ If the document violates any of the rules specified in the DTD or XSD, it is considered invalid.

XML DTD
◼ A DTD is a set of rules that allow us to specify our own set of elements and attributes.
◼ A DTD is a grammar that indicates which tags are legal in XML documents.
◼ An XML document is valid if it has an attached DTD and the document is structured according to the rules defined in the DTD.

Document Type Definitions
◼ A DTD describes the tree structure of a document and something about its data.
◼ There are two data types, PCDATA and CDATA.
◼ PCDATA is parsed character data.
◼ CDATA is character data, not usually parsed.
◼ A DTD determines how many times a node may appear, and how child nodes are ordered.

Document Type Definitions
◼ XML document types can be specified using a DTD
◼ A DTD constrains the structure of XML data
◼ What elements can occur
◼ What attributes can/must an element have
◼ What subelements can/must occur inside each element, and how many times.
◼ A DTD does not constrain data types
◼ All values are represented as strings in XML
◼ DTD definition syntax
<!ELEMENT element (subelements-specification) >
<!ATTLIST element (attributes) >
◼ … more details later
◼ Valid XML documents refer to a DTD (or other Schema)

DTD for address example
<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>

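Python's standard library parses XML but does not validate it against a DTD, so the sketch below only imitates the idea: the content models from the address DTD are copied by hand into a dictionary, and a small function checks that each element has exactly the declared children in the declared order. The names CONTENT_MODEL and structure_errors, and the sample data, are made up for illustration; a real validator handles far more (repetition, choices, attributes).

```python
import xml.etree.ElementTree as ET

# Content models copied by hand from the address DTD: a tuple lists the
# required children in order; None stands for (#PCDATA), i.e. text only.
CONTENT_MODEL = {
    "address": ("name", "email", "phone", "birthday"),
    "name": ("first", "last"),
    "birthday": ("year", "month", "day"),
    "first": None, "last": None, "email": None, "phone": None,
    "year": None, "month": None, "day": None,
}


def structure_errors(element):
    """Return messages describing where the tree departs from CONTENT_MODEL."""
    if element.tag not in CONTENT_MODEL:
        return [f"undeclared element <{element.tag}>"]
    expected = CONTENT_MODEL[element.tag] or ()
    found = tuple(child.tag for child in element)
    errors = []
    if found != expected:
        errors.append(f"<{element.tag}>: expected children {expected}, found {found}")
    for child in element:
        errors.extend(structure_errors(child))
    return errors


# Illustrative sample documents (the data is invented).
valid_doc = ET.fromstring(
    "<address><name><first>Ann</first><last>Lee</last></name>"
    "<email>ann@example.org</email><phone>555-0100</phone>"
    "<birthday><year>1990</year><month>1</month><day>2</day></birthday></address>"
)
invalid_doc = ET.fromstring(
    "<address><email>ann@example.org</email><name><first>Ann</first></name></address>"
)

print(structure_errors(valid_doc))
for message in structure_errors(invalid_doc):
    print(message)
```

Both sample documents are well-formed; only the first one also follows the structure that the DTD declares.
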
DTD Example
XML document:
<BOOKLIST>
  <BOOK GENRE="Science" FORMAT="Hardcover">
    <AUTHOR>
      <FIRSTNAME>RICHRD</FIRSTNAME>
      <LASTNAME>KARTER</LASTNAME>
    </AUTHOR>
  </BOOK>
</BOOKLIST>

Corresponding DTD:
<!DOCTYPE BOOKLIST [
<!ELEMENT BOOKLIST (BOOK)*>
<!ELEMENT BOOK (AUTHOR)>
<!ELEMENT AUTHOR (FIRSTNAME,LASTNAME)>
<!ELEMENT FIRSTNAME (#PCDATA)>
<!ELEMENT LASTNAME (#PCDATA)>
<!ATTLIST BOOK GENRE (Science|Fiction) #REQUIRED>
<!ATTLIST BOOK FORMAT (Paperback|Hardcover) "Paperback">
]>

XML Document and Corresponding DTD

DTD
◼ A DTD can be declared in two ways: internal and external.
◼ If the DTD is declared inside the XML file, it must be wrapped inside the <!DOCTYPE> definition.
◼ In the XML file, select "view source" to view the DTD.
◼ If the DTD is declared in an external file, the <!DOCTYPE> definition must contain a reference to the DTD file.

XML document with an internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

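To see that a parser really reads the internal DTD, the sketch below feeds the note document to Python's standard xml.parsers.expat module and collects what it reports for the DOCTYPE line and for each element declaration. The function name read_internal_dtd is illustrative. Note that expat only reports the declarations; it does not check the document against them.

```python
import xml.parsers.expat

NOTE_XML = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>"""


def read_internal_dtd(xml_text: str):
    """Return (doctype name, declared element names) as reported by expat."""
    doctype_names = []
    declared = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once for the <!DOCTYPE ...> line.
        doctype_names.append(name)

    def on_element_decl(name, model):
        # Called once per <!ELEMENT ...>; the content model is ignored here.
        declared.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return doctype_names[0], declared


root_name, elements = read_internal_dtd(NOTE_XML)
print(root_name)
print(elements)
```
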
An External DTD Declaration
◼ XML document with a reference to an external DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "[Link]">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
◼ And here is the file "[Link]", which contains the DTD:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

DTD Application
◼ External Public DTD Declaration (the application should know the DTD)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test PUBLIC "-//Webster//DTD test V1.0//EN">
<test> "test" is a document element </test>
◦ test = name of the root element

◼ External DTD Declaration referring to a file or a URL (the DTD is defined in file [Link])
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test SYSTEM "[Link]">
<test> "test" is a document element </test>

◼ Internal DTD Declaration (the DTD is defined inside the XML document)
<!DOCTYPE test [
<!ELEMENT test EMPTY> ]>
<test/>

XML Schema
◼ Schemas are themselves XML documents.
◼ They were standardized after DTDs and provide more information about the document.
◼ They have a number of data types including string, decimal, integer, boolean, date, and time.
◼ They divide elements into simple and complex types.
◼ They also determine the tree structure and how many children a node may have.

What are XML Schemas?
◼ W3C Recommendation, 2 May 2001
– Part 0: Primer
– Part 1: Structures
– Part 2: Datatypes
◼ DTDs use a non-XML syntax and have a number of limitations
– no namespace support
– lack of data-types
◼ XML Schemas are an alternative to DTDs
◼ Used to formally specify a "class" of XML documents
◼ Supports simple/complex data-types

Schema for First address Example
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="[Link]">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="birthday" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

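The practical gain over a DTD is the data types. Python's standard library has no XSD validator, so the sketch below only mimics the idea: the types from the schema above are copied by hand into a table of converter functions, and each value in an address document is checked by trying to convert it. FIELD_TYPES, type_errors and the sample data are made up for illustration, and using date.fromisoformat for xs:date is a simplification (xs:date also allows a time zone suffix).

```python
import xml.etree.ElementTree as ET
from datetime import date

# One converter per element, mirroring the type attributes in the schema:
# xs:string accepts any text; xs:date is approximated by the ISO form YYYY-MM-DD.
FIELD_TYPES = {
    "name": str,
    "email": str,
    "phone": str,
    "birthday": date.fromisoformat,
}


def type_errors(address: ET.Element):
    """Check child order, then try to convert each value; return problems found."""
    errors = []
    if [child.tag for child in address] != list(FIELD_TYPES):
        errors.append("children do not match the declared sequence")
    for child in address:
        convert = FIELD_TYPES.get(child.tag)
        if convert is None:
            continue  # unknown element, already reported by the sequence check
        try:
            convert(child.text or "")
        except ValueError:
            errors.append(f"<{child.tag}> has an invalid value: {child.text!r}")
    return errors


# Illustrative sample documents (the data is invented).
good = ET.fromstring(
    "<address><name>Ann Lee</name><email>ann@example.org</email>"
    "<phone>555-0100</phone><birthday>1990-01-02</birthday></address>"
)
bad = ET.fromstring(
    "<address><name>Ann Lee</name><email>ann@example.org</email>"
    "<phone>555-0100</phone><birthday>9/9/1999</birthday></address>"
)

print(type_errors(good))
print(type_errors(bad))
```

A DTD could only say that birthday contains character data; a schema type such as xs:date can also reject a value like 9/9/1999.
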
XML Schema
◼ RDBMS Schema (s_id integer, s_name string, s_status string)
◼ XMLSchema

XML document:
<Students>
  <Student id="p1">
    <Name>Allan</Name>
    <Age>62</Age>
    <Email>allan@[Link]</Email>
  </Student>
</Students>

Schema:
<xs:schema>
  <xs:complexType name="StudentType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
      <xs:element name="Age" type="xs:integer" />
      <xs:element name="Email" type="xs:string" />
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" />
  </xs:complexType>
  <xs:element name="Student" type="StudentType" />
</xs:schema>

XML Document and Schema

XML Namespaces
◼ Various XML languages can be mixed
◼ However, there can be a naming conflict: different vocabularies (DTDs) can use the same names for elements!
◼ How to avoid confusion?
◼ Namespaces:
◼ Qualify element and attribute names with a label (prefix): unique_prefix:element_name
◼ An XML namespace is a collection of names (elements and attributes of a markup vocabulary)
◼ identified by xmlns:prefix="URL reference"
xmlns:xlink="[Link]"

XML Namespaces
◼ W3C recommendation (January 1999)
◼ Each XML vocabulary is considered to own a namespace in which all elements (and attributes) are unique
◼ A single document can use elements and attributes from multiple namespaces
– A prefix is declared for each namespace used within a document.
– The namespace is identified using a URI (Uniform Resource Identifier)
◼ An element or attribute can be associated with a namespace by placing the namespace prefix before its name (i.e. 'prefix:name')
– Elements belonging to the default namespace do not require a prefix (the default namespace does not apply to unprefixed attributes)

Example - XML Namespaces
St. James's Hospital
<!ELEMENT Patient (Name, DOB)>
<!ELEMENT Name (First, Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT DOB (#PCDATA)>

Airport Pharmacy
<!ELEMENT Drug ((Name|Substance), Code)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Substance (#PCDATA)>
<!ELEMENT Code (#PCDATA)>

Document combining both vocabularies:
<?xml version='1.0'?>
<AccidentReport
    xmlns:sjh="[Link]"
    xmlns:dub="[Link]">
  <sjh:Patient>
    <sjh:Name>
      <sjh:First>Mike</sjh:First>
      <sjh:Last>Murphy</sjh:Last>
    </sjh:Name>
    <sjh:DOB>12/12/1950</sjh:DOB>
  </sjh:Patient>
  <dub:Drug>
    <dub:Name>Nurofen</dub:Name>
    <dub:Code>IE-975-2</dub:Code>
  </dub:Drug>
  [...]
</AccidentReport>

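The sketch below shows how a parser keeps the two Name elements apart. It uses Python's standard xml.etree.ElementTree module on a shortened version of the accident report. The two namespace URIs are placeholders chosen for this example, since the real ones are not shown above.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for this example only.
SJH = "http://example.org/hospital"
DUB = "http://example.org/pharmacy"

REPORT_XML = f"""
<AccidentReport xmlns:sjh="{SJH}" xmlns:dub="{DUB}">
  <sjh:Patient>
    <sjh:Name><sjh:First>Mike</sjh:First><sjh:Last>Murphy</sjh:Last></sjh:Name>
  </sjh:Patient>
  <dub:Drug>
    <dub:Name>Nurofen</dub:Name>
    <dub:Code>IE-975-2</dub:Code>
  </dub:Drug>
</AccidentReport>
"""

report = ET.fromstring(REPORT_XML)
prefixes = {"sjh": SJH, "dub": DUB}

# The parser expands each prefix to its URI, so the two Name elements
# end up with different full names even though the local name is the same.
patient_name = report.find("sjh:Patient/sjh:Name", prefixes)
drug_name = report.find("dub:Drug/dub:Name", prefixes)

print(patient_name.tag)
print(drug_name.tag)
print(drug_name.text)
```

The prefix itself is only shorthand: what identifies the vocabulary is the URI, which is why ElementTree stores the tag as {URI}Name.
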
Why Namespace?
◼ Important for creating XML documents containing different types of data
◼ An XML document can be assembled using elements (and attributes) from different XML vocabularies
◼ Must be able to
– avoid conflicts between names
– identify the vocabulary an element belongs to

Advantages of XML
◼ XML is text (Unicode) based.
◼ Takes up less space.
◼ Can be transmitted efficiently.
◼ One XML document can be displayed differently in different media.
◦ HTML, video, CD, DVD, etc.
◼ You only have to change the XML document in order to change all the rest.
◼ XML documents can be modularized. Parts can be reused.

Disadvantages of XML
◼ XML syntax is redundant or relatively large
◼ Supports only the text string data type
◼ More difficult, demanding and precise than HTML.
◼ Lack of browser support / end user applications.
◼ Still experimental / not solidified.
