CPE5009 Internet Devices and Services
XML and XML Schema
– A Refresher
[Link]
Lecture Outline
• An Overview of XML
• XML Namespaces
• XML Schema
Unit intro: CSE5610 Intelligent Software Systems S1 2008 2
Web Service Definitions
When is an application called a web service?
UDDI consortium:
“self-contained, modular business
applications that have open, Internet-
oriented, standards-based interfaces”
Web Service Definitions
When is an application called a web service?
World Wide Web consortium (W3C):
“a software application identified by a URI,
whose interfaces and bindings are capable of
being defined, described, and discovered as
XML artifacts.
A web service supports direct interactions
with other software agents using XML-based
messages exchanged via Internet-based
protocols.”
Web Service Definitions
When is an application called a web service?
Webopedia:
“a standardized way of integrating Web-based
applications using the XML, SOAP, WSDL and
UDDI open standards over an Internet protocol
backbone.
XML is used to tag the data, SOAP is used to
transfer the data, WSDL is used for
describing the services available, and UDDI is
used for listing what services are available.”
Role of XML in Web Services
W3C: “interfaces and bindings are … defined,
described, and discovered as XML…” and
“...interactions...using XML-based
messages...”
XML: extensible markup language
Used in Web Services for two purposes:
• service description - using WSDL
• message format – using SOAP
XML 101
• XML stands for EXtensible Markup Language
• XML is important to know, and very easy to learn
• XML is a markup language much like HTML
• XML was designed to transport and store data
– XML was designed to carry data, not to display data
– XML is a software and hardware independent tool for
carrying information
• XML tags are not predefined. You must define your own
tags
• XML is designed to be self-descriptive
• XML is a W3C Recommendation
• XML is everywhere: it is the most common tool for data
transmission between all sorts of applications, and it is
becoming more and more popular for storing and
describing information.
XML History
1986: SGML: Standard Generalized Markup
Language (ISO), work started in 70’s
1992: HTML from CERN (late 80’s), an application
of SGML for rendering web pages (W3C)
1998: XML, subset of SGML, easier but still
powerful, and can be rendered like HTML (W3C
Recommendation)
XML Vs. HTML
• XML is not a replacement for HTML.
XML and HTML were designed with
different goals:
• XML was designed to transport and store
data, with focus on what data is.
HTML was designed to display data, with
focus on how data looks.
• HTML is about displaying information,
while XML is about carrying information
XML is Mere Text
• XML Does not DO Anything
• XML was created to structure, store, and transport information.
• The following example is a note to Tove from Jani, stored as XML:
<note>
<to> Tove </to>
<from> Jani </from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• The note above is quite self-descriptive.
• Still, the XML document does not DO anything: it is
pure information wrapped in tags.
• Someone must write a piece of software to send, receive or display
it.
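As a minimal sketch of such a piece of software, the Python snippet below parses the note with the standard library's xml.etree.ElementTree and prints it as plain text. The function name render_note and the output layout are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def render_note(xml_text: str) -> str:
    """Turn the note document into a plain-text message (hypothetical layout)."""
    note = ET.fromstring(xml_text)
    # findtext returns the text content of the first matching child element.
    return (
        f"To: {note.findtext('to')}\n"
        f"From: {note.findtext('from')}\n"
        f"{note.findtext('heading')}: {note.findtext('body')}"
    )


print(render_note(NOTE_XML))
```

The XML itself only carries the data; everything about how it is displayed lives in the program.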
Define Your Own Tags
• The tags in the example above (like <to> and <from>) are
not defined in any XML standard. These tags are "invented"
by the author of the XML document.
• That is because the XML language has no predefined tags.
• The tags used in HTML (and the structure of HTML) are
predefined. HTML documents can only use tags defined in
the HTML standard (like <p>, <h1>, etc.).
• XML allows the author to define his own tags and his own
document structure.
• XML-aware applications can handle the XML tags specially.
The functional meaning of the tags depends on the nature of
the application.
XML Separates Data From HTML
• If you need to display dynamic data in your
HTML document, it will take a lot of work to edit
the HTML each time the data changes.
• With XML, data can be stored in separate XML
files.
• This way you can concentrate on using HTML
for layout and display, and be sure that changes
in the underlying data will not require any
changes to the HTML.
XML Simplifies Data Sharing
• In the real world, computer systems and
databases contain data in incompatible
formats.
• XML data is stored in plain text format. This
provides a software- and hardware-
independent way of storing data.
• This makes it much easier to create data that
different applications can share.
XML Simplifies Data Transport
• In the real world, computer systems and
databases contain data in incompatible
formats.
• XML data is stored in plain text format. This
provides a software- and hardware-
independent way of storing data.
• This makes it much easier to create data that
different applications can share.
XML - Create New Internet Languages
• XML is Used to Create New Internet Languages
• Here are some examples:
• XHTML
• WSDL for describing available web services
• WAP and WML as markup languages for handheld
devices
• RSS languages for news feeds
• RDF and OWL for describing resources and ontology
• SMIL for describing multimedia for the web
• Domain Specific Markup Languages – Chemical,
Medical, Mathematical etc.
XML Document Structures
• XML documents form a tree structure that starts at "the root" and branches
to "the leaves"
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• The first line is the XML declaration: version and encoding
• <note> is the root element, and </note> ends the root element
• <to> is child element 1 of the root
XML Document Structure
• XML documents must contain a root element. This element is
"the parent" of all other elements.
• The elements in an XML document form a document tree. The
tree starts at the root and branches to the lowest level of the tree.
All elements can have sub elements (child elements):
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
• The terms parent, child, and sibling are used to describe the
relationships between elements. Parent elements have children.
Children on the same level are called siblings (brothers or
sisters).
• All elements can have text content and attributes (just like in
HTML).
Another Example
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
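The tree vocabulary (root, child, attribute, text content) maps directly onto a parser's API. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the bookstore document; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # <bookstore> is the root element

# Each <book> is a child of the root; the <book> elements are siblings.
summaries = []
for book in root.findall("book"):
    category = book.get("category")        # attribute on the start tag
    title = book.findtext("title")         # text content of a child element
    price = float(book.findtext("price"))  # element text is always a string
    summaries.append((category, title, price))

print(summaries)
```

Note that XML itself carries no data types: the price arrives as text and the program decides to treat it as a number.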
XML Syntax Rules
• All XML Elements Must Have a Closing Tag
• XML Tags are Case Sensitive
• XML Elements Must be properly nested
• XML Documents Must Have a Root Element
• XML Attribute Values Must be Quoted
– XML elements can have attributes in name/value pairs just like in
HTML.
– In XML the attribute value must always be quoted.
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
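One way to see these rules in action is to hand a parser documents that break them. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok": '<note date="12/11/2007"><to>Tove</to></note>',
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<note><To>Tove</to></note>",
    "bad nesting": "<note><b><i>Tove</b></i></note>",
    "two roots": "<to>Tove</to><from>Jani</from>",
    "unquoted attribute": "<note date=12/11/2007></note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; each of the others violates exactly one of the rules listed above.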
XML Syntax Rules – Entity References
• Some characters have a special meaning in XML.
• If you place a character like "<" inside an XML element, it will generate an
error because the parser interprets it as the start of a new element.
• This will generate an XML error:
– <message>if salary < 1000 then</message>
• To avoid this error, replace the "<" character with an entity reference:
– <message>if salary &lt; 1000 then</message>
• There are 5 predefined entity references in XML:
• &lt; < less than
• &gt; > greater than
• &amp; & ampersand
• &apos; ' apostrophe
• &quot; " quotation mark
• Note: Only the characters "<" and "&" are strictly illegal in XML. The
greater than character is legal, but it is a good habit to replace it.
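In practice the replacement is rarely done by hand. The sketch below, using Python's standard library, escapes the salary text with xml.sax.saxutils.escape and then shows that the parser restores the original character; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "if salary < 1000 then"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw_text)
document = f"<message>{escaped}</message>"
print(document)

# The parser turns the entity reference back into the original character.
parsed_text = ET.fromstring(document).text
print(parsed_text)
```

The entity reference exists only in the serialized document; the application always sees the plain character.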
XML Syntax Rules – Comments, Whitespace, LF
• Comments in XML use the same syntax as in HTML
– <!-- This is a comment -->
• HTML reduces multiple white space characters to
a single white space; in XML the white space in
your document is preserved
• XML Stores New Line as LF
– In Windows applications, a new line is normally stored as a pair
of characters: carriage return (CR) and line feed (LF)
– In Unix applications, a new line is normally stored as an LF
character
– Macintosh applications use only a CR character to store a
new line.
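Both behaviours can be observed with a parser. The sketch below uses Python's standard xml.etree.ElementTree; the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Runs of spaces inside element content are kept exactly as written.
spaced = ET.fromstring("<body>Hello      Tove</body>").text

# A Windows-style CR+LF pair in the input is reported as a single LF.
windows_style = ET.fromstring("<body>line one\r\nline two</body>").text

print(repr(spaced))
print(repr(windows_style))
```

Line-end normalization happens inside the parser, so application code can rely on LF regardless of the platform that produced the file.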
XML Elements
• An XML document contains XML Elements.
• An XML element is everything from
(including) the element's start tag to
(including) the element's end tag.
• An element can contain other elements,
simple text or a mixture of both.
• Elements can also have attributes.
XML Elements – Naming Rules
• XML Naming Rules
• XML elements must follow these naming
rules:
– Names can contain letters, numbers, and other
characters
– Names must not start with a number or
punctuation character
– Names must not start with the letters xml (or XML,
or Xml, etc)
– Names cannot contain spaces
• Any name can be used, no words are
reserved.
XML Elements are Extensible
• XML elements can be extended to carry more information
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• The application should still be able to find the <to>,
<from>, and <body> elements in the XML document and
produce the same output.
• One of the beauties of XML is that it can often be extended
without breaking applications.
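The reason extension is usually harmless is that applications tend to look elements up by name rather than by position. A minimal sketch in Python, assuming a hypothetical extract_message helper:

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
  <to>Tove</to><from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
  <date>2008-01-10</date>
  <to>Tove</to><from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def extract_message(xml_text: str) -> dict:
    """Look up elements by name, so unknown extras such as <date> are ignored."""
    note = ET.fromstring(xml_text)
    return {name: note.findtext(name) for name in ("to", "from", "body")}


print(extract_message(ORIGINAL) == extract_message(EXTENDED))
```

An application that instead relied on "the first child is <to>" would break on the extended document, which is why the claim is "often", not "always".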
XML Attributes
• XML elements can have attributes in the
start tag, just like HTML.
• Attributes provide additional information
about elements.
• From HTML : <img src="[Link]">.
• Attribute values must always be enclosed in
quotes, but either single or double quotes
can be used.
Attribute Vs Element
Example 1:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
Example 2:
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
• Design decision – no rules as to when to use Attributes Vs. Elements
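The choice does affect how consuming code reads the value. The sketch below, using Python's standard xml.etree.ElementTree, reads the same information from both designs; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female"><firstname>Anna</firstname>'
    "<lastname>Smith</lastname></person>"
)
as_element = ET.fromstring(
    "<person><sex>female</sex><firstname>Anna</firstname>"
    "<lastname>Smith</lastname></person>"
)

sex_from_attribute = as_attribute.get("sex")   # attributes: Element.get()
sex_from_element = as_element.findtext("sex")  # child elements: findtext()

print(sex_from_attribute, sex_from_element)
```

An attribute can hold only a single string, whereas an element can later grow children or attributes of its own, which is one consideration when making the design decision.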
Viewing XML Files
• XML Files can be viewed in any browser
– Example – [Link] and note_error.xml
• Formatting – Range of ways
– Cascading Style Sheets
– XSLT (W3C Recommendation)
• Example using CSS
– cd_catalog.xml
– cd_catalog.css
– cd_catalog_with_css.xml
XML Namespaces
• XML Namespaces provide a method to avoid element
name conflicts.
• Element names are defined by the developer.
• This often results in a conflict when trying to mix XML
documents from different XML applications.
Example: This XML carries HTML table information
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
XML Namespaces
EXAMPLE: This XML contains information about furniture
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
• If these XML fragments were added together, there would
be a name conflict. Both contain a <table> element, but
the elements have different content and meaning.
• An XML parser will not know how to handle these
differences.
XML Namespaces
• Name conflicts in XML can easily be avoided using a name prefix.
• This XML carries information about an HTML table, and a piece of
furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
• There will be no conflict because the two <table> elements have
different names.
XML Namespaces – xmlns Attribute
• When using prefixes in XML, a so-called namespace for the prefix must be
defined.
• The namespace is defined by the xmlns attribute in the start tag of an
element.
• The namespace declaration has the following syntax: xmlns:prefix="URI"
<root>
<h:table xmlns:h="[Link]
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="[Link]
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
• The xmlns attributes in the <h:table> and <f:table> start tags give
the h: and f: prefixes a qualified namespace.
XML Namespaces – xmlns Attribute
• When a namespace is defined for an element, all child elements with the same
prefix are associated with the same namespace
• Namespaces can be declared in the elements where they are used or in the XML
root element
<root
xmlns:h="[Link]
xmlns:f="[Link]
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
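A namespace-aware parser resolves each prefix to its URI, so the two table elements really do end up with different names. The sketch below uses Python's standard xml.etree.ElementTree; the two example.org URIs are hypothetical stand-ins chosen for this example, not the ones used on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this example.
HTML_NS = "http://example.org/html"
FURNITURE_NS = "http://example.org/furniture"

DOC = f"""<root xmlns:h="{HTML_NS}" xmlns:f="{FURNITURE_NS}">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name><f:width>80</f:width></f:table>
</root>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix to its URI: the tag becomes "{uri}localname".
table_tags = [child.tag for child in root]
print(table_tags)

# Searching with a prefix map; these prefixes need not match the document's.
ns = {"html": HTML_NS, "furn": FURNITURE_NS}
cells = [td.text for td in root.findall("html:table/html:tr/html:td", ns)]
furniture_name = root.findtext("furn:table/furn:name", namespaces=ns)
print(cells, furniture_name)
```

The prefix is only a local abbreviation: what identifies an element is the pair of namespace URI and local name.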
XML Namespaces
• The namespace URI is not used by the parser to look up
information.
• The purpose is to give the namespace a unique name.
• However, often companies use the namespace as a
pointer to a web page containing namespace information.
• Uniform Resource Identifier (URI)
• A Uniform Resource Identifier (URI) is a string of
characters which identifies an Internet Resource.
• The most common URI is the Uniform Resource Locator
(URL), which identifies an Internet domain address.
Another, less common type of URI is the Uniform
Resource Name (URN).
XML Namespaces – Default Namespaces
• Defining a default namespace for an element saves us from using prefixes in all
the child elements. It has the following syntax:
• xmlns="namespaceURI"
• This XML carries HTML table information:
<table xmlns="[Link]
<tr>
<td>Apples</td> <td>Bananas</td>
</tr>
</table>
• This XML carries information about a piece of furniture:
<table
xmlns="[Link]
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
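A default namespace applies to the element that declares it and to all unprefixed descendants. The sketch below shows this with Python's standard xml.etree.ElementTree; the URI is a hypothetical placeholder chosen for this example.

```python
import xml.etree.ElementTree as ET

FURNITURE_NS = "http://example.org/furniture"  # hypothetical URI

DOC = f"""<table xmlns="{FURNITURE_NS}">
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>"""

table = ET.fromstring(DOC)

# No prefixes in the source text, yet every element is in the default namespace.
tags = [table.tag] + [child.tag for child in table]
print(tags)

# An unqualified search finds nothing: "name" alone means "no namespace".
unqualified = table.find("name")
qualified = table.find(f"{{{FURNITURE_NS}}}name")
print(unqualified, qualified.text)
```

This is a common stumbling block: code written against a document without namespaces stops matching as soon as a default namespace is added.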
XML Validation
• XML with correct syntax is "Well Formed" XML
• A "Valid" XML document is a "Well Formed" XML
document, which also conforms to the rules of a
Document Type Definition (DTD) / XML Schema
• The purpose of a DTD is to define the structure of an
XML document. It defines the structure with a list of
legal elements:
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
XML Schema
• W3C supports an XML based alternative to DTD called XML
Schema:
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
XML Schema
• XML Schema is an XML-based alternative to
DTDs.
• An XML Schema describes the structure of an
XML document.
• The XML Schema language is also referred to
as XML Schema Definition (XSD).
XML Schema
• The purpose of an XML Schema is to define the legal
building blocks of an XML document, just like a DTD.
• An XML Schema:
– defines elements that can appear in a document
– defines attributes that can appear in a document
– defines which elements are child elements
– defines the order of child elements
– defines the number of child elements
– defines whether an element is empty or can include text
– defines data types for elements and attributes
– defines default and fixed values for elements and attributes
• XML Schema is a W3C Standard
Why Use XML Schemas
•XML Schemas Support Data Types
•One of the greatest strengths of XML Schemas is the
support for data types.
•With support for data types:
•It is easier to describe allowable document content
•It is easier to validate the correctness of data
•It is easier to work with data from a database
•It is easier to define data facets (restrictions on
data)
•It is easier to define data patterns (data formats)
•It is easier to convert data between different data
types
Why Use XML Schemas
• XML Schemas use XML Syntax
• Another great strength about XML Schemas is that
they are written in XML.
• Some benefits of XML Schemas being written in
XML:
– You don't have to learn a new language
– You can use your XML editor to edit your Schema files
– You can use your XML parser to parse your Schema files
– You can manipulate your Schema with the XML DOM
– You can transform your Schema with XSLT
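Because a schema is itself an XML document, an ordinary XML parser can read it. The sketch below parses a small schema for the note document with Python's standard xml.etree.ElementTree and lists the declared elements. It only inspects the schema; it does not validate anything. The namespace URI is the standard W3C XML Schema namespace, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the standard XML Schema namespace

SCHEMA = f"""<xs:schema xmlns:xs="{XS}">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)

# Collect (name, type) for every xs:element declaration, at any depth.
declared = [
    (el.get("name"), el.get("type"))
    for el in schema.iter(f"{{{XS}}}element")
]
print(declared)
```

The note element has no type attribute because its type is given inline by the nested xs:complexType.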
Why Use XML Schemas
• XML Schemas Secure Data Communication
• When sending data from a sender to a receiver, it is essential
that both parties have the same "expectations" about the
content.
• With XML Schemas, the sender can describe the data in a
way that the receiver will understand.
• A date like: "03-11-2004" will, in some countries, be
interpreted as 3 November and in other countries as
11 March.
• However, an XML element with a data type like this:
<date type="date">2004-03-11</date>
ensures a mutual understanding of the content, because the
XML data type "date" requires the format "YYYY-MM-DD".
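The ambiguity is easy to reproduce. The sketch below uses Python's standard datetime module to parse the same string under two conventions, and then parses the YYYY-MM-DD form, which has a single reading; the variable names are illustrative.

```python
from datetime import date, datetime

ambiguous = "03-11-2004"

# The same string yields two different dates depending on the assumed convention.
day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()
month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()
print(day_first, month_first)

# The YYYY-MM-DD form required by the XML Schema date type has only one reading.
unambiguous = date.fromisoformat("2004-03-11")
print(unambiguous)
```

With the schema fixing the format, sender and receiver no longer need a side agreement about day and month order.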
Why Use XML Schemas
• XML Schemas are Extensible
• XML Schemas are extensible, because
they are written in XML.
• With an extensible Schema definition you
can:
• Reuse your Schema in other Schemas
• Create your own data types derived from
the standard types
• Reference multiple schemas in the same
document
How To Write an XML Schema
• Consider the following XML document
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
How To Write an XML Schema
• The following example is an XML Schema file called "[Link]" that defines
the elements of the XML document above ("[Link]")
<?xml version="1.0"?>
<xs:schema xmlns:xs="[Link]
targetNamespace="[Link]
xmlns="[Link]
elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
How To Reference an XML Schema
• This XML document has a reference to
an XML Schema:
<?xml version="1.0"?>
<note
xmlns="[Link]
xmlns:xsi="[Link]
xsi:schemaLocation="[Link] [Link]">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• The xmlns attribute specifies the default namespace
declaration. This declaration tells the schema-validator
that all the elements used in this XML document are
declared in the "[Link] namespace
XML Schema – The schema Element
• The <schema> element is the root element of every XML Schema:
<?xml version="1.0"?>
<xs:schema>
...
...
</xs:schema>
• The <schema> element may contain some attributes. A
schema declaration often looks something like this:
<?xml version="1.0"?>
<xs:schema xmlns:xs="[Link]
targetNamespace="[Link]
xmlns="[Link]
elementFormDefault="qualified">
...
...
</xs:schema>
XML Schema – The schema Element
• This fragment:
targetNamespace="[Link]
• indicates that the elements defined by this schema
(note, to, from, heading, body) come from the
"[Link] namespace.
• This fragment:
xmlns="[Link]
indicates that the default namespace is
"[Link]
• This fragment:
elementFormDefault="qualified"
indicates that any elements used by the XML instance
document which were declared in this schema must be
namespace qualified.
XML Schema – Simple and Complex Types
Two different types for an element or attribute:
• simple types: no child elements and no attributes
• complex types: all other elements
Four content types of an element:
• empty content: no child elements and no text nodes
• simple content: only text nodes
• complex content: only sub-elements
• mixed content: both text nodes and sub-elements
i.e., a complex type either has complex content, or simple
content with attributes
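The four content types can be told apart mechanically by checking for child elements and for text. The sketch below does this with Python's standard xml.etree.ElementTree. The function name content_kind is hypothetical, and treating whitespace-only text as "no text" is a simplifying assumption of this example.

```python
import xml.etree.ElementTree as ET


def content_kind(element: ET.Element) -> str:
    """Classify an element's content as empty, simple, complex or mixed."""
    has_children = len(element) > 0
    # Text can sit before the first child (.text) or after any child (.tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "complex"
    if has_text:
        return "simple"
    return "empty"


samples = {
    "<br/>": "empty",
    "<born>1922-11-26</born>": "simple",
    "<author><name>Charles M Schulz</name></author>": "complex",
    "<p>Born in <year>1922</year>.</p>": "mixed",
}
kinds = {xml: content_kind(ET.fromstring(xml)) for xml in samples}
print(kinds)
```

Content type and simple/complex type are separate questions: an element with simple content still has a complex type as soon as it carries an attribute.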
XML Schema
1. complex content, and thus complex type
<author id="CMS">
<name>Charles M Schulz</name>
<born>1922-11-26</born>
<dead>2000-02-12</dead>
</author>
2. simple content and simple type
<born>1922-11-26</born>
3. simple content but complex type
<title lang="en">Being a Dog Is a Full-Time Job</title>
XML Schema types
XML Schema
all the simple types:
<xs:element name="name" type="xs:string" />
<xs:element name="qualification" type="xs:string" />
<xs:element name="born" type="xs:date" />
<xs:element name="dead" type="xs:date" />
<xs:element name="isbn" type="xs:integer" />
<xs:attribute name="id" type="xs:ID" />
<xs:attribute name="available" type="xs:boolean" />
<xs:attribute name="lang" type="xs:language" />
XML Schema
a complex type, simple content
<title lang="en">Being a Dog Is a Full-Time Job</title>
<xs:element name="title">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute ref="lang" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
XML Schema
a complex type, with complex content:
<xs:element name="library">
<xs:complexType>
<xs:sequence>
<xs:element ref="book" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
References
• [Link]
• [Link]
• [Link]
[Link]
• Lecture Notes from Aad Van Moorsel -
[Link]