ISOM
Standards in Information Management: XML
Arijit Sengupta
Learning Objectives
• Learn what XML is
• Learn the various ways in which XML is used
• Learn the key companion technologies
• See how XML is being used in industry as a meta-language
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Overview
What is XML?
• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A subset of SGML
vs. HTML, which is an application of SGML
Overview
What is XML?
• XML is a “use everywhere” data specification
[Figure: XML exchanged among Application X, configuration files, documents, a repository, and a database]
Overview
Documents vs. Data
• XML is used to represent two main types of things:
Documents
• Lots of text with tags to identify and annotate portions of the document
Data
• Hierarchical data structures
Overview
XML and Structured Data
• Pre-XML representation of data:
"PO-1234","CUST001","X9876","5","14.98"
• XML representation of the same data:
<PURCHASE_ORDER>
<PO_NUM>PO-1234</PO_NUM>
<CUST_ID>CUST001</CUST_ID>
<ITEM_NUM>X9876</ITEM_NUM>
<QUANTITY>5</QUANTITY>
<PRICE>14.98</PRICE>
</PURCHASE_ORDER>
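A minimal sketch, using only Python's standard library (`csv`, `xml.etree.ElementTree`), of why the tagged version is easier to consume: the CSV reader must know by convention that the fourth field is the quantity, while the XML reader asks for `QUANTITY` by name. The function names `from_csv` and `from_xml` are illustrative, not part of any standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The two representations of the same purchase order.
CSV_LINE = '"PO-1234","CUST001","X9876","5","14.98"'
XML_DOC = """<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM_NUM>X9876</ITEM_NUM>
  <QUANTITY>5</QUANTITY>
  <PRICE>14.98</PRICE>
</PURCHASE_ORDER>"""


def from_csv(line):
    # Positional: the meaning of each field lives outside the data,
    # in an agreement that "column 4 is the quantity".
    fields = next(csv.reader(io.StringIO(line)))
    return {"quantity": int(fields[3]), "price": float(fields[4])}


def from_xml(text):
    # Self-describing: each value is looked up by its tag name.
    root = ET.fromstring(text)
    return {"quantity": int(root.findtext("QUANTITY")),
            "price": float(root.findtext("PRICE"))}


print(from_csv(CSV_LINE))  # {'quantity': 5, 'price': 14.98}
print(from_xml(XML_DOC))   # {'quantity': 5, 'price': 14.98}
```

If a new field were inserted before `QUANTITY`, `from_xml` would keep working unchanged, while `from_csv` would silently read the wrong column.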
Overview
Benefits of XML
• Open W3C standard
• Representation of data across heterogeneous environments
Cross platform
Allows for a high degree of interoperability
• Strict rules
Syntax
Structure
Case sensitive
Overview
Who Uses XML?
• Submissions by
Microsoft
IBM
Hewlett-Packard
Fujitsu Laboratories
Sun Microsystems
Netscape (AOL), and others…
• Technologies using XML
SOAP, ebXML, BizTalk, WebSphere, many others…
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Syntax and Structure
Components of an XML Document
• Elements
Each element has a beginning and ending tag
• <TAG_NAME>...</TAG_NAME>
Elements can be empty (<TAG_NAME />)
• Attributes
Describe an element, e.g., data type, data range, etc.
Can only appear on the beginning tag
• Processing instructions
Encoding specification (Unicode by default)
Namespace declaration
Schema declaration
Syntax and Structure
Components of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
<ELEMENT2> </ELEMENT2>
<ELEMENT3 type='string'> </ELEMENT3>
<ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>
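Annotations: the first two lines are the prologue (processing instructions); ROOT and everything inside it are elements; ELEMENT3 and ELEMENT4 are elements with attributes.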
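As an illustrative sketch (Python's `xml.dom.minidom`; `describe` is a hypothetical helper), parsing the document above separates the three kinds of components: the prologue processing instruction, the elements, and the attributes. Note that the parser accepts `value='9.3'` on an element whose `type` says `integer`; well-formedness says nothing about meaning, which is the job of a schema.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
<ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
<ELEMENT2> </ELEMENT2>
<ELEMENT3 type='string'> </ELEMENT3>
<ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>"""


def describe(doc_text):
    """Return (PI targets, element names in document order, attributes per element)."""
    doc = minidom.parseString(doc_text)
    # Prologue PIs are children of the Document node. The <?xml ...?>
    # declaration itself is not exposed as a processing-instruction node.
    pis = [n.target for n in doc.childNodes
           if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
    all_elements = doc.getElementsByTagName("*")  # "*" matches every element
    names = [e.tagName for e in all_elements]
    attrs = {e.tagName: dict(e.attributes.items())
             for e in all_elements if e.attributes.length}
    return pis, names, attrs


pis, names, attrs = describe(DOC)
print(pis)    # processing instructions in the prologue
print(names)  # every element, root first
print(attrs)  # only the elements that carry attributes
```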
Syntax and Structure
Rules For Well-Formed XML
• There must be one, and only one, root element
• Sub-elements must be properly nested
A tag must end within the tag in which it was started
• Attributes are optional
Defined by an optional schema
• Attribute values must be enclosed in double ("") or single ('') quotes
• Processing instructions are optional
• XML is case-sensitive
<tag> and <TAG> are not the same type of element
Syntax and Structure
Well-Formed XML?
• No, CHILD2 and CHILD3 do not
nest properly
<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
Syntax and Structure
Well-Formed XML?
• No, there are two root elements
<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>
Syntax and Structure
Well-Formed XML?
• Yes
<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>
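A quick way to check the three answers, assuming Python's `xml.etree.ElementTree`, which rejects any document that is not well-formed by raising `ParseError`. The `EXAMPLES` dictionary and the `is_well_formed` helper are illustrative; a fourth case exercises the case-sensitivity rule.

```python
import xml.etree.ElementTree as ET

EXAMPLES = {
    "bad nesting": """<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>""",
    "two roots": """<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
<CHILD1>This is another element 1</CHILD1>
</PARENT>""",
    "ok": """<?xml version="1.0" ?>
<PARENT>
<CHILD1>This is element 1</CHILD1>
<CHILD2/>
<CHILD3></CHILD3>
</PARENT>""",
    # Extra case: XML is case-sensitive, so <tag> cannot be closed by </TAG>.
    "case": "<tag></TAG>",
}


def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        print("  not well-formed:", err)
        return False
    return True


for name, text in EXAMPLES.items():
    print(name, "->", is_well_formed(text))
```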
Syntax and Structure
An XML Document
<?xml version='1.0'?>
<bookstore>
<book genre='autobiography' publicationdate='1981'
ISBN='1-861003-11-0'>
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
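A minimal sketch of reading this document with Python's `xml.etree.ElementTree`: child elements are reached by tag name, and attributes with `get()`. The `summarize` helper and its dictionary keys are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""


def summarize(xml_text):
    """Flatten each <book> into a dict of element text and attribute values."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        author = book.find("author")
        rows.append({
            "title": book.findtext("title"),
            "author": f"{author.findtext('first-name')} {author.findtext('last-name')}",
            "genre": book.get("genre"),               # attribute on the start tag
            "price": float(book.findtext("price")),   # text of a child element
        })
    return rows


for row in summarize(BOOKSTORE):
    print(row)
```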
Syntax and Structure
Namespaces: Overview
• Part of XML’s extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
Frees the author to focus on the data and decide how best to describe it
Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
When a URL is used, it does NOT have to represent a live server
Syntax and Structure
Namespaces: Declaration
Namespace declaration examples:
xmlns:bk="http://www.example.com/bookinfo/"
xmlns:bk="urn:mybookstuff.org:bookinfo"
Anatomy of xmlns:bk="http://www.example.com/bookinfo/": xmlns is the namespace declaration, bk is the prefix, and the quoted value is the URI (here a URL)
Syntax and Structure
Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
xmlns:money="urn:finance:money">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE money:currency='US Dollar'>
19.99</bk:PRICE>
</bk:BOOK>
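A sketch, assuming Python's `xml.etree.ElementTree`, of what a namespace-aware parser does with the second example: prefixes are replaced by their URIs, so names are compared as `{URI}local-name`. The reader may choose its own prefixes (`b` and `m` here are arbitrary) as long as they map to the same URIs.

```python
import xml.etree.ElementTree as ET

DOC = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(DOC)
# The document's prefix "bk" is gone; only the URI identifies the namespace.
print(root.tag)  # {http://www.bookstuff.org/bookinfo}BOOK

# Lookups use a prefix -> URI map chosen by the reader, not by the author.
ns = {"b": "http://www.bookstuff.org/bookinfo", "m": "urn:finance:money"}
price = root.find("b:PRICE", ns)
# A prefixed attribute is also stored under its {URI}local-name.
currency = price.get("{urn:finance:money}currency")
print(price.text, currency)  # 19.99 US Dollar
```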
Syntax and Structure
Namespaces: Default Namespace
• An XML namespace declared without a prefix becomes the default namespace for all sub-elements
• All elements without a prefix will belong to the default namespace:
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
Syntax and Structure
Namespaces: Scope
• Unqualified elements belong to the inner-most default namespace.
BOOK, TITLE, and AUTHOR belong to the default book namespace
PUBLISHER and NAME belong to the default publisher namespace
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
<PUBLISHER xmlns="urn:publishers:publinfo">
<NAME>Microsoft Press</NAME>
</PUBLISHER>
</BOOK>
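The scoping rule can be checked with a short Python sketch using `xml.etree.ElementTree`; `namespaces_by_element` is a hypothetical helper that maps each local element name to the namespace URI the parser assigned to it.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""


def namespaces_by_element(xml_text):
    """Map each element's local name to its namespace URI (None if unqualified)."""
    result = {}
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = None, elem.tag
        result[local] = uri
    return result


for name, uri in namespaces_by_element(DOC).items():
    print(f"{name:10} {uri}")
```

The inner `xmlns` on PUBLISHER overrides the outer default only for PUBLISHER and its descendants; TITLE and AUTHOR keep the book namespace.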
Syntax and Structure
Namespaces: Attributes
• Unqualified attributes do NOT
belong to any namespace
Even if there is a default namespace
• This differs from elements, which
belong to the default namespace
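A small Python sketch of the difference, assuming `xml.etree.ElementTree`; the `money:rate` attribute is hypothetical and is added only to contrast with the unprefixed `currency` attribute.

```python
import xml.etree.ElementTree as ET

# money:rate is a hypothetical attribute, added only to contrast with currency.
DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo"
      xmlns:money="urn:finance:money">
  <PRICE currency="US Dollar" money:rate="1.0">19.99</PRICE>
</BOOK>"""

price = ET.fromstring(DOC)[0]  # first child element of BOOK

# The unprefixed element picks up the default namespace...
print(price.tag)             # {http://www.bookstuff.org/bookinfo}PRICE
# ...but the unprefixed attribute stays in no namespace; only money:rate is qualified.
print(sorted(price.attrib))  # ['currency', '{urn:finance:money}rate']
```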
Syntax and Structure
Entities
• Entities provide a mechanism for textual substitution, e.g.
Entity    Substitution
&lt;      <
&amp;     &
• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
JPEG photos, GIF files, movies, etc.
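A sketch using Python's standard library: the parser (`xml.etree.ElementTree`) expands both the predefined entities and a user-defined internal entity declared in the document's DTD, and `xml.sax.saxutils.escape` does the reverse when writing text. The entity name `co` and its value are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entities (&lt;, &amp;, ...) are replaced while parsing.
predefined = ET.fromstring("<expr>a &lt; b &amp;&amp; c</expr>").text

# A user-defined internal (parsed) entity, declared in the internal DTD subset.
DOC = """<!DOCTYPE note [
  <!ENTITY co "Acme Corporation">
]>
<note>Shipped by &co;</note>"""
custom = ET.fromstring(DOC).text

# The reverse direction: escape raw text before embedding it as element content.
escaped = escape("Tom & Jerry <cartoon>")

print(predefined)  # a < b && c
print(custom)      # Shipped by Acme Corporation
print(escaped)     # Tom &amp; Jerry &lt;cartoon&gt;
```

Unparsed entities (images, movies) are not expanded this way; the document only refers to them, and the application decides what to do with the referenced data.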
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
The XML ‘Alphabet Soup’
• XML itself is fairly simple
• Most of the learning curve is knowing about all of the related technologies
The XML ‘Alphabet Soup’
XML (Extensible Markup Language): Defines XML documents
Infoset (Information Set): Abstract model of XML data; definition of terms
DTD (Document Type Definition): Non-XML schema
XSD (XML Schema): XML-based schema language
XDR (XML Data Reduced): An earlier XML schema
CSS (Cascading Style Sheets): Allows you to specify styles
XSL (Extensible Stylesheet Language): Language for expressing stylesheets; consists of XSLT and XSL-FO
XSLT (XSL Transformations): Language for transforming XML documents
XSL-FO (XSL Formatting Objects): Language to describe precise layout of text on a page
The XML ‘Alphabet Soup’
XPath (XML Path Language): A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
XPointer (XML Pointer Language): Supports addressing into the internal structures of XML documents
XLink (XML Linking Language): Describes links between XML documents
XQuery (XML Query Language, draft): Flexible mechanism for querying XML data as if it were a database
DOM (Document Object Model): API to read, create and edit XML documents; creates an in-memory object model
SAX (Simple API for XML): API to parse XML documents; event-driven
Data Island: XML data embedded in an HTML page
Data Binding: Automatic population of HTML elements from XML data
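To make the DOM/SAX distinction concrete, here is a sketch using Python's standard `xml.dom.minidom` (DOM) and `xml.sax` (SAX) modules. Both functions collect the same titles, but the DOM version builds the whole tree in memory first, while the SAX version only reacts to parse events as they stream past. `TitleHandler`, `titles_dom`, and `titles_sax` are illustrative names.

```python
import xml.sax
from xml.dom import minidom

DOC = b"""<bookstore>
  <book><title>The Autobiography of Benjamin Franklin</title></book>
  <book><title>The Confidence Man</title></book>
</bookstore>"""


def titles_dom(data):
    # DOM: parse everything into an in-memory tree, then navigate it.
    tree = minidom.parseString(data)
    return [t.firstChild.data for t in tree.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    # SAX: no tree; the parser calls these methods for each event.
    def __init__(self):
        super().__init__()
        self.titles, self._buf, self._in_title = [], [], False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title, self._buf = True, []

    def characters(self, content):
        if self._in_title:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


def titles_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


print(titles_dom(DOC))
print(titles_sax(DOC))
```

The SAX version keeps memory use flat regardless of document size, at the cost of tracking state (here `_in_title`) by hand.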
The XML ‘Alphabet Soup’
Schemas: Overview
• DTD (Document Type Definitions)
Not written in XML
No support for data types or namespaces
• XSD (XML Schema Definition)
Written in XML
Supports data types
Current standard recommended by W3C
The XML ‘Alphabet Soup’
Schemas: Purpose
• Define the “rules” (grammar) of the document
Data types
Value bounds
• An XML document that conforms to a schema is said to be valid
More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements
The XML ‘Alphabet Soup’
Schemas: DTD Example
• XML document:
<BOOK>
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
• DTD schema:
<!DOCTYPE BOOK [
<!ELEMENT BOOK (TITLE+, AUTHOR) >
<!ELEMENT TITLE (#PCDATA) >
<!ELEMENT AUTHOR (#PCDATA) >
]>
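DTD content models such as `(TITLE+, AUTHOR)` behave like regular expressions over the sequence of child element names. The following hand-rolled Python sketch checks just this one rule to illustrate the idea; it is a toy under that assumption, not a DTD validator (Python's standard library does not validate against DTDs), and it ignores the `#PCDATA` rules for TITLE and AUTHOR.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT BOOK (TITLE+, AUTHOR)>, rewritten as a regular expression over the
# space-terminated names of BOOK's child elements, in document order.
BOOK_CONTENT_MODEL = re.compile(r"(?:TITLE )+AUTHOR ")


def book_matches_model(xml_text):
    """Toy check of this single content model; not a general DTD validator."""
    root = ET.fromstring(xml_text)
    if root.tag != "BOOK":
        return False
    child_names = "".join(child.tag + " " for child in root)
    return BOOK_CONTENT_MODEL.fullmatch(child_names) is not None


print(book_matches_model(
    "<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"))
# Well-formed, but the order violates the content model:
print(book_matches_model(
    "<BOOK><AUTHOR>Joe Developer</AUTHOR><TITLE>All About XML</TITLE></BOOK>"))
```

The second document parses without error, which is exactly the gap between well-formed and valid described above.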
The XML ‘Alphabet Soup’
Schemas: XSD Example
• XML document:
<CATALOG>
<BOOK>
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
…
</CATALOG>
The XML ‘Alphabet Soup’
Schemas: XSD Example
<xsd:schema id="NewDataSet" targetNamespace="http://tempuri.org/schema1.xsd"
xmlns="http://tempuri.org/schema1.xsd"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xsd:element name="BOOK">
<xsd:complexType>
<xsd:all>
<xsd:element name="TITLE" minOccurs="0" type="xsd:string"/>
<xsd:element name="AUTHOR" minOccurs="0" type="xsd:string"/>
</xsd:all>
</xsd:complexType>
</xsd:element>
<xsd:element name="CATALOG" msdata:IsDataSet="True">
<xsd:complexType>
<xsd:choice maxOccurs="unbounded">
<xsd:element ref="BOOK"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
</xsd:schema>
The XML ‘Alphabet Soup’
Schemas: Why You Should Use XSD
• Newest W3C Standard
• Broad support for data types
• Reusable “components”
Simple data types
Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio .NET
The XML ‘Alphabet Soup’
Transformations: XSL
• Language for expressing document styles
• Specifies the presentation of XML
More powerful than CSS
• Consists of:
XSLT
XPath
XSL Formatting Objects (XSL-FO)
The XML ‘Alphabet Soup’
Transformations: Overview
• XSLT – a language used to transform XML data into a different form (commonly XML or HTML)
[Figure: XML → XSLT → XML, HTML, …]
The XML ‘Alphabet Soup’
Transformations: XSLT
• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
Patterns match nodes in the source document
Templates are instantiated to form part of the result document
• Uses XPath for querying, sorting, etc.
The XML ‘Alphabet Soup’
XPath (XML Path Language)
• General-purpose query language for identifying nodes in an XML document
• Declarative (vs. procedural)
• Contextual – the results depend on the current node
• Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)
The XML ‘Alphabet Soup’
XPath Operators
Operator   Description
/          Child operator: selects only immediate children (at the beginning of a pattern, the context is the root)
//         Recursive descent: selects elements at any depth (at the beginning of a pattern, the context is the root)
.          Indicates the current context
..         Selects the parent of the current node
*          Wildcard
@          Prefix to an attribute name (@* on its own is an attribute wildcard)
[ ]        Applies a filter pattern
The XML ‘Alphabet Soup’
XPath Query Examples
./author       (find all author elements that are children of the current context node)
/bookstore     (find the bookstore element at the root)
/*             (find the root element)
//author       (find all author elements anywhere in the document)
/bookstore[@specialty = "textbooks"]
(find all bookstores where the specialty attribute = "textbooks")
//book[@style = /bookstore/@specialty]
(find all books where the style attribute = the specialty attribute of the bookstore element at the root)
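Python's `xml.etree.ElementTree` supports a limited subset of XPath, so some of the queries above can be tried directly. This sketch assumes a made-up bookstore document with `specialty` and `style` attributes. Because ElementTree paths are relative to an element and its predicates only compare against literal values, `//` is written `.//` and the cross-reference to `/bookstore/@specialty` is resolved in Python first.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the specialty/style attributes mirror the queries above.
DOC = """<bookstore specialty="textbooks">
  <book style="textbooks"><author>A. Writer</author></book>
  <book style="novel"><author>B. Novelist</author></book>
  <magazine><author>C. Columnist</author></magazine>
</bookstore>"""

root = ET.fromstring(DOC)  # ElementTree paths are evaluated relative to this element

# /*: the root element itself
root_name = root.tag

# //author: "at any depth" is spelled .//author in ElementTree
all_authors = [a.text for a in root.findall(".//author")]

# ./author with the first book as the context node: only that book's children
first_book = root.find("book")
first_book_authors = [a.text for a in first_book.findall("./author")]

# //book[@style = /bookstore/@specialty]: substitute the root attribute's value,
# since ElementTree predicates cannot reference another node.
specialty = root.get("specialty")
matching_books = root.findall(f".//book[@style='{specialty}']")

print(root_name, all_authors, first_book_authors,
      [b.get("style") for b in matching_books])
```

For full XPath 1.0 (arbitrary expressions in predicates), a library such as lxml is the usual choice; the standard library stops at this subset.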
More XPath Examples
Path Expression                       Result
/bookstore/book[1]                    Selects the first book element that is the child of the bookstore element
/bookstore/book[last()]               Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1]             Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3]         Selects the first two book elements that are children of the bookstore element
//title[@lang]                        Selects all the title elements that have an attribute named lang
//title[@lang='eng']                  Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00]          Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title    Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
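A sketch running the table's expressions through Python's `xml.etree.ElementTree`. Queries are relative to the `bookstore` root element, and the three-book document is hypothetical. ElementTree's XPath subset supports `[1]`, `[last()]`, `[last()-1]`, `[@lang]` and `[@lang='eng']`, but not `position()<3` or numeric comparisons such as `price>35.00`, so those two are done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three books, two with a lang attribute on the title.
DOC = """<bookstore>
  <book><title lang="eng">Book A</title><price>30.00</price></book>
  <book><title lang="fra">Book B</title><price>40.00</price></book>
  <book><title>Book C</title><price>50.00</price></book>
</bookstore>"""
root = ET.fromstring(DOC)  # context node: the bookstore element


def titles(books):
    """Return the title text of each book element."""
    return [b.findtext("title") for b in books]


first = titles(root.findall("book[1]"))                # /bookstore/book[1]
last = titles(root.findall("book[last()]"))            # /bookstore/book[last()]
last_but_one = titles(root.findall("book[last()-1]"))  # /bookstore/book[last()-1]
with_lang = [t.text for t in root.findall(".//title[@lang]")]      # //title[@lang]
english = [t.text for t in root.findall(".//title[@lang='eng']")]  # //title[@lang='eng']

# Outside ElementTree's subset, so filtered in Python instead:
first_two = titles(root.findall("book")[:2])           # book[position()<3]
expensive = titles(b for b in root.findall("book")
                   if float(b.findtext("price")) > 35.00)  # book[price>35.00]/title

print(first, last, last_but_one, with_lang, english, first_two, expensive)
```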
XPath Functions
• Accessor functions:
node-name, data, base-uri, document-uri
• Numeric value functions:
abs, ceiling, floor, round, …
• String functions:
compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, …
• Other functions include functions on Boolean values, dates, nodes, etc.
The XML ‘Alphabet Soup’
Data Islands
• XML embedded in an HTML document (a Microsoft Internet Explorer feature)
• Manipulated via client-side script or data binding
<XML id="XMLID">
<BOOK>
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
</BOOK>
</XML>
<XML id="XMLID" src="mydocument.xml"></XML>
The XML ‘Alphabet Soup’
Data Islands
• Can be embedded in an HTML SCRIPT element
• XML is accessible via the DOM:
<SCRIPT language="xml" id="XMLID">
<SCRIPT type="text/xml" id="XMLID">
<SCRIPT language="xml" id="XMLID" src="mydocument.xml">
The XML ‘Alphabet Soup’
XML-Based Applications
• Microsoft SQL Server
Retrieve relational data as XML
Query XML data
Join XML data with existing database tables
Update the database via XML Updategrams
New XML data type in SQL Server 2005
• Microsoft Exchange Server
XML is the native representation of many types of data
Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))
Agenda
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
XML as a Meta-Language
[Figure: “XML/DTD: A Language to create Languages” at the center, surrounded by SAX, DOM, CSS, DSSL, XSL, XLL, XSLT, XSchema, XPath, XPointer, XQL, GO, CML, MathML, WML, and BeanML]
Gene Ontology (GO)
• Describing and manipulating information about the molecular function, biological process and cellular component of gene products.
• Gene Ontology website:
http://www.geneontology.org
• GO DTD:
ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
• GO Browsers and tools:
http://www.geneontology.org/#tools
• GO Resources and samples:
http://www.geneontology.org/#annotations
Math ML
• Describing and manipulating mathematical notations
• MathML website
www.w3.org/Math
• MathML DTD
www.w3.org/Math/DTD
• MathML Browser
www.w3.org/Amaya
• MathML Resources
www.webeq.com/mathml see sample documents here
Chemical ML
• Representing molecular and chemical information
• CML website
www.xml-cml.org
• CML DTD
www.xml-cml.org/dtdschema/index.html
• CML Browser and Authoring Environment
www.xml-cml.org/jumbo.html
• CML Resources
www.xml-cml.org/chimeral/index.html
see sample documents here
some require plug-in downloads, can be slow
Wireless ML
• Allows web pages to be displayed on mobile devices
• WML works with WAP to deliver the content
• Underlying model: a deck of cards that the user can sift through
• WAP/WML website
www.wapforum.org
• WML DTD
www.wapforum.org/DTD/wml_1.1.xml
• WAP/WML Resources
www.oasis-open.org/cover/wap-wml.html
www.w3scripts.com/wap Tutorial on WML, also see
WAP Demo
Scalable Vector Graphics
• Describing vector graphics data for use over the web
• Rendering is done in the browser
• Lower bandwidth demands, easier scaling
• SVG website
www.w3.org/Graphics/SVG
• SVG Plug-Ins
www.adobe.com/svg
• SVG Resources
www.irt.org/articles/js176: 1999 article and a good, brief tutorial
planet.svg An Example from Deitel
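Because SVG is itself an XML vocabulary, a general-purpose XML library can produce it. A minimal Python sketch with `xml.etree.ElementTree`; `make_svg` and its circle tuples are illustrative, while `http://www.w3.org/2000/svg` is the standard SVG namespace URI.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG elements without a prefix


def make_svg(width, height, circles):
    """Build a small SVG document; circles is a list of (cx, cy, r, fill) tuples."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(height),
                     viewBox=f"0 0 {width} {height}")
    for cx, cy, r, fill in circles:
        ET.SubElement(svg, f"{{{SVG_NS}}}circle",
                      cx=str(cx), cy=str(cy), r=str(r), fill=fill)
    return ET.tostring(svg, encoding="unicode")


# Saved to a .svg file, this renders as an orange disc in an SVG-capable browser.
print(make_svg(100, 100, [(50, 50, 40, "orange")]))
```

Only the markup is generated here; scaling and drawing are left to the browser, which is where the bandwidth saving comes from.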
Bean ML
• Describing software components such as JavaBeans
• Defines how the components are interconnected and can be used
• Bean ML Specs and Tools
www.alphaworks.ibm.com/aw.nsf/techmain/bml
• Bean ML Resources
www.oasis-open.org/cover/beanML.html
With Bean ML
• You can mark up beans using Bean ML
• And invoke different operations on Beans
• Includes BML Scripting Framework
XBRL
• Extensible Business Reporting Language
• Capturing and representing financial and accounting information
• Variety of situations
e.g., publishing reports, extracting data for analysis, regulatory forms, etc.
• Initiated under the direction of the AICPA
• XBRL website
www.xbrl.org
• XBRL DTDs and Schemas
http://www.xbrl.org/Core/2000-07-31/default.htm
• Demos and Tools
http://www.xbrl.org/Demos/demos.htm
http://www.xbrl.org/Tools.htm
News ML
• Designed to be media-independent
• Initiated by the International Press Telecommunications Council (IPTC)
• Enables tracking of news stories over time
• NewsML website
www.newsml.org
• NewsML DTD
http://www.oasis-open.org/cover/newsML.html
• SportsML DTD – Derived from NewsML DTD
http://xml.coverpages.org/sportsML.html
cXML
• CommerceXML from Ariba plus 40 other companies
• cXML website
www.cxml.org
• Primary set of tools/implementations to support cXML
http://www.ariba.com/solutions/solutions_overview.cfm
See also the Whitepapers link explaining how these can be used for
• E-procurement
• E-fulfillment
• And others
xCBL
• xCBL from Microsoft, SAP, Sun
• xCBL website
www.xcbl.org
Marketed as an XML component library for B2B e-commerce
• Available Resources (see internal links)
DTDs and Schemas
XDK: SOX Parser and an XSLT Engine
Example Documents
ebXML
• UN/CEFACT: the United Nations body whose mandate
covers worldwide policy and technical development in the
area of trade facilitation and electronic business.
www.uncefact.org
• ebXML website
www.ebxml.org
• Current Endorsements
http://www.ebxml.org/endorsements.htm
Still needs buy-in from the larger IS/IT vendors
• Related Effort: RosettaNet
http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
Business Processes for IT, Component and Chip companies
Conclusion
• Overview
• Syntax and Structure
• The XML Alphabet Soup
• XML as a meta-language
Resources
• http://www.xml.com/
• http://www.w3.org/xml/
• http://www.w3schools.com/
• http://msdn.microsoft.com/xml/