
Chapter 6 XML

XML stands for eXtensible Markup Language. It is used to provide additional information about a document by adding tags. XML tags describe the meaning of data, whereas HTML tags describe how to display it. To be considered well-formed, an XML document must follow specific rules: for example, tags must be properly nested, and tag names are case sensitive. XML is commonly used to transfer data between systems and applications.

Uploaded by

Habtamu

28-Jan-20

Wollo University
Kombolcha Institute of Technology
College of Informatics
Department of Information System

Introduction to Internet Programming I
Chapter 6
Introduction to Extensible Markup Language (XML)

Instructor: Habtamu Abate ([Link].)
Email: habate999@[Link]

What is XML

• XML stands for eXtensible Markup Language.
• A markup language is used to provide information about a document.
• Tags are added to the document to provide the extra information.
• HTML tags tell a browser how to display the document.
• XML tags give a reader some idea what some of the data means.

What is XML Used For?

• XML documents are used to transfer data from one place to another, often over the Internet.
• XML subsets are designed for particular applications.
• One is RSS (Rich Site Summary or Really Simple Syndication). It is used to send breaking news bulletins from one web site to another.
• A number of fields have their own subsets. These include chemistry, mathematics, and book publishing.
• Most of these subsets are registered with the W3 Consortium and are available for anyone's use.

Advantages of XML

• XML is text (Unicode) based.
  • Takes up less space.
  • Can be transmitted efficiently.
• One XML document can be displayed differently in different media.
  • HTML, video, CD, DVD.
  • You only have to change the XML document in order to change all the rest.
• XML documents can be modularized. Parts can be reused.


Example of an HTML Document

<html>
  <head><title>Example</title></head>
  <body>
    <h1>This is an example of a page.</h1>
    <h2>Some information goes here.</h2>
  </body>
</html>

Example of an XML Document

<?xml version="1.0"?>
<address>
  <name>Alice Lee</name>
  <email>alee@[Link]</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>
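To see how a program reads the XML example above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The domain example.org is a placeholder for the elided mail domain, and the names ADDRESS_XML and record are illustrative only.

```python
import xml.etree.ElementTree as ET

# The address document from the slide. example.org is a placeholder
# (assumption) for the mail domain that is elided in the handout.
ADDRESS_XML = """<?xml version="1.0"?>
<address>
  <name>Alice Lee</name>
  <email>alee@example.org</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>"""

# Parse the text into a tree and get the root element.
root = ET.fromstring(ADDRESS_XML)

# Each child tag names the data it holds, so a dictionary falls out directly.
record = {child.tag: child.text for child in root}

print(root.tag)
for tag, text in record.items():
    print(tag, "=", text)
```

Because the tags carry the meaning of each value, the program needs no knowledge of line positions to find the phone number or the birthday.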

Difference Between HTML and XML

• HTML tags have a fixed meaning and browsers know what it is.
• XML tags are different for different applications, and users know what they mean.
• HTML tags are used for display.
• XML tags are used to describe documents and data.

XML Rules

• Tags are enclosed in angle brackets.
• Tags come in pairs with start-tags and end-tags.
• Tags must be properly nested.
  • <name><email>…</name></email> is not allowed.
  • <name><email>…</email></name> is.
• Tags that do not have end-tags must be terminated by a '/'.
  • <br /> is an HTML example.
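The nesting and end-tag rules can be tried out with any XML parser. The sketch below uses Python's xml.etree.ElementTree; the helper name parses and the sample strings are illustrative, not part of the course material.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Improperly and properly nested tags, as in the rules above.
bad_nesting = "<name><email>a@b</name></email>"
good_nesting = "<name><email>a@b</email></name>"

# A tag without an end-tag must be terminated by '/'.
empty_tag = "<p>line one<br />line two</p>"
unclosed_tag = "<p>line one<br>line two</p>"

for label, text in [
    ("bad nesting", bad_nesting),
    ("good nesting", good_nesting),
    ("empty tag with '/'", empty_tag),
    ("unclosed tag", unclosed_tag),
]:
    print(label, "->", "accepted" if parses(text) else "rejected")
```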


More XML Rules

• Tags are case sensitive.
  • <address> is not the same as <Address>.
• Tag names may not begin with the letters XML in any combination of cases; such names are reserved.
• Tags may not contain '<' or '&'.
• Tags follow Java naming conventions, except that a single colon and other characters are allowed. They must begin with a letter or an underscore and may not contain white space.
• Documents must have a single root tag that begins the document.

Encoding

• XML (like Java) uses Unicode to encode characters.
• Unicode comes in many flavors. The most common one used in the West is UTF-8.
• UTF-8 is a variable-length code. Characters are encoded in 1, 2, 3, or 4 bytes.
• The first 128 characters in Unicode are ASCII; UTF-8 encodes each of them in 1 byte.
• The code points between 128 and 255 cover some of the more common characters used in western Europe, such as ã, á, å, or ç. UTF-8 encodes them in 2 bytes.
• Two-byte codes also cover other alphabets such as Greek, Cyrillic, Hebrew, and Arabic. Most Asian ideographs take 3 bytes.
• Four-byte codes can handle any ideographs that are left.
• Those using non-western languages should investigate other encodings of Unicode, such as UTF-16.
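The variable length of UTF-8 is easy to observe by encoding a few characters and counting bytes. This is a small sketch; the sample characters were chosen for illustration (a UTF-8 sequence can be one to four bytes long).

```python
# One sample character for each UTF-8 sequence length.
samples = [
    "A",           # U+0041, ASCII
    "\u00e7",      # U+00E7, c with cedilla (western European)
    "\u4e2d",      # U+4E2D, a common Chinese ideograph
    "\U00020000",  # U+20000, a rarer ideograph outside the basic plane
]

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, length in zip(samples, byte_lengths):
    print(f"U+{ord(ch):04X} takes {length} byte(s) in UTF-8")
```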

Well-Formed Documents

• An XML document is said to be well-formed if it follows all the rules.
• An XML parser is used to check that all the rules have been obeyed.
• Recent browsers such as Internet Explorer 5 and Netscape 7 come with XML parsers.
• Parsers are also available for free download over the Internet. One is Xerces, from the Apache open-source project.
• Java 1.4 also supports an open-source parser.

XML Example Revisited

<?xml version="1.0"?>
<address>
  <name>Alice Lee</name>
  <email>alee@[Link]</email>
  <phone>212-346-1234</phone>
  <birthday>1985-03-22</birthday>
</address>

• Markup for the data aids understanding of its purpose.
• A flat text file is not nearly so clear.

Alice Lee
alee@[Link]
212-346-1234
1985-03-22

• The last line looks like a date, but what is it for?
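A parser not only accepts or rejects a document, it also reports where a rule was broken. The sketch below uses Python's xml.etree.ElementTree rather than the Java parsers named on the slide; the function name first_error and the sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

def first_error(text):
    """Return None for well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return line, column, str(err)
    return None

well_formed = "<address><name>Alice Lee</name></address>"
case_mismatch = "<address><name>Alice Lee</name></Address>"  # tags are case sensitive
two_roots = "<address/><address/>"                           # only one root is allowed

for text in (well_formed, case_mismatch, two_roots):
    print(text, "->", first_error(text))
```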


Expanded Example

<?xml version="1.0"?>
<address>
  <name>
    <first>Alice</first>
    <last>Lee</last>
  </name>
  <email>alee@[Link]</email>
  <phone>123-45-6789</phone>
  <birthday>
    <year>1983</year>
    <month>07</month>
    <day>15</day>
  </birthday>
</address>

XML Files are Trees

address
  name
    first
    last
  email
  phone
  birthday
    year
    month
    day
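Because the expanded example is a tree, a nested value can be reached by naming the path from the root. This is a minimal sketch with Python's xml.etree.ElementTree; the email element is left out only because its domain is elided in the handout.

```python
import xml.etree.ElementTree as ET

# The expanded example, without the email element (its domain is elided).
EXPANDED_XML = """<?xml version="1.0"?>
<address>
  <name>
    <first>Alice</first>
    <last>Lee</last>
  </name>
  <phone>123-45-6789</phone>
  <birthday>
    <year>1983</year>
    <month>07</month>
    <day>15</day>
  </birthday>
</address>"""

root = ET.fromstring(EXPANDED_XML)

# A path such as "name/first" walks from the root down to a grandchild.
first = root.findtext("name/first")
last = root.findtext("name/last")
birthday = tuple(
    int(root.findtext(f"birthday/{part}")) for part in ("year", "month", "day")
)

print(first, last, birthday)
```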

XML Trees

• An XML document has a single root node.
• The tree is a general ordered tree.
  • A parent node may have any number of children.
  • Child nodes are ordered, and may have siblings.
• Preorder traversals are usually used for getting information out of the tree.

Validity

• A well-formed document has a tree structure and obeys all the XML rules.
• A particular application may add more rules in either a DTD (document type definition) or in a schema.
• Many specialized DTDs and schemas have been created to describe particular areas.
• These range from disseminating news bulletins (RSS) to chemical formulas.
• DTDs were developed first, so they are not as comprehensive as schemas.
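A preorder traversal visits each parent before its children, from left to right. The sketch below writes one by hand over the address tree; the function name preorder and the compact TREE_XML string (tags only, no data) are illustrative.

```python
import xml.etree.ElementTree as ET

def preorder(node, depth=0):
    """Yield (depth, tag) pairs, visiting each parent before its children."""
    yield depth, node.tag
    for child in node:  # children come back in document order
        yield from preorder(child, depth + 1)

# The address tree with tags only.
TREE_XML = (
    "<address>"
    "<name><first/><last/></name>"
    "<email/><phone/>"
    "<birthday><year/><month/><day/></birthday>"
    "</address>"
)

root = ET.fromstring(TREE_XML)
visited = list(preorder(root))

# Indentation shows the depth of each node in the tree.
for depth, tag in visited:
    print("  " * depth + tag)
```

ElementTree's own root.iter() walks the tree in the same order, so the hand-written generator is only there to make the traversal explicit.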


Document Type Definitions

• A DTD describes the tree structure of a document and something about its data.
• There are two data types, PCDATA and CDATA.
  • PCDATA is parsed character data.
  • CDATA is character data, not usually parsed.
• A DTD determines how many times a node may appear, and how child nodes are ordered.

DTD for address Example

<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
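The parser in Python's standard library does not validate against a DTD, so the sketch below is not a DTD validator. It is a hand-written check that copies the content models of the address DTD into a dictionary (CONTENT_MODEL and conforms are hypothetical names) to show what "which children, in which order" means in practice.

```python
import xml.etree.ElementTree as ET

# Content models copied by hand from the DTD: element -> required children,
# in order. None stands for #PCDATA (text only, no child elements).
CONTENT_MODEL = {
    "address": ["name", "email", "phone", "birthday"],
    "name": ["first", "last"],
    "birthday": ["year", "month", "day"],
    "first": None, "last": None, "email": None, "phone": None,
    "year": None, "month": None, "day": None,
}

def conforms(node):
    """Check node and all of its descendants against CONTENT_MODEL."""
    if node.tag not in CONTENT_MODEL:
        return False
    expected = CONTENT_MODEL[node.tag]
    children = [child.tag for child in node]
    if expected is None:
        return not children  # #PCDATA elements hold text only
    return children == expected and all(conforms(child) for child in node)

valid = (
    "<address><name><first>Alice</first><last>Lee</last></name>"
    "<email>e</email><phone>p</phone>"
    "<birthday><year>1983</year><month>07</month><day>15</day></birthday>"
    "</address>"
)
# Same data, but <last> now comes before <first>.
swapped = valid.replace(
    "<first>Alice</first><last>Lee</last>",
    "<last>Lee</last><first>Alice</first>",
)
# Same data, but the required <phone> element is missing.
missing = valid.replace("<phone>p</phone>", "")

for label, text in [("valid", valid), ("swapped", swapped), ("missing", missing)]:
    print(label, "->", conforms(ET.fromstring(text)))
```

All three documents are well-formed; only the first also follows the extra rules that the DTD adds.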

Schemas

• Schemas are themselves XML documents.
• They were standardized after DTDs and provide more information about the document.
• They have a number of data types including string, decimal, integer, boolean, date, and time.
• They divide elements into simple and complex types.
• They also determine the tree structure and how many children a node may have.

Schema for First address Example

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="[Link]
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="birthday" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
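Python's standard library has no XML Schema validator, so the following is only a hand-written approximation of what the schema above asks for: four children in a fixed sequence, three strings and one date. The names SEQUENCE, is_date, and matches_schema are hypothetical, and the date check is simplified (a real xs:date also allows a time-zone suffix).

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Element name -> type name, in the order given by xs:sequence.
SEQUENCE = [
    ("name", "string"),
    ("email", "string"),
    ("phone", "string"),
    ("birthday", "date"),
]

DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def is_date(text):
    """Accept only yyyy-mm-dd strings that name a real calendar date."""
    match = DATE_PATTERN.fullmatch(text or "")
    if not match:
        return False
    try:
        date(*map(int, match.groups()))
    except ValueError:  # e.g. month 13 or February 30
        return False
    return True

def matches_schema(root):
    """Check the element order and the simple types listed in SEQUENCE."""
    if root.tag != "address":
        return False
    if [child.tag for child in root] != [name for name, _ in SEQUENCE]:
        return False
    for child, (_, type_name) in zip(root, SEQUENCE):
        if len(child) > 0:  # simple types may not contain child elements
            return False
        if type_name == "date" and not is_date(child.text):
            return False
    return True

def make_doc(birthday):
    """Build a small address document with the given birthday text."""
    return ET.fromstring(
        "<address><name>Alice Lee</name><email>e</email>"
        f"<phone>212-346-1234</phone><birthday>{birthday}</birthday></address>"
    )

for value in ("1985-03-22", "22-03-1985", "1985-02-30"):
    print(value, "->", matches_schema(make_doc(value)))
```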


Explanation of Example Schema

<?xml version="1.0" encoding="ISO-8859-1" ?>
• ISO-8859-1, Latin-1, is the same as UTF-8 in the first 128 characters.

<xs:schema xmlns:xs="[Link]
• [Link]/2001/XMLSchema contains the schema standards.

<xs:element name="address">
<xs:complexType>
• This states that address is a complex type element.

<xs:sequence>
• This states that the following elements form a sequence and must come in the order shown.

<xs:element name="name" type="xs:string"/>
• This says that the element, name, must be a string.

<xs:element name="birthday" type="xs:date"/>
• This states that the element, birthday, is a date. Dates are always of the form yyyy-mm-dd.

XSLT
Extensible Stylesheet Language Transformations

• XSLT is used to transform one XML document into another, often an HTML document.
• The Transform classes are now part of Java 1.4.
• A program is used that takes as input one XML document and produces as output another.
• If the resulting document is in HTML, it can be viewed by a web browser.
• This is a good way to display XML data.
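Python's standard library does not include an XSLT processor, so the sketch below does not run XSLT. It is a hand-written stand-in for the idea of "one XML document in, another document out": it reads the address document and builds an HTML page from it. The function name address_to_html is hypothetical, and example.org is a placeholder for the elided mail domain.

```python
import xml.etree.ElementTree as ET

def address_to_html(xml_text):
    """Build an HTML page that lists the four address fields, one per line."""
    source = ET.fromstring(xml_text)

    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = "Address Book"
    body = ET.SubElement(html, "body")

    fields = ["name", "email", "phone", "birthday"]
    body.text = source.findtext(fields[0])
    for field in fields[1:]:
        # Each later field follows a line break.
        line_break = ET.SubElement(body, "br")
        line_break.tail = source.findtext(field)

    # method="html" writes <br> the way browsers expect it.
    return ET.tostring(html, encoding="unicode", method="html")

# Input document; example.org is a placeholder (assumption).
SOURCE_XML = (
    "<address><name>Alice Lee</name><email>alee@example.org</email>"
    "<phone>212-346-1234</phone><birthday>1985-03-22</birthday></address>"
)

page = address_to_html(SOURCE_XML)
print(page)
```

A real XSLT processor does the same job declaratively: the rules live in a stylesheet instead of in program code, so the output can change without touching the program.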

A Style Sheet to Transform [Link]

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="[Link]
  <xsl:template match="address">
    <html><head><title>Address Book</title></head>
    <body>
      <xsl:value-of select="name"/>
      <br/><xsl:value-of select="email"/>
      <br/><xsl:value-of select="phone"/>
      <br/><xsl:value-of select="birthday"/>
    </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Parsers

• There are two principal models for parsers.
• SAX – Simple API for XML
  • Uses a call-back method
  • Similar to javax listeners
• DOM – Document Object Model
  • Creates a parse tree
  • Requires a tree traversal
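The two parser models can be compared side by side. The sketch below uses Python's xml.sax and xml.dom.minidom modules in place of the Java APIs; the class name TagCollector and the sample document DOC are illustrative.

```python
import xml.sax
from xml.dom import minidom

DOC = "<address><name>Alice Lee</name><phone>212-346-1234</phone></address>"

class TagCollector(xml.sax.ContentHandler):
    """SAX model: the parser calls back into this handler as it reads."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        # Called once for every start-tag, in document order.
        self.tags.append(name)

# SAX: no tree is built; the handler sees events one at a time.
collector = TagCollector()
xml.sax.parseString(DOC.encode("utf-8"), collector)
print(collector.tags)

# DOM: the whole document becomes a tree, which is then traversed.
dom = minidom.parseString(DOC)
phone = dom.getElementsByTagName("phone")[0].firstChild.data
print(phone)
```

SAX suits large documents that are read once from start to finish, while DOM is more convenient when the program needs to move around the tree.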


Questions?

Practice:
Lab Exercises 1, 2, and 3
