XML Schema
By Rajneesh Goyal
Introduction to Schema
XML Schema is an XML-based alternative to DTD.
An XML schema describes the structure of an XML document.
The XML Schema language is also referred to as XML Schema Definition (XSD).
What is an XML Schema?
The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD. An XML Schema:
defines elements that can appear in a document
defines attributes that can appear in a document
defines which elements are child elements
defines the order of child elements
defines the number of child elements
defines whether an element is empty or can include text
defines data types for elements and attributes
defines default and fixed values for elements and attributes
Why XML Schemas?
DTDs provide a very weak specification language:
    You can't put any restrictions on text content
    DTDs have no built-in data types
    Cardinality can be expressed only in a limited way
    Namespaces are not supported
    You have very little control over mixed content (text plus elements)
    You have little control over ordering of elements
DTDs are written in a strange (non-XML) format:
    You need separate parsers for DTDs and XML
The XML Schema Definition language solves these problems:
    XSD gives you much more control over structure and content
    XSD is written in XML
    XSD supports a large set of built-in data types
    XSD supports namespaces and is extensible
Referring to a schema
To refer to a DTD in an XML document, the reference goes before the root element:
    <?xml version="1.0"?>
    <!DOCTYPE rootElement SYSTEM "url">
    <rootElement> ... </rootElement>
To refer to an XML Schema in an XML document, the reference goes in the root element:
    <?xml version="1.0"?>
    <rootElement
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="url.xsd">
      ...
    </rootElement>
where:
    xmlns:xsi is the XML Schema Instance reference (required)
    xsi:noNamespaceSchemaLocation is where your XML Schema definition can be found
The XSD document
Since an XSD is itself written in XML, it can be confusing which document we are talking about: the schema, or the XML document it describes
The file extension is .xsd
The root element is <schema>
The XSD starts like this:
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<schema>
The <schema> element may have attributes:
xmlns:xs="http://www.w3.org/2001/XMLSchema"
    This is necessary to specify where all our XSD tags are defined
elementFormDefault="qualified"
    This means that all elements declared by the schema, including locally declared ones, must be namespace-qualified in the XML document
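As a rough sketch of what elementFormDefault="qualified" implies, the schema below declares a target namespace. The namespace URI http://example.com/note and the element names are made up for illustration; they are not part of the original slides.

```xml
<?xml version="1.0"?>
<!-- Sketch only: namespace URI and element names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/note"
           xmlns="http://example.com/note"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <!-- Local element: with "qualified" it is in the target namespace too -->
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
<!-- A matching instance places both elements in the namespace:
     <note xmlns="http://example.com/note"><body>Hello</body></note>
-->
```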
Schema Data Types
Simple type
Complex type
Simple and complex elements
A simple element is one that contains text and nothing else:
    A simple element cannot have attributes
    A simple element cannot contain other elements
    A simple element cannot be empty
    However, the text can be of many different types, and may have various restrictions applied to it
If an element isn't simple, it's complex:
    A complex element may have attributes
    A complex element may be empty, or it may contain text, other elements, or both text and other elements
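To make the distinction concrete, here is a small instance document with one simple element and several kinds of complex element. All element names, attribute names, and values are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance data; every name here is illustrative -->
<examples>
  <!-- Simple: text only, no attributes, no child elements -->
  <age>36</age>
  <!-- Complex: text plus an attribute -->
  <price currency="EUR">9.99</price>
  <!-- Complex: empty, but carries an attribute -->
  <product id="1345"/>
  <!-- Complex: contains other elements -->
  <person>
    <firstName>Asha</firstName>
    <lastName>Rao</lastName>
  </person>
  <!-- Complex: mixed content, text plus elements -->
  <letter>Dear <name>Asha</name>, your order has shipped.</letter>
</examples>
```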
Defining a simple element
A simple element is defined as
    <xs:element name="name" type="type" />
Example:
    <xs:element name="Name" type="xs:string" />
    <xs:element name="age" type="xs:integer" />
where:
    name is the name of the element
    the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, xs:time
Other attributes a simple element may have:
    default="default value"    used if no other value is specified
    fixed="value"    no other value may be specified
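A minimal sketch of default and fixed on simple elements follows. The element names and values are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names and values are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- If <color/> appears empty in the document, its value is taken to be "red" -->
  <xs:element name="color" type="xs:string" default="red"/>
  <!-- <country> may only ever have the value "India" -->
  <xs:element name="country" type="xs:string" fixed="India"/>
  <!-- No default or fixed value: any valid xs:date is accepted -->
  <xs:element name="birthday" type="xs:date"/>
</xs:schema>
```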
Complex elements
A complex element is defined as
    <xs:element name="name">
      <xs:complexType>
        ... information about the complex type...
      </xs:complexType>
    </xs:element>
Example:
    <xs:element name="person">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstName" type="xs:string" />
          <xs:element name="lastName" type="xs:string" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
<xs:sequence> says that elements must occur in this order
Remember that attributes are always simple types
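For reference, an instance that should satisfy the person declaration above might look like this (the names are placeholder data):

```xml
<?xml version="1.0"?>
<!-- firstName must come before lastName because of xs:sequence -->
<person>
  <firstName>Asha</firstName>
  <lastName>Rao</lastName>
</person>
```

Swapping the two child elements would make the document invalid against that declaration.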
Indicators
Order indicators:
    All
    Choice
    Sequence
Occurrence indicators:
    maxOccurs
    minOccurs
Group indicators:
    Group name
    attributeGroup name
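The order and occurrence indicators can be sketched together in one small schema. The contact and address elements are hypothetical examples, not taken from the slides.

```xml
<?xml version="1.0"?>
<!-- Sketch only: all element names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <!-- sequence: children must appear in exactly this order -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <!-- choice: exactly one of email or phone -->
        <xs:choice>
          <xs:element name="email" type="xs:string"/>
          <xs:element name="phone" type="xs:string"/>
        </xs:choice>
        <!-- occurrence: zero or more note elements -->
        <xs:element name="note" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="address">
    <xs:complexType>
      <!-- all: city and zip may appear in either order, once each -->
      <xs:all>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="zip" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

When minOccurs and maxOccurs are omitted, both default to 1.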
A schema Example
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="persons">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="full_name" type="xs:string"/>
              <xs:element name="child_name" type="xs:string" minOccurs="0" maxOccurs="5"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
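A document that should validate against this schema could look like the following. The file name persons.xsd and the personal names are assumptions made for the example.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Assumes the schema above is saved as persons.xsd next to this file -->
<persons xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="persons.xsd">
  <person>
    <full_name>Asha Rao</full_name>
    <!-- child_name may occur from 0 to 5 times -->
    <child_name>Meera</child_name>
    <child_name>Kiran</child_name>
  </person>
  <person>
    <!-- No child_name at all is fine, since minOccurs="0" -->
    <full_name>Vikram Nair</full_name>
  </person>
</persons>
```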
Group Elements
<xs:group name="persongroup">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
    <xs:element name="birthday" type="xs:date"/>
  </xs:sequence>
</xs:group>
<xs:element name="person" type="personinfo"/>
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:group ref="persongroup"/>
    <xs:element name="country" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
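In an instance document the group itself never appears; its elements are simply inlined where the group is referenced. A person matching the declarations above might therefore look like this (values are placeholder data):

```xml
<?xml version="1.0"?>
<person>
  <!-- These three come from persongroup, in the order it defines -->
  <firstname>Asha</firstname>
  <lastname>Rao</lastname>
  <birthday>1990-04-12</birthday>
  <!-- This one is declared directly in personinfo -->
  <country>India</country>
</person>
```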
Defining an attribute
Attributes themselves are always declared as simple types
An attribute is defined as
    <xs:attribute name="name" type="type" />
where:
    name and type are the same as for xs:element
Other attributes an attribute declaration may have:
    default="default value"    used if no other value is specified
    fixed="value"    no other value may be specified
    use="optional"    the attribute is not required (default)
    use="required"    the attribute must be present
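Because an element with attributes is complex, attribute declarations sit inside an xs:complexType. A minimal sketch, with a hypothetical product element:

```xml
<?xml version="1.0"?>
<!-- Sketch only: element and attribute names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product">
    <xs:complexType>
      <!-- Must be present on every <product> -->
      <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
      <!-- Optional; treated as "EN" when left out -->
      <xs:attribute name="lang" type="xs:string" default="EN"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
<!-- Valid:   <product id="1345"/>
     Invalid: <product lang="FR"/>   (the required id is missing) -->
```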
Attribute Groups
<xs:attributeGroup name="personattrgroup">
  <xs:attribute name="firstname" type="xs:string"/>
  <xs:attribute name="lastname" type="xs:string"/>
  <xs:attribute name="birthday" type="xs:date"/>
</xs:attributeGroup>
<xs:element name="person">
  <xs:complexType>
    <xs:attributeGroup ref="personattrgroup"/>
  </xs:complexType>
</xs:element>
Restrictions
The general form for putting a restriction on a text value is:
    <xs:element name="name">    (or xs:attribute)
      <xs:simpleType>
        <xs:restriction base="type">
          ... the restrictions ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
For example:
    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="140"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
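A restriction can also be given a name and then reused by both elements and attributes. The sketch below assumes a hypothetical type name ageType and a hypothetical pet element.

```xml
<?xml version="1.0"?>
<!-- Sketch only: ageType and pet are hypothetical names -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named simple type: an integer from 0 to 140 inclusive -->
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="140"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Reused as the type of an element -->
  <xs:element name="age" type="ageType"/>
  <!-- Reused as the type of an attribute -->
  <xs:element name="pet">
    <xs:complexType>
      <xs:attribute name="age" type="ageType"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```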
C-DAC,Hyderabad
Restrictions on numbers
minInclusive -- number must be >= the given value
minExclusive -- number must be > the given value
maxInclusive -- number must be <= the given value
maxExclusive -- number must be < the given value
totalDigits -- number must have no more than value digits in total
fractionDigits -- number must have no more than value digits after the decimal point
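As an illustration, several of these numeric restrictions can be combined on one element. The price element here is a made-up example.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the price element is hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <!-- Strictly greater than zero -->
        <xs:minExclusive value="0"/>
        <!-- At most 6 digits overall, at most 2 after the decimal point -->
        <xs:totalDigits value="6"/>
        <xs:fractionDigits value="2"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
<!-- Accepts 1234.56 and 9.5; rejects 0 and 12.345 -->
```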
Restrictions on strings
length -- the string must contain exactly value characters
minLength -- the string must contain at least value characters
maxLength -- the string must contain no more than value characters
pattern -- the value is a regular expression that the string must match
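A short sketch of these string restrictions in use follows. The element names and the specific limits are assumptions picked for the example.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names and limits are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Between 5 and 8 characters -->
  <xs:element name="password">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:minLength value="5"/>
        <xs:maxLength value="8"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Exactly six digits, enforced with a regular expression -->
  <xs:element name="pincode">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{6}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Exactly three characters -->
  <xs:element name="initials">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:length value="3"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

An XSD pattern must match the whole value, so no start or end anchors are needed.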
Predefined string types
Recall that a simple element is defined as:
    <xs:element name="name" type="type" />
Here are a few of the possible string types:
    xs:string -- a string
    xs:normalizedString -- a string that doesn't contain tabs, newlines, or carriage returns
    xs:token -- a string that doesn't contain any whitespace other than single spaces between words
Allowable restrictions on strings:
    enumeration, length, maxLength, minLength, pattern, whiteSpace
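The two restrictions not shown so far, enumeration and whiteSpace, can be sketched as follows. The element names and the list of car makes are illustrative assumptions.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names and values are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: the value must be one of the listed strings -->
  <xs:element name="car">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="Audi"/>
        <xs:enumeration value="Golf"/>
        <xs:enumeration value="BMW"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- whiteSpace="collapse": tabs and newlines become spaces, runs of
       spaces shrink to one, leading and trailing spaces are dropped -->
  <xs:element name="address">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:whiteSpace value="collapse"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```

The other legal values for whiteSpace are preserve and replace.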
Predefined date and time types
xs:date -- A date in the format CCYY-MM-DD, for example, 2002-11-05
xs:time -- A time in the format hh:mm:ss (hours, minutes, seconds)
xs:dateTime -- Format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times:
    enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
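As a brief sketch, the three types can be used directly or restricted like any other simple type. The element names and the cut-off date are made up for illustration.

```xml
<?xml version="1.0"?>
<!-- Sketch only: element names and the date bound are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Example value: 09:30:00 -->
  <xs:element name="startTime" type="xs:time"/>
  <!-- Example value: 2002-11-05T09:30:00 -->
  <xs:element name="created" type="xs:dateTime"/>
  <!-- A date that may not be earlier than 1 January 2000 -->
  <xs:element name="joined">
    <xs:simpleType>
      <xs:restriction base="xs:date">
        <xs:minInclusive value="2000-01-01"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```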
Predefined numeric types
Here are some of the predefined numeric types:
    xs:decimal
    xs:byte
    xs:short
    xs:int
    xs:long
    xs:positiveInteger
    xs:negativeInteger
    xs:nonPositiveInteger
    xs:nonNegativeInteger
Allowable restrictions on numeric types:
    enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace
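A small sketch showing a few of these numeric types in declarations. The element names are hypothetical; the value ranges in the comments are those of the built-in types.

```xml
<?xml version="1.0"?>
<!-- Sketch only: all element names are hypothetical -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- 1, 2, 3, ... (zero is not allowed) -->
  <xs:element name="quantity" type="xs:positiveInteger"/>
  <!-- 0, 1, 2, ... -->
  <xs:element name="stock" type="xs:nonNegativeInteger"/>
  <!-- Whole numbers from -128 to 127 -->
  <xs:element name="level" type="xs:byte"/>
  <!-- Arbitrary-precision decimal such as 12.50 -->
  <xs:element name="amount" type="xs:decimal"/>
  <!-- A built-in numeric type narrowed further with a restriction -->
  <xs:element name="percent">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```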
End of Schema