Read XML:
Read XML loads XML data into a graph.
Read XML reads a stream of characters, bytes, or records; translates the XML
data into records that can be described by a DML record format; and sends
the records to its out port.
Properties of Read XML:
o Parser – The XML parser to be used for reading the XML document
There are two kinds of parsers:
Expat - A stream-oriented parser written in C. It reports the XML
document as a sequence of parsing events (start tag, end tag,
character data) and does not build a DOM (Document Object Model)
tree
Xerces – A C++ parser that validates, parses, generates, and
manipulates XML documents using DOM and other APIs
o Custom-format - Use a custom record format instead of the one generated by
the xml-to-dml utility
If custom-format is true – then the following fields will be enabled
Root-element - Name of root element for custom format reading
Base-element - Name of base element for custom format reading
Leaf-element - Name of leaf element for custom format reading
Character-data-field - Name of character data field for custom
format reading
If custom-format is false - then the following fields will be enabled
Ignore-blank-cdata - Suppress records corresponding to
all-whitespace character data
Ignore-unknown-attributes - Ignore unknown attributes in the
input XML
Ignore-unknown-elements - Ignore unknown elements in the
input XML
o Missing-fields – Whether to ignore or raise an error when an element or
attribute is missing in the input document
o Document-per-record - Treat each record of input as a complete document
o Eme-dataset – Dataset value for EME
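To make the "stream oriented" description of Expat concrete, here is a minimal sketch using Python's standard xml.parsers.expat module, which wraps the same Expat library. It is an illustration only and says nothing about how Read XML calls the parser internally; the handler names and the shortened XML string are assumptions made for the example. Skipping whitespace-only character data in the handler is loosely similar in spirit to Ignore-blank-cdata, but that is an analogy, not a description of the component.

```python
import xml.parsers.expat

# Shortened, hypothetical input used only for this illustration.
XML = '<Employee id="131420"><First_Name>Sindhuja</First_Name></Employee>'

events = []

def on_start(name, attrs):
    # Called when an opening tag is seen; attrs is a dict of attributes.
    events.append(("start", name, dict(attrs)))

def on_end(name):
    # Called when a closing tag is seen.
    events.append(("end", name))

def on_text(data):
    # Called for character data; whitespace-only data is skipped here.
    if data.strip():
        events.append(("text", data))

parser = xml.parsers.expat.ParserCreate()
parser.buffer_text = True  # deliver contiguous character data in one call
parser.StartElementHandler = on_start
parser.EndElementHandler = on_end
parser.CharacterDataHandler = on_text
parser.Parse(XML, True)  # True marks this as the final chunk of input

for event in events:
    print(event)
```

The parser never holds the whole document as a tree; the caller sees one event at a time and decides what to keep.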
Xml-to-dml utility:
Significance:
This Ab Initio utility generates DML from an XML document.
Syntax:
xml-to-dml -base-element <name of the base element> <path to XML file> >
<path to DML file>
Example:
xml-to-dml -base-element Employee \
/prod/users/edw/xbh927/training/read_xml_input_file.xml > \
/prod/users/edw/xbh927/training/read_xml_dml.dml
Generated DML :
////////////////////////////////////////////////////////////////////////
// This file was automatically generated by xml-to-dml
// with the command line arguments:
// -parse exemplar -model full-object -base-element Employee /prod/users/edw/xbh927/training/read_xml_input_file.xml
////////////////////////////////////////////////////////////////////////
type Employee_Name_t = record
  string('\0') First_Name;
  string('\0') Last_Name;
end;
type Employee_t = record
  string('\0') id;
  string('\0') XML_attribute_fields() = 'id,';
  Employee_Name_t Employee_Name;
  string('\0') XML_base_element() = 'Employee';
end;
metadata type = Employee_t;
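The mapping visible in the generated DML (attributes and leaf elements become string fields, elements with children become nested record types) can be sketched in Python as follows. This is not how xml-to-dml is implemented; the describe function and its naming rule (element name plus _t) are assumptions chosen to reproduce the listing above, and the sketch omits the XML_attribute_fields and XML_base_element entries and does not handle repeated elements.

```python
import xml.etree.ElementTree as ET

# The base element from the sample input file, inlined for the example.
XML = """<Employee id="131420">
  <Employee_Name>
    <First_Name>Sindhuja</First_Name>
    <Last_Name>Prabakaran</Last_Name>
  </Employee_Name>
</Employee>"""

def describe(elem, types):
    """Append (type_name, field_lines) for elem; nested types are added first."""
    # Each XML attribute becomes a string field.
    fields = [f"string('\\0') {name};" for name in elem.attrib]
    for child in elem:
        if len(child):
            # The child has children of its own: emit a nested record type.
            describe(child, types)
            fields.append(f"{child.tag}_t {child.tag};")
        else:
            # Leaf element: a plain string field.
            fields.append(f"string('\\0') {child.tag};")
    types.append((f"{elem.tag}_t", fields))

types = []
describe(ET.fromstring(XML), types)

for name, fields in types:
    print(f"type {name} = record")
    for line in fields:
        print(f"  {line}")
    print("end;")
```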
Sample Graph:
Input file:
$ cat read_xml_input_file.xml
<Employee_list>
  <Employee id="131420">
    <Employee_Name>
      <First_Name>Sindhuja</First_Name>
      <Last_Name>Prabakaran</Last_Name>
    </Employee_Name>
  </Employee>
</Employee_list>
Output file:
Record 1:
[record
  id "131420"
  Employee_Name [record
    First_Name "Sindhuja"
    Last_Name "Prabakaran"]]
The Read XML component reads the XML document and converts the data into the
given record format.
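The conversion from the sample input to the sample output record can be imitated outside Ab Initio with a short Python sketch. It assumes one output record per base element (Employee), as the sample output suggests; the to_record helper is hypothetical and only mirrors the shape of the record shown above.

```python
import xml.etree.ElementTree as ET

# Contents of read_xml_input_file.xml, inlined for the example.
XML = """<Employee_list>
  <Employee id="131420">
    <Employee_Name>
      <First_Name>Sindhuja</First_Name>
      <Last_Name>Prabakaran</Last_Name>
    </Employee_Name>
  </Employee>
</Employee_list>"""

def to_record(elem):
    """Turn an element into a dict: attributes first, then child elements."""
    record = dict(elem.attrib)
    for child in elem:
        # Elements with children become nested records; leaves become strings.
        record[child.tag] = to_record(child) if len(child) else (child.text or "")
    return record

# One record per base element found under the root element.
records = [to_record(e) for e in ET.fromstring(XML).iter("Employee")]
print(records)
```

The nested dictionary has the same shape as Record 1: an id field followed by an Employee_Name sub-record holding First_Name and Last_Name.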
Using Custom-format for Reading the XML document:
The custom-format option is used to read the XML document without using the
xml-to-dml utility.
However, XML attributes are not processed when the custom-format approach is used.
As explained above, if custom-format is enabled, the root, base, and leaf
elements must be specified.
The output file looks like this.
Because custom-format does not read attribute values, the First name and Last
name are NULL here.