Looking Inside the Developer’s Toolkit:
Introduction to Processing XML with RPG
and SQL Too!
Charles Guarino
Central Park Data Systems, Inc.
@charlieguarino
About The Speaker
With an IT career spanning over 30 years, Charles Guarino has
been a consultant for most of them. Since 1995 he has been
founder and President of Central Park Data Systems, Inc., a New
York area based IBM midrange consulting and corporate training
company. In addition to being a professional speaker across the
United States and Europe, he is a frequent contributor of technical
and strategic articles and webcasts for the IT community. He is a
member of COMMON’s Speaker Excellence Hall of Fame and also
Long Island Software and Technology Network’s Twenty Top
Techies. In 2015 Charles became the recipient of the Al Barsa
Memorial Scholarship Award. Additionally, he serves as a member
of COMMON’s Strategic Education Team (SET) and is also a past
president and monthly Q&A host of LISUG, a Long Island IBM i
User’s Group www.lisug.org.
Charles can be reached at
[email protected].
LinkedIn - http://www.linkedin.com/in/guarinocharles
Twitter - @charlieguarino
Copyright Central Park Data Systems, Inc. 1
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
ESSENTIAL DEFINITIONS
1) What is XML and why use it?
2) Parsers – Two types!
3) Well formed document
4) Where is XML typically stored on the IBM i?
Copyright Central Park Data Systems, Inc. 2
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
If you are already familiar with HTML…
Copyright Central Park Data Systems, Inc. 3
Then XML will be a snap!
Document “citydata1.xml”
Copyright Central Park Data Systems, Inc. 4
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
Which RPG parser to use?
Examine the XML document.
Does it have a consistent structure, with repetitive
groups or many different structures, and/or complex
structures?
Will you know in advance how the XML is actually
formatted?
How will you be using the data?
Is it a particularly large document?
These important answers will direct you to which
parser to use.
Copyright Central Park Data Systems, Inc. 5
DOM = Document Object Model
Allows an application to read and update
XML data directly in memory. In the
DOM implementation, data is moved into
arrays and data structures.
Programmers need to be sensitive to how
large an XML document is so as to not
exceed the available system storage.
Optionally, a ‘handler’ can be used to
deal with array overflow.
RPG implementation “XML-INTO”
SAX = Simple API for XML
The SAX parser walks (runs?) through
an entire XML document, one element
at a time, and returns control to the
handler as each new ‘event’ is
encountered. The complexity of the
document is not relevant.
Also, since system memory is not
consumed, the document size is not
relevant.
RPG implementation “XML-SAX”
Copyright Central Park Data Systems, Inc. 6
Where can you find sample XML? There are lots of
documents to be downloaded. Take this fairly simple one.
http://www.w3schools.com/xml/simple.xml
This complex XML document was computer-generated, 75 pages long.
This document contains over 150 different complex structures.
SAX is clearly the way to go!
Copyright Central Park Data Systems, Inc. 7
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
Introducing…
XML-INTO !!!
• You need to know the structure format in advance
• Data is mapped into data structures
• Can parse an entire document at one time
• Can use an optional handler for every large documents
• Recommended for less complex structured documents
Copyright Central Park Data Systems, Inc. 8
When using XML-INTO, an XML editor can help you define
your data structures
It’s also helpful to use the “tag”, “data”, “tag” approach
Copyright Central Park Data Systems, Inc. 9
Our first XML parsing program “CITYDATA1G”
“Show in Table” view of parsed data as it resides in physical file CITYDATA1
Copyright Central Park Data Systems, Inc. 10
Nested data structure support is more efficient – time to modernize!!!
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/rzasd/freesubfi
eld.htm#freesubfield__nested
Nested data structures in action
Copyright Central Park Data Systems, Inc. 11
XML parsing program CITYDATA2G using a handler
Variations on a theme – not originally parse-able with RPG
A more typical XML example, where the element “state”
has associated attributes placed before the actual text data.
The text data found here is the actual state name.
Copyright Central Park Data Systems, Inc. 12
Google "xml-into countprefix rpg cafe"
datasubf
countprefix
Data sub fields
Copyright Central Park Data Systems, Inc. 13
Using option “countprefix” is more granular than
“allowmissing=yes”
Sample XML with namespaces – how do you deal with colons?
Source: http://www.w3schools.com/xml/xml_namespaces.asp
Copyright Central Park Data Systems, Inc. 14
RPG can still parse this with namespace support
ns = remove
ns = merge
Summary of XML-INTO %xml Options
Original V5R4 options
doc – XML file exists in a document (versus a datastring)
allowextra – don’t stop with error if undeclared tags are found
allowmissing – don’t stop with error if expected data is missing
path – specifies starting element where to start parsing XML document
case - (any, lower, upper) – specifies the case of the data elements to map
ccsid – (best, job, ucs2) – specifies which CCSID to use
trim – specified whether to include whitespace mapped into your variables
V6R1 PTFs, 7.1 and beyond
Datasubf – names the DS subfield to capture the text data
Countprefix – specifies the prefix for selected array fields where you need
to capture the number of elements return. Can replace allowmissiing=yes.
case (convert using *LANGIDSHR table, convert alphabetic characters)
Nested data structure support
ns (remove=only use matching subfield, merge=join namespace with subfield)
nsprefix declares prefix of new field to capture actual namespace
Copyright Central Park Data Systems, Inc. 15
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
Introducing…
XML-SAX !!!
• You DO NOT need to know the structure format in advance.
• Data is mapped anywhere you want at runtime.
• Reads the document the same way we do,
left to right from top to bottom.
• Each new piece of encountered data is an event,
which is passed to an event handler.
• Recommended for more complex structured documents.
Copyright Central Park Data Systems, Inc. 16
This document contains both XML elements and attributes,
and the data is inconsistent. It is however well-formed.
The sax parser reads the same way you read a book;
one word at a time, left to right, top to bottom.
20 = Start_Document
25 = Version_InfoVersion 1.0
10 = Encoding_Decl UTF-8
9 = Doctype_Decl XMLFileIn
6 = Comment XML Order Export File V1.0
21= Start_Element XMLFileIn
5 = Chars *blank
21 = Start_Element OrderV2
5 = Chars *blank
21= Start_Element Attribute
2 = Attr_Name Name
4 = Attr=Chars Dealer
26 = End_Attr Name
5 = Chars 10912
13 = End_Element Attribute
Copyright Central Park Data Systems, Inc. 17
Tables SAXCTL and SAXDATA
SAXDATA is populated in program SAXPARSES and
processed sequentially in the P.O. creation program.
SAXCTL
PROCESSED_PATH CHAR(20)
PROCESSED_DOC CHAR(20)
PROCESSED_FLAG CHAR(1)
PROCESSED_DATTIM TIMESTAMP
SAXDATA
XMLDATATYPE CHAR(20)
XMLDATA CHAR(256)
XMLDOCPATH CHAR(20)
XMLDOCNAME CHAR(20)
We already know which pieces of data we want to capture.
Each time the event handler is called, if the event code identifies one
of these known values, we capture it. Otherwise we simply ignore it.
Program Listing SAXPARSES
Copyright Central Park Data Systems, Inc. 18
*XMLSAX debugging option
• The *XMLSAX debug option generates a special
array named _QRNU_XMLSAX into your program.
Since the SAX parser returns just a numeric value,
it is helpful to identify what each value represents.
• In the event handler we can test for each event
and take the appropriate action. We will capture
certain pieces of data, based on the event code
that is passed.
All possible sax event codes in data structure _QRNU_XMLSAX
Copyright Central Park Data Systems, Inc. 19
This is how the parsed data looks in file SAXDATA.
A second program will read these records sequentially and process the data.
Event Codes
21
2
4
5
13
Code snippet of how to process this captured sequential data
If datatype = ‘Attr_Chars’;
Select;
When data = ‘Dealer’; Sets flag
flag = ‘Dealer’;
…
If datatype = ‘XML_Chars’;
Select;
When flag = ‘Dealer’;
dealerno = %trim(data);
Map suitable data
When flag = ‘Homeowner’;
homeownerdata = %trim(data);
…
Copyright Central Park Data Systems, Inc. 20
Parsed data shown in SAXDATA using Table View
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
Copyright Central Park Data Systems, Inc. 21
Parsing XML
using SQL
New SQL functionality makes parsing XML a snap
Select a.* from XMLTABLE('Cities/CityData/MonthlyData'
passing( xmlparse(Document
Get_xml_file('/xmldocs/citydata2.xml')))
Columns CityName VarChar(30) Path '../CityName',
Region Varchar(30) Path '../Region',
Month varchar(20) path 'Month',
Low int path 'Low',
High int path 'High'
) as a;
Copyright Central Park Data Systems, Inc. 22
Running SQL Script in Navigator **
**To run Connection > Use Temporary JDBC settings
Set Isolation Level to Cursor Stability (*CS)
Add a SQL INSERT and you’ve got a working program!
Copyright Central Park Data Systems, Inc. 23
Generating
XML
using SQL
Table CUSTMAST contains 5 rows
Copyright Central Park Data Systems, Inc. 24
Program BUILDXML generates XML in the IFS
/xmldocs/custmast.xml
Copyright Central Park Data Systems, Inc. 25
IFS file parsed and converted back to DB2 rows
What We’ll Cover
• Essential Definitions
• XML – A First Look
• A Review of Parsers
• XML-INTO: A Closer Look
• XML-SAX: A Closer Look
• XML and SQL
• Wrap-up
Copyright Central Park Data Systems, Inc. 26
For Further Reading…
Be sure to visit IBM’s RPG Café wiki.
LOTS of good examples of using the native RPG parsers
along with information about other recent RPG
enhancements.
https://www.ibm.com/developerworks/ibmi/rpg/welcome
Wrap-up
• XML-INTO is appropriate for less complex complicated structures.
XML parsing was first introduced into RPG at V5R4.
It parses the XML data into data structures, therefore you MUST
know the names of the tags in advance.
If you are missing tags – or – define too many tags in your data
structure and do not specify “allow-missing” or “allow-extra”
you will receive an RPG parsing error at run-time.
PTF SI34938, which became available in early 2009 enhanced
XML-INTO, allowing it the ability to parse more complex XML
structures.
Additional PTFs are now available to parse XML documents that
contain namespaces – read the PDF!
Copyright Central Park Data Systems, Inc. 27
Wrap-up (continued)
• XML-SAX is appropriate for any type of XML structure.
It is an event driven parser, which reads your document the
same way you read a regular document – left to right, top to
bottom.
An event handler is called for each event that is encountered.
Along with each event is an event code.
“Allow-missing” and “allow-extra” are meaningless to the SAX
parser since you do not pre-define your tags.
You do not need to handle or capture the event code every time
the handler is called. Only capture what you need.
CITYDATA1 Table
CREATE TABLE XMLLIB/CITYDATA1
(CITYNAME CHAR (20 ) NOT NULL WITH DEFAULT,
REGION CHAR (20 ) NOT NULL WITH DEFAULT,
MONTHNAME CHAR (9) NOT NULL WITH DEFAULT,
LOW CHAR (3 ) NOT NULL WITH DEFAULT,
HIGH CHAR (3 ) NOT NULL WITH DEFAULT)
Copyright Central Park Data Systems, Inc. 28
CITYDATA2 Table
CREATE TABLE XMLLIB/CITYDATA2
(CITYNAME CHAR (20 ) NOT NULL WITH DEFAULT,
REGION CHAR (20 ) NOT NULL WITH DEFAULT,
MONTHNAME CHAR (9) NOT NULL WITH DEFAULT,
LOW CHAR (3 ) NOT NULL WITH DEFAULT,
HIGH CHAR (3 ) NOT NULL WITH DEFAULT)
SAXCTL Table
CREATE TABLE XMLLIB/SAXCTL
(PROCESSED_PATH FOR COLUMN PRCDOCPATH CHAR
(20 ) NOT NULL WITH DEFAULT,
PROCESSED_DOC FOR COLUMN PRCDOCNAME CHAR
(20 ) NOT NULL WITH DEFAULT,
PROCESSED_FLAG FOR COLUMN PRCFLAG
CHAR (1 ) NOT NULL WITH DEFAULT,
PROCESSED_DATTIM FOR COLUMN PRCDATTIM
TIMESTAMP NOT NULL WITH DEFAULT)
Copyright Central Park Data Systems, Inc. 29
SAXDATA Table
CREATE TABLE XMLLIB/SAXDATA
(XMLDATATYPE CHAR ( 20) NOT NULL WITH DEFAULT,
XMLDATA CHAR (256 ) NOT NULL WITH DEFAULT,
XMLDOCPATH FOR COLUMN DOCPATH CHAR (20 ) NOT
NULL WITH DEFAULT,
XMLDOCNAME FOR COLUMN DOCNAME CHAR (20 ) NOT
NULL WITH DEFAULT)
CUSTMAST Table
Copyright Central Park Data Systems, Inc. 30
Looking Inside the Developer’s Toolkit:
Introduction to Processing XML with RPG
and SQL Too!
Charles Guarino
THANK YOU !!!
Copyright Central Park Data Systems, Inc. 31