java.lang.Object
  javax.xml.parsers.DocumentBuilder
    nu.validator.htmlparser.dom.HtmlDocumentBuilder
public class HtmlDocumentBuilder
extends javax.xml.parsers.DocumentBuilder
This class implements an HTML5 parser that exposes data through the DOM interface.
By default, when constructed without arguments, this parser coerces
XML 1.0-incompatible infosets into XML 1.0-compatible infosets. This
corresponds to ALTER_INFOSET as the general XML violation policy.
To make the parser support non-conforming HTML fully per the HTML 5 spec,
at the cost of potentially violating the SAX2 API contract, set the
general XML violation policy to ALLOW.
The ALLOW policy does not work with a standard DOM implementation.
It is possible to treat XML 1.0 infoset violations as fatal by setting
the general XML violation policy to FATAL.
The doctype is not represented in the tree.
The document mode is stored on the document node as user data: a
DocumentMode object under the key nu.validator.document-mode.
The form pointer is also stored as user data with the key
nu.validator.form-pointer.
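The user data described above can be read back with the standard DOM Level 3 call Node.getUserData(String). The following is a minimal sketch: the class and helper names are hypothetical, only the two key strings come from the description, and because the description does not say which node carries the form pointer, the helper takes whichever node the caller supplies. The main method uses the JDK's own DOM and a stand-in string instead of a real DocumentMode value so that it runs without the parser on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class; only the key strings come from the class description.
final class ParserUserData {
    static final String DOCUMENT_MODE_KEY = "nu.validator.document-mode";
    static final String FORM_POINTER_KEY = "nu.validator.form-pointer";

    /** Returns the object stored under the document-mode key, or null if none is set. */
    static Object documentMode(Document doc) {
        return doc.getUserData(DOCUMENT_MODE_KEY);
    }

    /** Returns the object stored under the form-pointer key on the given node, or null. */
    static Object formPointer(Node node) {
        return node.getUserData(FORM_POINTER_KEY);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document from the JDK; a parsed document would already carry the data.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.setUserData(DOCUMENT_MODE_KEY, "stand-in mode", null);
        System.out.println(documentMode(doc));
    }
}
```

With a document produced by this parser, the returned object would be the DocumentMode value mentioned above, so a caller would cast the result accordingly.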
| Constructor Summary | |
|---|---|
| HtmlDocumentBuilder() | Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy. |
| HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation) | Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy. |
| HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation, XmlViolationPolicy xmlPolicy) | Instantiates the document builder with a specific DOM implementation and XML violation policy. |
| HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy) | Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy. |
| Method Summary | |
|---|---|
| void | addCharacterHandler(CharacterHandler characterHandler) |
| XmlViolationPolicy | getBogusXmlnsPolicy() - Deprecated. |
| XmlViolationPolicy | getCommentPolicy() - Returns the commentPolicy. |
| XmlViolationPolicy | getContentNonXmlCharPolicy() - Returns the contentNonXmlCharPolicy. |
| XmlViolationPolicy | getContentSpacePolicy() - Returns the contentSpacePolicy. |
| DoctypeExpectation | getDoctypeExpectation() - Returns the doctype expectation. |
| org.xml.sax.Locator | getDocumentLocator() - Returns the Locator during parse. |
| DocumentModeHandler | getDocumentModeHandler() - Returns the document mode handler. |
| org.w3c.dom.DOMImplementation | getDOMImplementation() - Returns the DOM implementation. |
| Heuristics | getHeuristics() |
| XmlViolationPolicy | getNamePolicy() - The policy for non-NCName element and attribute names. |
| XmlViolationPolicy | getStreamabilityViolationPolicy() - Returns the streamabilityViolationPolicy. |
| XmlViolationPolicy | getXmlnsPolicy() - Returns the xmlnsPolicy. |
| boolean | isCheckingNormalization() - Indicates whether NFC normalization of source is being checked. |
| boolean | isHtml4ModeCompatibleWithXhtml1Schemata() - Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| boolean | isMappingLangToXmlLang() - Whether lang is mapped to xml:lang. |
| boolean | isNamespaceAware() - Returns true. |
| boolean | isReportingDoctype() - Returns the reportingDoctype. |
| boolean | isScriptingEnabled() - Whether the parser considers scripting to be enabled for noscript treatment. |
| boolean | isValidating() - Returns false. |
| org.w3c.dom.Document | newDocument() - For API compatibility. |
| org.w3c.dom.Document | parse(org.xml.sax.InputSource is) - Parses a document from a SAX InputSource. |
| org.w3c.dom.DocumentFragment | parseFragment(org.xml.sax.InputSource is, java.lang.String context) - Parses a document fragment from a SAX InputSource. |
| void | setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy) - Deprecated. |
| void | setCheckingNormalization(boolean enable) - Toggles the checking of the NFC normalization of source. |
| void | setCommentPolicy(XmlViolationPolicy commentPolicy) - Sets the policy for consecutive hyphens in comments. |
| void | setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy) - Sets the policy for non-XML characters except white space. |
| void | setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy) - Sets the policy for non-XML white space. |
| void | setDoctypeExpectation(DoctypeExpectation doctypeExpectation) - Sets the doctype expectation. |
| void | setDocumentModeHandler(DocumentModeHandler documentModeHandler) - Sets the document mode handler. |
| void | setEntityResolver(org.xml.sax.EntityResolver resolver) - Sets the entity resolver for URI-only inputs. |
| void | setErrorHandler(org.xml.sax.ErrorHandler errorHandler) - Sets the error handler. |
| void | setHeuristics(Heuristics heuristics) - Sets the encoding sniffing heuristics. |
| void | setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata) - Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value. |
| void | setIgnoringComments(boolean ignoreComments) - Sets whether comment nodes appear in the tree. |
| void | setMappingLangToXmlLang(boolean mappingLangToXmlLang) - Whether lang is mapped to xml:lang. |
| void | setNamePolicy(XmlViolationPolicy namePolicy) - The policy for non-NCName element and attribute names. |
| void | setReportingDoctype(boolean reportingDoctype) |
| void | setScriptingEnabled(boolean scriptingEnabled) - Sets whether the parser considers scripting to be enabled for noscript treatment. |
| void | setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy) - Sets the streamabilityViolationPolicy. |
| void | setTransitionHander(TransitionHandler handler) |
| void | setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy) - Whether the xmlns attribute on the root element is passed through. |
| void | setXmlPolicy(XmlViolationPolicy xmlPolicy) - This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go. |
| Methods inherited from class javax.xml.parsers.DocumentBuilder |
|---|
getSchema, isXIncludeAware, parse, parse, parse, parse, reset |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
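public HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation,
                           XmlViolationPolicy xmlPolicy)
Instantiates the document builder with a specific DOM implementation and XML violation policy.
Parameters:
  implementation - the DOM implementation
  xmlPolicy - the policy

public HtmlDocumentBuilder(org.w3c.dom.DOMImplementation implementation)
Instantiates the document builder with a specific DOM implementation and the infoset-altering XML violation policy.
Parameters:
  implementation - the DOM implementation

public HtmlDocumentBuilder()
Instantiates the document builder with the JAXP DOM implementation and the infoset-altering XML violation policy.

public HtmlDocumentBuilder(XmlViolationPolicy xmlPolicy)
Instantiates the document builder with the JAXP DOM implementation and a specific XML violation policy.
Parameters:
  xmlPolicy - the policy

| Method Detail |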
|---|
public org.w3c.dom.DOMImplementation getDOMImplementation()
Returns the DOM implementation.
Specified by: getDOMImplementation in class javax.xml.parsers.DocumentBuilder
See Also: DocumentBuilder.getDOMImplementation()

public boolean isNamespaceAware()
Returns true.
Specified by: isNamespaceAware in class javax.xml.parsers.DocumentBuilder
Returns: true
See Also: DocumentBuilder.isNamespaceAware()

public boolean isValidating()
Returns false.
Specified by: isValidating in class javax.xml.parsers.DocumentBuilder
Returns: false
See Also: DocumentBuilder.isValidating()

public org.w3c.dom.Document newDocument()
For API compatibility.
Specified by: newDocument in class javax.xml.parsers.DocumentBuilder
See Also: DocumentBuilder.newDocument()
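
Because isNamespaceAware() returns true, callers can use the namespace-aware DOM lookups on the resulting tree. The sketch below is illustrative only: the class and method names are hypothetical, and it uses the "*" namespace wildcard of getElementsByTagNameNS because the text above does not state which namespace URI the parser assigns to elements. The main method uses a namespace-aware JDK builder as a stand-in so that it runs without the parser on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; works on any namespace-aware DOM tree.
final class NamespaceAwareLookup {
    /** Counts elements with the given local name, regardless of namespace. */
    static int countByLocalName(Document doc, String localName) {
        NodeList matches = doc.getElementsByTagNameNS("*", localName);
        return matches.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in: a namespace-aware JDK XML builder and a well-formed input.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        String markup = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a</p><p>b</p></body></html>";
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        System.out.println(countByLocalName(doc, "p"));
    }
}
```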

public org.w3c.dom.Document parse(org.xml.sax.InputSource is)
                           throws org.xml.sax.SAXException,
                                  java.io.IOException
Parses a document from a SAX InputSource.
Specified by: parse in class javax.xml.parsers.DocumentBuilder
Parameters:
  is - the source
Throws:
  org.xml.sax.SAXException - if stuff goes wrong
  java.io.IOException - if IO goes wrong
See Also: DocumentBuilder.parse(org.xml.sax.InputSource)
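
Since parse takes a SAX InputSource, in-memory markup can be supplied by wrapping a StringReader. The following is a minimal sketch with a hypothetical helper class; it is written against the javax.xml.parsers.DocumentBuilder base type, which HtmlDocumentBuilder extends, and its main method uses the JDK's XML builder as a stand-in so that it runs without the parser on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; accepts any DocumentBuilder subclass.
final class ParseFromString {
    /** Wraps the markup in a character-stream InputSource and parses it. */
    static Document parse(DocumentBuilder builder, String markup)
            throws SAXException, IOException {
        InputSource is = new InputSource(new StringReader(markup));
        return builder.parse(is);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in builder from the JDK; it requires well-formed XML input.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = parse(builder, "<html><body><p>hi</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With this parser, the builder argument would instead be an instance created through one of the constructors listed above, for example new HtmlDocumentBuilder().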

public org.w3c.dom.DocumentFragment parseFragment(org.xml.sax.InputSource is,
                                                  java.lang.String context)
                                           throws java.io.IOException,
                                                  org.xml.sax.SAXException
Parses a document fragment from a SAX InputSource.
Parameters:
  is - the source
  context - the context element name
Throws:
  org.xml.sax.SAXException - if stuff goes wrong
  java.io.IOException - if IO goes wrong

public void setEntityResolver(org.xml.sax.EntityResolver resolver)
Sets the entity resolver for URI-only inputs.
Specified by: setEntityResolver in class javax.xml.parsers.DocumentBuilder
Parameters:
  resolver - the resolver
See Also: DocumentBuilder.setEntityResolver(org.xml.sax.EntityResolver)

public void setErrorHandler(org.xml.sax.ErrorHandler errorHandler)
Sets the error handler.
Specified by: setErrorHandler in class javax.xml.parsers.DocumentBuilder
Parameters:
  errorHandler - the handler
See Also: DocumentBuilder.setErrorHandler(org.xml.sax.ErrorHandler)

public void setTransitionHander(TransitionHandler handler)
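
The handler passed to setErrorHandler is a standard org.xml.sax.ErrorHandler. As an illustration, the sketch below collects reported problems into a list instead of throwing. The class name and message format are hypothetical, and which conditions this parser reports at each severity level is not stated above, so the sketch records all three levels.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical handler that records every reported problem with its position.
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        messages.add(level + " at " + e.getLineNumber() + ":" + e.getColumnNumber()
                + ": " + e.getMessage());
    }

    @Override public void warning(SAXParseException e) { record("warning", e); }
    @Override public void error(SAXParseException e) { record("error", e); }
    @Override public void fatalError(SAXParseException e) { record("fatal", e); }

    public static void main(String[] args) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        // Simulated report; a parser would call these methods during parse.
        handler.error(new SAXParseException("example problem", null, null, 3, 7));
        handler.messages.forEach(System.out::println);
    }
}
```

A caller would register it with builder.setErrorHandler(handler) before calling parse and inspect handler.messages afterwards.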
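
public boolean isCheckingNormalization()
Indicates whether NFC normalization of source is being checked.
Returns: true if NFC normalization of source is being checked.
See Also: nu.validator.htmlparser.impl.Tokenizer#isCheckingNormalization()

public void setCheckingNormalization(boolean enable)
Toggles the checking of the NFC normalization of source.
Parameters:
  enable - true to check normalization
See Also: nu.validator.htmlparser.impl.Tokenizer#setCheckingNormalization(boolean)

public void setCommentPolicy(XmlViolationPolicy commentPolicy)
Sets the policy for consecutive hyphens in comments.
Parameters:
  commentPolicy - the policy
See Also: Tokenizer.setCommentPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentNonXmlCharPolicy(XmlViolationPolicy contentNonXmlCharPolicy)
Sets the policy for non-XML characters except white space.
Parameters:
  contentNonXmlCharPolicy - the policy
See Also: Tokenizer.setContentNonXmlCharPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setContentSpacePolicy(XmlViolationPolicy contentSpacePolicy)
Sets the policy for non-XML white space.
Parameters:
  contentSpacePolicy - the policy
See Also: Tokenizer.setContentSpacePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public boolean isScriptingEnabled()
Whether the parser considers scripting to be enabled for noscript treatment.
Returns: true if enabled
See Also: TreeBuilder.isScriptingEnabled()

public void setScriptingEnabled(boolean scriptingEnabled)
Sets whether the parser considers scripting to be enabled for noscript treatment.
Parameters:
  scriptingEnabled - true to enable
See Also: TreeBuilder.setScriptingEnabled(boolean)

public DoctypeExpectation getDoctypeExpectation()
Returns the doctype expectation.

public void setDoctypeExpectation(DoctypeExpectation doctypeExpectation)
Sets the doctype expectation.
Parameters:
  doctypeExpectation - the doctypeExpectation to set
See Also: TreeBuilder.setDoctypeExpectation(nu.validator.htmlparser.common.DoctypeExpectation)

public DocumentModeHandler getDocumentModeHandler()
Returns the document mode handler.

public void setDocumentModeHandler(DocumentModeHandler documentModeHandler)
Sets the document mode handler.
Parameters:
  documentModeHandler - the documentModeHandler to set
See Also: TreeBuilder.setDocumentModeHandler(nu.validator.htmlparser.common.DocumentModeHandler)

public XmlViolationPolicy getStreamabilityViolationPolicy()
Returns the streamabilityViolationPolicy.

public void setStreamabilityViolationPolicy(XmlViolationPolicy streamabilityViolationPolicy)
Sets the streamabilityViolationPolicy.
Parameters:
  streamabilityViolationPolicy - the streamabilityViolationPolicy to set

public void setHtml4ModeCompatibleWithXhtml1Schemata(boolean html4ModeCompatibleWithXhtml1Schemata)
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.
Parameters:
  html4ModeCompatibleWithXhtml1Schemata

public org.xml.sax.Locator getDocumentLocator()
Returns the Locator during parse.
Returns: the Locator

public boolean isHtml4ModeCompatibleWithXhtml1Schemata()
Whether the HTML 4 mode reports boolean attributes in a way that repeats the name in the value.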

public void setMappingLangToXmlLang(boolean mappingLangToXmlLang)
Whether lang is mapped to xml:lang.
Parameters:
  mappingLangToXmlLang
See Also: Tokenizer.setMappingLangToXmlLang(boolean)

public boolean isMappingLangToXmlLang()
Whether lang is mapped to xml:lang.

public void setXmlnsPolicy(XmlViolationPolicy xmlnsPolicy)
Whether the xmlns attribute on the root element is passed through. (FATAL not allowed.)
Parameters:
  xmlnsPolicy
See Also: Tokenizer.setXmlnsPolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public XmlViolationPolicy getXmlnsPolicy()
Returns the xmlnsPolicy.

public XmlViolationPolicy getCommentPolicy()
Returns the commentPolicy.

public XmlViolationPolicy getContentNonXmlCharPolicy()
Returns the contentNonXmlCharPolicy.

public XmlViolationPolicy getContentSpacePolicy()
Returns the contentSpacePolicy.

public void setReportingDoctype(boolean reportingDoctype)
Parameters:
  reportingDoctype
See Also: TreeBuilder.setReportingDoctype(boolean)

public boolean isReportingDoctype()
Returns the reportingDoctype.
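
public void setNamePolicy(XmlViolationPolicy namePolicy)
The policy for non-NCName element and attribute names.
Parameters:
  namePolicy
See Also: Tokenizer.setNamePolicy(nu.validator.htmlparser.common.XmlViolationPolicy)

public void setHeuristics(Heuristics heuristics)
Sets the encoding sniffing heuristics.
Parameters:
  heuristics - the heuristics to set
See Also: nu.validator.htmlparser.impl.Tokenizer#setHeuristics(nu.validator.htmlparser.common.Heuristics)

public Heuristics getHeuristics()

public void setXmlPolicy(XmlViolationPolicy xmlPolicy)
This is a catch-all convenience method for setting name, xmlns, content space, content non-XML char and comment policies in one go.
Parameters:
  xmlPolicy

public XmlViolationPolicy getNamePolicy()
The policy for non-NCName element and attribute names.

public void setBogusXmlnsPolicy(XmlViolationPolicy bogusXmlnsPolicy)
Deprecated.

public XmlViolationPolicy getBogusXmlnsPolicy()
Deprecated.
Returns XmlViolationPolicy.ALTER_INFOSET.
Returns: XmlViolationPolicy.ALTER_INFOSET

public void addCharacterHandler(CharacterHandler characterHandler)

public void setIgnoringComments(boolean ignoreComments)
Sets whether comment nodes appear in the tree.
Parameters:
  ignoreComments - true to ignore comments
See Also: TreeBuilder.setIgnoringComments(boolean)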