Package org.cyberneko.html
Class HTMLConfiguration
java.lang.Object
org.apache.xerces.util.ParserConfigurationSettings
org.cyberneko.html.HTMLConfiguration
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponentManager, org.apache.xerces.xni.parser.XMLParserConfiguration, org.apache.xerces.xni.parser.XMLPullParserConfiguration
public class HTMLConfiguration
extends org.apache.xerces.util.ParserConfigurationSettings
implements org.apache.xerces.xni.parser.XMLPullParserConfiguration
An XNI-based parser configuration that can be used to parse HTML
documents. This configuration can be used directly to parse HTML
documents, or in conjunction with any XNI-based tools, such as the
Xerces2 implementation.
This configuration recognizes the following features:
- http://cyberneko.org/html/features/augmentations
- http://cyberneko.org/html/features/report-errors
- http://cyberneko.org/html/features/report-errors/simple
- http://cyberneko.org/html/features/balance-tags
- and the features supported by the scanner and tag balancer components.
This configuration recognizes the following properties:
- http://cyberneko.org/html/properties/names/elems
- http://cyberneko.org/html/properties/names/attrs
- http://cyberneko.org/html/properties/filters
- http://cyberneko.org/html/properties/error-reporter
- and the properties supported by the scanner and tag balancer.
For complete usage information, refer to the documentation.
- Version:
- $Id: HTMLConfiguration.java,v 1.9 2005/02/14 03:56:54 andyc Exp $
- Author:
- Andy Clark
Nested Class Summary
Nested Classes
- protected class: Defines an error reporter for reporting HTML errors.
Field Summary
Fields
- protected static final String AUGMENTATIONS: Include infoset augmentations.
- protected static final String BALANCE_TAGS: Balance tags.
- protected static final String ERROR_DOMAIN: Error domain.
- protected static final String ERROR_REPORTER: Error reporter.
- protected boolean fCloseStream: Stream opened by parser.
- protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler: Document handler.
- protected final HTMLScanner fDocumentScanner: Document scanner.
- protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler: DTD content model handler.
- protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler: DTD handler.
- protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver: Entity resolver.
- protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler: Error handler.
- protected final HTMLErrorReporter fErrorReporter: Error reporter.
- protected final Vector fHTMLComponents: Components.
- protected static final String FILTERS: Pipeline filters.
- protected Locale fLocale: Locale.
- protected final NamespaceBinder fNamespaceBinder: Namespace binder.
- protected final HTMLTagBalancer fTagBalancer: HTML tag balancer.
- protected static final String NAMES_ATTRS: Modify HTML attribute names: { "upper", "lower", "default" }.
- protected static final String NAMES_ELEMS: Modify HTML element names: { "upper", "lower", "default" }.
- protected static final String NAMESPACES: Namespaces.
- protected static final String REPORT_ERRORS: Report errors.
- protected static final String SIMPLE_ERROR_FORMAT: Simple report format.
- protected static boolean XERCES_2_0_0: Parser version is Xerces 2.0.0.
- protected static boolean XERCES_2_0_1: Parser version is Xerces 2.0.1.
- protected static boolean XML4J_4_0_x: Parser version is XML4J 4.0.x.
Fields inherited from class org.apache.xerces.util.ParserConfigurationSettings
fFeatures, fParentSettings, fProperties, fRecognizedFeatures, fRecognizedProperties, PARSER_SETTINGS
Constructor Summary
Constructors
- HTMLConfiguration(): Default constructor.
Method Summary
- protected void addComponent(HTMLComponent component): Adds a component.
- void cleanup(): If the application decides to terminate parsing before the XML document is fully parsed, the application should call this method to free any resources allocated during parsing.
- protected HTMLScanner createDocumentScanner
- void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): EXPERIMENTAL: may change in next release. Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
- org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler(): Returns the document handler.
- org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler(): Returns the DTD content model handler.
- org.apache.xerces.xni.XMLDTDHandler getDTDHandler(): Returns the DTD handler.
- org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver(): Returns the entity resolver.
- org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler(): Returns the error handler.
- getLocale: Returns the locale.
- boolean parse(boolean complete): Parses the document in a pull parsing fashion.
- void parse(org.apache.xerces.xni.parser.XMLInputSource source): Parses a document.
- void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): Pushes an input source onto the current entity stack.
- protected void reset(): Resets the parser configuration.
- void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler): Sets the document handler.
- void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler): Sets the DTD content model handler.
- void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler): Sets the DTD handler.
- void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver): Sets the entity resolver.
- void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler): Sets the error handler.
- void setFeature(String featureId, boolean state): Sets a feature.
- void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource): Sets the input source for the document to parse.
- void setLocale: Sets the locale.
- void setProperty(String propertyId, Object value): Sets a property.
Methods inherited from class org.apache.xerces.util.ParserConfigurationSettings
addRecognizedFeatures, addRecognizedProperties, checkFeature, checkProperty, getFeature, getProperty
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.xerces.xni.parser.XMLParserConfiguration
addRecognizedFeatures, addRecognizedProperties, getFeature, getProperty
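Field Details
NAMESPACES
protected static final String NAMESPACES
Namespaces.
AUGMENTATIONS
protected static final String AUGMENTATIONS
Include infoset augmentations.
REPORT_ERRORS
protected static final String REPORT_ERRORS
Report errors.
SIMPLE_ERROR_FORMAT
protected static final String SIMPLE_ERROR_FORMAT
Simple report format.
BALANCE_TAGS
protected static final String BALANCE_TAGS
Balance tags.
NAMES_ELEMS
protected static final String NAMES_ELEMS
Modify HTML element names: { "upper", "lower", "default" }.
NAMES_ATTRS
protected static final String NAMES_ATTRS
Modify HTML attribute names: { "upper", "lower", "default" }.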
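The three values accepted by NAMES_ELEMS and NAMES_ATTRS can be pictured with a small stand-alone sketch. It does not use NekoHTML: the class `NameCaseSketch` and the method `modifyName` are hypothetical names invented for illustration, and the sketch assumes that "default" leaves a name exactly as it was written in the source, which this page does not state.

```java
import java.util.Locale;

/** Hypothetical model of the name-modification modes; not NekoHTML code. */
class NameCaseSketch {

    /**
     * Applies one of the documented modes to an element or attribute name.
     * Assumption: any mode other than "upper" or "lower" (including
     * "default") returns the name unchanged.
     */
    static String modifyName(String name, String mode) {
        switch (mode) {
            case "upper":
                return name.toUpperCase(Locale.ENGLISH);
            case "lower":
                return name.toLowerCase(Locale.ENGLISH);
            default:
                return name;
        }
    }
}
```

With this model, `modifyName("Body", "upper")` yields `BODY` and `modifyName("onClick", "lower")` yields `onclick`. A fixed locale is used so that case conversion does not depend on the platform's default locale.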
FILTERS
protected static final String FILTERS
Pipeline filters.
ERROR_REPORTER
protected static final String ERROR_REPORTER
Error reporter.
ERROR_DOMAIN
protected static final String ERROR_DOMAIN
Error domain.
fDocumentHandler
protected org.apache.xerces.xni.XMLDocumentHandler fDocumentHandler
Document handler.
fDTDHandler
protected org.apache.xerces.xni.XMLDTDHandler fDTDHandler
DTD handler.
fDTDContentModelHandler
protected org.apache.xerces.xni.XMLDTDContentModelHandler fDTDContentModelHandler
DTD content model handler.
fErrorHandler
protected org.apache.xerces.xni.parser.XMLErrorHandler fErrorHandler
Error handler.
fEntityResolver
protected org.apache.xerces.xni.parser.XMLEntityResolver fEntityResolver
Entity resolver.
fLocale
protected Locale fLocale
Locale.
fCloseStream
protected boolean fCloseStream
Indicates that the stream was opened by the parser and must therefore be closed manually when parsing terminates.
fHTMLComponents
protected final Vector fHTMLComponents
Components.
fDocumentScanner
protected final HTMLScanner fDocumentScanner
Document scanner.
fTagBalancer
protected final HTMLTagBalancer fTagBalancer
HTML tag balancer.
fNamespaceBinder
protected final NamespaceBinder fNamespaceBinder
Namespace binder.
fErrorReporter
protected final HTMLErrorReporter fErrorReporter
Error reporter.
XERCES_2_0_0
protected static boolean XERCES_2_0_0
Parser version is Xerces 2.0.0.
XERCES_2_0_1
protected static boolean XERCES_2_0_1
Parser version is Xerces 2.0.1.
XML4J_4_0_x
protected static boolean XML4J_4_0_x
Parser version is XML4J 4.0.x.
Constructor Details
HTMLConfiguration
public HTMLConfiguration()
Default constructor.
Method Details
createDocumentScanner
pushInputSource
public void pushInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
Pushes an input source onto the current entity stack. This enables the scanner to transparently scan new content (e.g. the output written by an embedded script). At the end of the pushed entity, the scanner returns to where it left off at the time this entity source was pushed.
Hint: To use this feature to insert the output of <SCRIPT> tags, remember to buffer the entire output of the processed instructions before pushing a new input source. Otherwise, events may appear out of sequence.
- Parameters:
inputSource - The new input source to start scanning.
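The stack behaviour described for pushInputSource can be illustrated with a stand-alone model. The sketch below is not NekoHTML code and does not reflect how HTMLScanner is implemented: `EntityStackSketch`, its `read` method, and `demo` are hypothetical names, and plain strings stand in for XMLInputSource objects. It only shows the ordering guarantee: content pushed mid-document is consumed first, then scanning resumes in the outer source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a scanner's entity stack; not NekoHTML code. */
class EntityStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();

    EntityStackSketch(String document) {
        stack.push(new StringReader(document));
    }

    /** Pushes new content; it is read before the rest of the current source. */
    void pushInputSource(String content) {
        stack.push(new StringReader(content));
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    int read() {
        try {
            while (!stack.isEmpty()) {
                int c = stack.peek().read();
                if (c != -1) {
                    return c;
                }
                // End of the current entity: resume the source beneath it.
                stack.pop();
            }
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads two characters of "abcd", pushes "XY", then drains everything. */
    static String demo() {
        EntityStackSketch scanner = new EntityStackSketch("abcd");
        StringBuilder seen = new StringBuilder();
        seen.append((char) scanner.read());
        seen.append((char) scanner.read());
        // The pushed text is complete before the push, as the hint advises.
        scanner.pushInputSource("XY");
        for (int c = scanner.read(); c != -1; c = scanner.read()) {
            seen.append((char) c);
        }
        return seen.toString();
    }
}
```

In this model `demo()` returns `abXYcd`: the pushed text appears at the point of the push, and the remainder of the outer source follows it.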
evaluateInputSource
public void evaluateInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource)
EXPERIMENTAL: may change in next release
Immediately evaluates an input source and adds the new content (e.g. the output written by an embedded script).
- Parameters:
inputSource - The new input source to start scanning.
setFeature
public void setFeature(String featureId, boolean state) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a feature.
- Specified by:
setFeature in interface org.apache.xerces.xni.parser.XMLParserConfiguration
- Overrides:
setFeature in class org.apache.xerces.util.ParserConfigurationSettings
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
setProperty
public void setProperty(String propertyId, Object value) throws org.apache.xerces.xni.parser.XMLConfigurationException
Sets a property.
- Specified by:
setProperty in interface org.apache.xerces.xni.parser.XMLParserConfiguration
- Overrides:
setProperty in class org.apache.xerces.util.ParserConfigurationSettings
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException
setDocumentHandler
public void setDocumentHandler(org.apache.xerces.xni.XMLDocumentHandler handler)
Sets the document handler.
- Specified by:
setDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
getDocumentHandler
public org.apache.xerces.xni.XMLDocumentHandler getDocumentHandler()
Returns the document handler.
- Specified by:
getDocumentHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
setDTDHandler
public void setDTDHandler(org.apache.xerces.xni.XMLDTDHandler handler)
Sets the DTD handler.
- Specified by:
setDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
getDTDHandler
public org.apache.xerces.xni.XMLDTDHandler getDTDHandler()
Returns the DTD handler.
- Specified by:
getDTDHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
setDTDContentModelHandler
public void setDTDContentModelHandler(org.apache.xerces.xni.XMLDTDContentModelHandler handler)
Sets the DTD content model handler.
- Specified by:
setDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
getDTDContentModelHandler
public org.apache.xerces.xni.XMLDTDContentModelHandler getDTDContentModelHandler()
Returns the DTD content model handler.
- Specified by:
getDTDContentModelHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
setErrorHandler
public void setErrorHandler(org.apache.xerces.xni.parser.XMLErrorHandler handler)
Sets the error handler.
- Specified by:
setErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
getErrorHandler
public org.apache.xerces.xni.parser.XMLErrorHandler getErrorHandler()
Returns the error handler.
- Specified by:
getErrorHandler in interface org.apache.xerces.xni.parser.XMLParserConfiguration
setEntityResolver
public void setEntityResolver(org.apache.xerces.xni.parser.XMLEntityResolver resolver)
Sets the entity resolver.
- Specified by:
setEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
getEntityResolver
public org.apache.xerces.xni.parser.XMLEntityResolver getEntityResolver()
Returns the entity resolver.
- Specified by:
getEntityResolver in interface org.apache.xerces.xni.parser.XMLParserConfiguration
setLocale
Sets the locale.
- Specified by:
setLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
getLocale
Returns the locale.
- Specified by:
getLocale in interface org.apache.xerces.xni.parser.XMLParserConfiguration
parse
public void parse(org.apache.xerces.xni.parser.XMLInputSource source) throws org.apache.xerces.xni.XNIException, IOException
Parses a document.
- Specified by:
parse in interface org.apache.xerces.xni.parser.XMLParserConfiguration
- Throws:
org.apache.xerces.xni.XNIException
IOException
setInputSource
public void setInputSource(org.apache.xerces.xni.parser.XMLInputSource inputSource) throws org.apache.xerces.xni.parser.XMLConfigurationException, IOException
Sets the input source for the document to parse.
- Specified by:
setInputSource in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
- Parameters:
inputSource - The document's input source.
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException - Thrown if there is a configuration error when initializing the parser.
IOException - Thrown on I/O error.
parse
public boolean parse(boolean complete) throws org.apache.xerces.xni.XNIException, IOException
Parses the document in a pull parsing fashion.
- Specified by:
parse in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
- Parameters:
complete - True if the pull parser should parse the remaining document completely.
- Returns:
True if there is more document to parse.
- Throws:
org.apache.xerces.xni.XNIException - Any XNI exception, possibly wrapping another exception.
IOException - An IO exception from the parser, possibly from a byte stream or character stream supplied by the parser.
cleanup
public void cleanup()
If the application decides to terminate parsing before the XML document is fully parsed, it should call this method to free any resources allocated during parsing, for example to close all opened streams.
- Specified by:
cleanup in interface org.apache.xerces.xni.parser.XMLPullParserConfiguration
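The contract of parse(boolean) and cleanup() can be sketched with a stand-alone model. Nothing below comes from NekoHTML or Xerces: `PullParseSketch`, its `events` list, the `cleanedUp` flag, and `stopAfter` are hypothetical names, and a list of pre-split strings stands in for a real document. The model assumes that a call with complete set to false advances by one unit of work, which is one plausible reading of "pull parsing fashion" rather than something this page specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of a pull-style parse loop; not NekoHTML code. */
class PullParseSketch {
    private final Iterator<String> chunks;
    final List<String> events = new ArrayList<>();
    boolean cleanedUp = false;

    PullParseSketch(String... chunks) {
        this.chunks = Arrays.asList(chunks).iterator();
    }

    /**
     * Consumes one chunk when complete is false, or every remaining chunk
     * when complete is true. Returns true if there is more to parse.
     */
    boolean parse(boolean complete) {
        do {
            if (!chunks.hasNext()) {
                return false;
            }
            events.add(chunks.next());
        } while (complete);
        return chunks.hasNext();
    }

    /** Stands in for releasing streams and other parse-time resources. */
    void cleanup() {
        cleanedUp = true;
    }

    /** Parses at most limit steps, then releases resources in any case. */
    static PullParseSketch stopAfter(int limit, String... chunks) {
        PullParseSketch parser = new PullParseSketch(chunks);
        try {
            int steps = 0;
            while (steps < limit && parser.parse(false)) {
                steps++;
            }
        } finally {
            parser.cleanup();
        }
        return parser;
    }
}
```

Calling `stopAfter(2, "a", "b", "c", "d")` leaves two events recorded and `cleanedUp` set, which mirrors an application that abandons a document early. Placing `cleanup()` in a `finally` block is a design choice of this sketch so that resources are released even if a step throws.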
addComponent
protected void addComponent(HTMLComponent component)
Adds a component.
reset
protected void reset() throws org.apache.xerces.xni.parser.XMLConfigurationException
Resets the parser configuration.
- Throws:
org.apache.xerces.xni.parser.XMLConfigurationException