Class ElementRemover
- All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, HTMLComponent
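This class is a document filter that removes specified elements from the processing stream. It offers two options for processing document elements:
- specifying those elements which should be accepted and, optionally, which attributes of that element should be kept; and
- specifying those elements whose tags and content should be completely removed from the event stream.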
The first option allows the application to specify which elements appearing in the event stream should be accepted and, therefore, passed on to the next stage in the pipeline. All elements not in the list of acceptable elements have their start and end tags stripped from the event stream unless those elements appear in the list of elements to be removed.
The second option allows the application to specify which elements should be completely removed from the event stream. When an element that is to be removed appears, its start and end tags, as well as all of its content, are removed from the event stream.
A common use of this filter is to allow only rich-text and linking elements, along with character content, to pass through the filter; all other elements are stripped. The following code configures the filter to perform this task:
ElementRemover remover = new ElementRemover();
remover.acceptElement("b", null);
remover.acceptElement("i", null);
remover.acceptElement("u", null);
remover.acceptElement("a", new String[] { "href" });
However, this configuration still lets the text content of other elements pass through, which may not be desirable. To further "clean" the input, use the removeElement method. The following code completely removes any <SCRIPT> tags and their content from the stream.
remover.removeElement("script");
Note: All text and accepted child elements of a stripped element are retained. To completely remove an element's content, use the removeElement method.
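The strip behaviour described in the note can be sketched on a simplified event stream. This is a hypothetical model, not the NekoHTML implementation: events are plain strings where "<x>" is a start tag, "</x>" is an end tag and anything else is character content, and element names are compared case-insensitively (an assumption suggested by the "script"/<SCRIPT> pairing above). StripFilterSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the NekoHTML source. Events are strings:
// "<x>" = start tag, "</x>" = end tag, anything else = character content.
class StripFilterSketch {
    private final Set<String> accepted = new HashSet<>();

    // Roughly mirrors acceptElement(name, null): the element's tags are kept.
    void acceptElement(String name) {
        accepted.add(name.toLowerCase(Locale.ROOT));
    }

    // Returns the tag name of a start or end tag, or null for character content.
    private static String tagName(String event) {
        if (event.startsWith("</")) {
            return event.substring(2, event.length() - 1);
        }
        if (event.startsWith("<")) {
            return event.substring(1, event.length() - 1);
        }
        return null;
    }

    List<String> filter(List<String> events) {
        List<String> out = new ArrayList<>();
        for (String event : events) {
            String name = tagName(event);
            // Text always passes; tags pass only for accepted elements,
            // so an unlisted element is stripped but its children survive.
            if (name == null || accepted.contains(name.toLowerCase(Locale.ROOT))) {
                out.add(event);
            }
        }
        return out;
    }
}
```

With only "b" accepted, the stream <div>, "Hello ", <B>, "bold", </B>, </div> becomes "Hello ", <B>, "bold", </B>: the <div> tags disappear while its text and accepted child remain.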
Note: Care should be taken when using this filter because the output may not be a well-balanced tree. Specifically, if the application removes the <HTML> element (with or without retaining its children), the resulting document event stream will no longer be well-formed.
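One necessary condition for a well-formed document is a single root element. The hypothetical helper below checks that condition on a simplified string event stream ("<x>" start tag, "</x>" end tag, anything else text), which is an assumption of this sketch rather than an XNI API. It makes the <HTML> case concrete: once the root's tags are stripped, <head> and <body> become sibling top-level elements.

```java
import java.util.List;

// Hypothetical helper on a simplified event stream:
// "<x>" = start tag, "</x>" = end tag, anything else = character content.
final class RootCheckSketch {
    private RootCheckSketch() {}

    // True if start and end tags balance by count, there is exactly one
    // top-level element, and no non-whitespace text sits outside it.
    static boolean hasSingleRoot(List<String> events) {
        int depth = 0;
        int roots = 0;
        for (String event : events) {
            if (event.startsWith("</")) {
                if (--depth < 0) {
                    return false; // end tag with no open element
                }
            } else if (event.startsWith("<")) {
                if (depth == 0) {
                    roots++;
                }
                depth++;
            } else if (depth == 0 && !event.trim().isEmpty()) {
                return false; // stray text at the top level
            }
        }
        return depth == 0 && roots == 1;
    }
}
```

The check only counts tags; it does not verify that end-tag names match their start tags, so it is a quick diagnostic rather than a full well-formedness test.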
- Version:
- $Id: ElementRemover.java,v 1.5 2005/02/14 03:56:54 andyc Exp $
- Author:
- Andy Clark
Field Summary
Fields
Modifier and Type | Field | Description
protected Hashtable | fAcceptedElements | Accepted elements.
protected int | fElementDepth | The element depth.
protected int | fRemovalElementDepth | The element depth at element removal.
protected Hashtable | fRemovedElements | Removed elements.
protected static final Object | NULL | A "null" object.
Fields inherited from class org.cyberneko.html.filters.DefaultFilter
fDocumentHandler, fDocumentSource
Constructor Summary
Constructors
ElementRemover()
Method Summary
Modifier and Type | Method | Description
void | acceptElement(String element, String[] attributes) | Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
void | characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Characters.
void | comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Comment.
protected boolean | elementAccepted(String element) | Returns true if the specified element is accepted.
protected boolean | elementRemoved(String element) | Returns true if the specified element should be removed.
void | emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) | Empty element.
void | endCDATA(org.apache.xerces.xni.Augmentations augs) | End CDATA section.
void | endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) | End element.
void | endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs) | End general entity.
void | endPrefixMapping(String prefix, org.apache.xerces.xni.Augmentations augs) | End prefix mapping.
protected boolean | handleOpenTag(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes) | Handles an open tag.
void | ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Ignorable whitespace.
void | processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) | Processing instruction.
void | removeElement(String element) | Specifies that the given element should be completely removed.
void | startCDATA(org.apache.xerces.xni.Augmentations augs) | Start CDATA section.
void | startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.Augmentations augs) | Start document.
void | startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) | Start document.
void | startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) | Start element.
void | startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier id, String encoding, org.apache.xerces.xni.Augmentations augs) | Start general entity.
void | startPrefixMapping(String prefix, String uri, org.apache.xerces.xni.Augmentations augs) | Start prefix mapping.
void | textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs) | Text declaration.
Methods inherited from class org.cyberneko.html.filters.DefaultFilter
doctypeDecl, endDocument, getDocumentHandler, getDocumentSource, getFeatureDefault, getPropertyDefault, getRecognizedFeatures, getRecognizedProperties, merge, reset, setDocumentHandler, setDocumentSource, setFeature, setProperty, xmlDecl
Field Details
NULL
protected static final Object NULL
A "null" object.
fAcceptedElements
protected Hashtable fAcceptedElements
Accepted elements.
fRemovedElements
protected Hashtable fRemovedElements
Removed elements.
fElementDepth
protected int fElementDepth
The element depth.
fRemovalElementDepth
protected int fRemovalElementDepth
The element depth at element removal.
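The NULL field is described only as a "null" object. A plausible reason, stated here as an assumption rather than taken from the NekoHTML source, is that java.util.Hashtable rejects null keys and values (put throws NullPointerException), so a sentinel object is needed to record "accepted, with no attribute list" in a Hashtable such as fAcceptedElements. The sketch below illustrates that pattern; SentinelSketch and keptAttributes are hypothetical names.

```java
import java.util.Hashtable;
import java.util.Locale;

// Hypothetical illustration; field and method names echo the documented ones,
// but the logic is an assumption, not the NekoHTML source.
class SentinelSketch {
    // Stands in for "no attribute list": Hashtable cannot store null values.
    static final Object NULL = new Object();

    final Hashtable<String, Object> acceptedElements = new Hashtable<>();

    void acceptElement(String element, String[] attributes) {
        // put(key, null) would throw NullPointerException, so store the sentinel.
        Object value = (attributes != null) ? attributes : NULL;
        acceptedElements.put(element.toLowerCase(Locale.ROOT), value);
    }

    boolean elementAccepted(String element) {
        return acceptedElements.containsKey(element.toLowerCase(Locale.ROOT));
    }

    // Attributes to keep: the stored list, or none when the sentinel (or nothing) is stored.
    String[] keptAttributes(String element) {
        Object value = acceptedElements.get(element.toLowerCase(Locale.ROOT));
        return (value == null || value == NULL) ? new String[0] : (String[]) value;
    }
}
```

The sentinel is compared by identity (value == NULL), so it can never collide with a real attribute array.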
Constructor Details
ElementRemover
public ElementRemover()
Method Details
acceptElement
public void acceptElement(String element, String[] attributes)
Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
- Parameters:
element - The element to accept.
attributes - The list of attributes to be kept, or null if no attributes should be kept for this element.
- See Also:
removeElement(String)
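The attributes parameter can be illustrated with a small, self-contained sketch. It models attributes as an ordered name-to-value map rather than XNI's XMLAttributes, and the case-insensitive name match is an assumption; AttributeFilterSketch.keep is a hypothetical helper, not part of the NekoHTML API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of acceptElement's attributes parameter: a non-null
// array names the attributes to keep; null means every attribute is dropped.
final class AttributeFilterSketch {
    private AttributeFilterSketch() {}

    static Map<String, String> keep(Map<String, String> attributes, String[] allowed) {
        Map<String, String> kept = new LinkedHashMap<>();
        if (allowed == null) {
            return kept; // null: no attributes are kept for this element
        }
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            for (String name : allowed) {
                // Case-insensitive match is an assumption of this sketch.
                if (name.equalsIgnoreCase(attr.getKey())) {
                    kept.put(attr.getKey(), attr.getValue());
                    break;
                }
            }
        }
        return kept;
    }
}
```

With new String[] { "href" }, an onclick handler on an <a> element is dropped while href survives; passing null drops every attribute.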
removeElement
public void removeElement(String element)
Specifies that the given element should be completely removed. If an element on the remove list is encountered during processing, the element's start and end tags, as well as all of the content contained within the element, are removed from the processing stream.
- Parameters:
element - The element to completely remove.
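Removing a whole subtree requires depth tracking, which is presumably the role of fElementDepth and fRemovalElementDepth. The sketch below is an assumption about how such counters can work, not the NekoHTML source: the filter records the depth at which a removed element starts, drops every event nested deeper, and resets when the matching end tag brings the depth back. It uses simplified string events ("<x>" start tag, "</x>" end tag, anything else text) and, as its own design choice, checks the accept list before the remove list. RemoverSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of accept/strip/remove with two depth counters.
class RemoverSketch {
    private final Set<String> accepted = new HashSet<>();
    private final Set<String> removed = new HashSet<>();
    private int elementDepth;
    private int removalElementDepth;

    void acceptElement(String name) { accepted.add(key(name)); }
    void removeElement(String name) { removed.add(key(name)); }

    private static String key(String name) { return name.toLowerCase(Locale.ROOT); }

    List<String> filter(List<String> events) {
        elementDepth = 0;
        removalElementDepth = Integer.MAX_VALUE; // "not currently removing"
        List<String> out = new ArrayList<>();
        for (String event : events) {
            if (event.startsWith("</")) {
                String name = key(event.substring(2, event.length() - 1));
                // Forward only outside removed content, and only for accepted elements.
                if (elementDepth <= removalElementDepth && accepted.contains(name)) {
                    out.add(event);
                }
                elementDepth--;
                // Back at the depth where removal started: stop removing.
                if (elementDepth == removalElementDepth) {
                    removalElementDepth = Integer.MAX_VALUE;
                }
            } else if (event.startsWith("<")) {
                String name = key(event.substring(1, event.length() - 1));
                if (elementDepth <= removalElementDepth) {
                    if (accepted.contains(name)) {
                        out.add(event);
                    } else if (removed.contains(name)) {
                        // Remember where removal began; everything deeper is dropped.
                        removalElementDepth = elementDepth;
                    }
                }
                elementDepth++;
            } else if (elementDepth <= removalElementDepth) {
                out.add(event); // text survives unless inside a removed element
            }
        }
        return out;
    }
}
```

Recording a depth rather than a boolean flag keeps a nested element of the same name (for example a <div> inside a removed <div>) from ending the removal early.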
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
- Specified by:
startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
startDocument in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
- Overrides:
startDocument in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
startPrefixMapping
public void startPrefixMapping(String prefix, String uri, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start prefix mapping.
- Overrides:
startPrefixMapping in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
startElement
public void startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start element.
- Specified by:
startElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
startElement in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
emptyElement
public void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Empty element.
- Specified by:
emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
emptyElement in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
comment
public void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Comment.
- Specified by:
comment in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
comment in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
processingInstruction
public void processingInstruction(String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Processing instruction.
- Specified by:
processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
processingInstruction in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
characters
public void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Characters.
- Specified by:
characters in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
characters in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
ignorableWhitespace
public void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Ignorable whitespace.
- Specified by:
ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
ignorableWhitespace in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
startGeneralEntity
public void startGeneralEntity(String name, org.apache.xerces.xni.XMLResourceIdentifier id, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start general entity.
- Specified by:
startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
startGeneralEntity in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
textDecl
public void textDecl(String version, String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Text declaration.
- Specified by:
textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
textDecl in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
endGeneralEntity
public void endGeneralEntity(String name, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End general entity.
- Specified by:
endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
endGeneralEntity in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
startCDATA
public void startCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start CDATA section.
- Specified by:
startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
startCDATA in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
endCDATA
public void endCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End CDATA section.
- Specified by:
endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
endCDATA in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
endElement
public void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End element.
- Specified by:
endElement in interface org.apache.xerces.xni.XMLDocumentHandler
- Overrides:
endElement in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
endPrefixMapping
public void endPrefixMapping(String prefix, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End prefix mapping.
- Overrides:
endPrefixMapping in class DefaultFilter
- Throws:
org.apache.xerces.xni.XNIException
elementAccepted
protected boolean elementAccepted(String element)
Returns true if the specified element is accepted.
elementRemoved
protected boolean elementRemoved(String element)
Returns true if the specified element should be removed.
handleOpenTag
protected boolean handleOpenTag(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes)
Handles an open tag.