Package org.cyberneko.html
Class HTMLScanner.ContentScanner
java.lang.Object
org.cyberneko.html.HTMLScanner.ContentScanner
- All Implemented Interfaces: HTMLScanner.Scanner
- Enclosing class: HTMLScanner
The primary HTML document scanner.
- Author: Andy Clark
Constructor Summary
Constructors
ContentScanner()
Method Summary
Modifier and Type | Method | Description
protected void | addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index) | Adds location augmentations to the specified attribute.
protected String | nextContent(int len) | Reads the next characters without impacting the buffer content up to the current offset.
boolean | scan(boolean complete) | Scans the document content.
protected boolean | scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty) | Scans a real attribute.
protected boolean | scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc) | Scans an attribute, pseudo or real.
protected void | scanCDATA() | Scans a CDATA section.
protected void | scanCharacters() | Scans characters.
protected void | scanComment() | Scans a comment.
protected void | scanEndElement() | Scans an end element.
protected boolean | scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend) | Scans markup content.
protected void | scanPI() | Scans a processing instruction.
protected boolean | scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes) | Scans a pseudo attribute.
protected String | scanStartElement(boolean[] empty) | Scans a start element.
-
Constructor Details
-
ContentScanner
public ContentScanner()
-
Method Details
-
scan
public boolean scan(boolean complete) throws IOException
Scans the document content.
- Specified by: scan in interface HTMLScanner.Scanner
- Parameters: complete - True if the scanner should not return until scanning is complete.
- Returns: True if additional scanning is required.
- Throws: IOException - Thrown if an I/O error occurs.
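The `complete` flag and the boolean result imply a simple driving contract: either call `scan(true)` once, or call `scan(false)` repeatedly until it returns false. A minimal sketch of that contract; `SteppedScanner`, `ChunkScanner` and `ScanDriver` are hypothetical names that only mirror the documented signature and are not NekoHTML classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the documented signature of HTMLScanner.Scanner.scan.
interface SteppedScanner {
    /** Returns true if additional scanning is required. */
    boolean scan(boolean complete) throws IOException;
}

// Hypothetical scanner that consumes one token per call unless complete is true.
class ChunkScanner implements SteppedScanner {
    private final String[] tokens;
    private int next = 0;
    final List<String> seen = new ArrayList<>();

    ChunkScanner(String... tokens) {
        this.tokens = tokens;
    }

    @Override
    public boolean scan(boolean complete) {
        do {
            if (next >= tokens.length) {
                return false; // input exhausted: no further scanning required
            }
            seen.add(tokens[next++]);
        } while (complete); // when complete is true, keep going until the input is exhausted
        return next < tokens.length;
    }
}

class ScanDriver {
    // Calls scan(false) repeatedly until the scanner reports that it is done.
    static int driveIncrementally(SteppedScanner scanner) throws IOException {
        int calls = 0;
        boolean more;
        do {
            more = scanner.scan(false);
            calls++;
        } while (more);
        return calls;
    }
}
```

Incremental driving lets a caller interleave scanning with other work; `scan(true)` is the convenient form when the whole document should be processed in one call.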
-
nextContent
protected String nextContent(int len) throws IOException
Reads the next characters without impacting the buffer content up to the current offset.
- Parameters: len - the number of characters to read
- Returns: the string read (its length may be smaller than len if EOF is encountered)
- Throws: IOException
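Reading ahead without consuming input is a standard lookahead technique. NekoHTML works on its own internal character buffer; the sketch below shows the same idea with the standard `java.io.Reader` `mark`/`reset` API instead, so it is an analogy rather than the actual implementation. The class `Lookahead` is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;

class Lookahead {
    /**
     * Reads up to len characters and then rewinds, so the next read still starts
     * at the original position. The result may be shorter than len at EOF.
     */
    static String peek(Reader in, int len) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        in.mark(len); // remember the current position
        StringBuilder sb = new StringBuilder(len);
        try {
            while (sb.length() < len) {
                int c = in.read();
                if (c == -1) {
                    break; // EOF: return what has been read so far
                }
                sb.append((char) c);
            }
        } finally {
            in.reset(); // rewind: the peeked characters are not consumed
        }
        return sb.toString();
    }
}
```

For example, a scanner positioned at `<` could peek three characters to decide whether `!--` (a comment) follows before committing to a code path.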
-
scanCharacters
protected void scanCharacters() throws IOException
Scans characters.
- Throws: IOException
-
scanCDATA
protected void scanCDATA() throws IOException
Scans a CDATA section.
- Throws: IOException
-
scanComment
protected void scanComment() throws IOException
Scans a comment.
- Throws: IOException
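Comments and CDATA sections are both closed by a fixed terminator (`-->` and `]]>` respectively). A minimal sketch of terminator-driven scanning over a plain `java.io.Reader`; `TerminatedScan.scanUntil` is a hypothetical helper and does not reproduce NekoHTML's recovery rules for malformed comments.

```java
import java.io.IOException;
import java.io.Reader;

class TerminatedScan {
    /**
     * Reads characters into text until terminator has been read or EOF is reached.
     * Returns true if the terminator was found; the terminator is not kept in text.
     */
    static boolean scanUntil(Reader in, String terminator, StringBuilder text) throws IOException {
        int start = text.length(); // only match within the newly scanned characters
        int c;
        while ((c = in.read()) != -1) {
            text.append((char) c);
            int at = text.length() - terminator.length();
            if (at >= start && text.indexOf(terminator, at) == at) {
                text.setLength(at); // drop the terminator from the content
                return true;
            }
        }
        return false; // EOF before the terminator: an unterminated comment or CDATA section
    }
}
```

Checking for the terminator only at the end of the accumulated text handles overlapping cases such as `a]]]>`, whose content is `a]`.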
-
scanMarkupContent
protected boolean scanMarkupContent(org.apache.xerces.util.XMLStringBuffer buffer, char cend) throws IOException
Scans markup content.
- Throws: IOException
-
scanPI
protected void scanPI() throws IOException
Scans a processing instruction.
- Throws: IOException
-
scanStartElement
protected String scanStartElement(boolean[] empty) throws IOException
Scans a start element.
- Parameters: empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Throws: IOException
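Java has no multiple return values, so `empty` is a one-element array that the method writes into while the element name is returned normally. A minimal sketch of that idiom on a plain string; `StartTagScan` is hypothetical, skips attributes entirely, and is not NekoHTML's implementation.

```java
class StartTagScan {
    /**
     * Scans a start tag such as "<br/>" or "<p class=x>" and returns the element
     * name. empty[0] is set to true when the tag ends with "/>".
     * Attributes are ignored in this simplified sketch.
     */
    static String scanStartElement(String tag, boolean[] empty) {
        if (!tag.startsWith("<") || !tag.endsWith(">")) {
            throw new IllegalArgumentException("not a tag: " + tag);
        }
        int i = 1;
        while (i < tag.length() && isNameChar(tag.charAt(i))) {
            i++;
        }
        String name = tag.substring(1, i);
        // Second "return value": report whether the tag is self-closing.
        empty[0] = tag.endsWith("/>");
        return name;
    }

    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '-' || c == '_' || c == ':' || c == '.';
    }
}
```

The caller allocates the array once and can reuse it across calls, which avoids creating a small result object for every tag.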
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty) throws IOException
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Throws: IOException
-
scanPseudoAttribute
protected boolean scanPseudoAttribute(org.apache.xerces.util.XMLAttributesImpl attributes) throws IOException
Scans a pseudo attribute.
- Parameters: attributes - The list of attributes.
- Throws: IOException
-
scanAttribute
protected boolean scanAttribute(org.apache.xerces.util.XMLAttributesImpl attributes, boolean[] empty, char endc) throws IOException
Scans an attribute, pseudo or real.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
endc - The end character that appears before the closing angle bracket ('>').
- Throws: IOException
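The `endc` parameter lets one loop serve both kinds of attribute. A simplified sketch, assuming `endc` is `'/'` for real attributes (so `/>` marks an empty tag) and `'?'` for pseudo attributes such as those in `<?xml version="1.0"?>`. `AttributeScan` is hypothetical: it returns a `Map` instead of filling an `XMLAttributesImpl`, and handles only double-quoted or unquoted values (no single quotes, no entity decoding).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified attribute scanner; not NekoHTML's implementation.
class AttributeScan {
    /**
     * Parses name="value" or name=value pairs from s until '>' is reached.
     * Assumption: endc is '/' for real attributes and '?' for pseudo attributes.
     * If endc appears immediately before '>', empty[0] is set to true.
     */
    static Map<String, String> scanAttributes(String s, char endc, boolean[] empty) {
        Map<String, String> attrs = new LinkedHashMap<>();
        empty[0] = false;
        int n = s.length();
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (c == '>') {
                return attrs;
            }
            if (closesTag(s, i, endc)) {
                empty[0] = true; // e.g. "/>" on an element or "?>" on a processing instruction
                return attrs;
            }
            // Attribute name: runs up to '=', whitespace, '>' or endc.
            int start = i;
            while (i < n && s.charAt(i) != '=' && s.charAt(i) != '>' && s.charAt(i) != endc
                    && !Character.isWhitespace(s.charAt(i))) {
                i++;
            }
            if (i == start) {
                i++; // stray character (such as a lone endc or '='): skip it
                continue;
            }
            String name = s.substring(start, i);
            String value = ""; // attribute without a value, e.g. "disabled"
            if (i < n && s.charAt(i) == '=') {
                i++;
                if (i < n && s.charAt(i) == '"') {
                    int close = s.indexOf('"', i + 1);
                    if (close < 0) {
                        close = n; // unterminated quote: take the rest of the input
                    }
                    value = s.substring(i + 1, close);
                    i = Math.min(close + 1, n);
                } else {
                    int vstart = i;
                    while (i < n && !closesTag(s, i, endc) && !Character.isWhitespace(s.charAt(i))) {
                        i++;
                    }
                    value = s.substring(vstart, i);
                }
            }
            attrs.put(name, value);
        }
        return attrs; // input ended before '>'
    }

    // True if position i holds '>' or endc immediately followed by '>'.
    private static boolean closesTag(String s, int i, char endc) {
        char c = s.charAt(i);
        return c == '>' || (c == endc && i + 1 < s.length() && s.charAt(i + 1) == '>');
    }
}
```

Passing the end character as a parameter keeps a single scanning loop while letting the caller decide which two-character sequence (`/>` or `?>`) closes the construct.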
-
addLocationItem
protected void addLocationItem(org.apache.xerces.xni.XMLAttributes attributes, int index)
Adds location augmentations to the specified attribute.
-
scanEndElement
protected void scanEndElement() throws IOException
Scans an end element.
- Throws: IOException
-