Modifier and Type | Field and Description |
---|---|
`protected WebcrawlerConnector.DocumentURLFilter` | `WebcrawlerConnector.ProcessActivityLinkHandler.filter` |

Modifier and Type | Method and Description |
---|---|
`protected String` | `WebcrawlerConnector.doCanonicalization(WebcrawlerConnector.DocumentURLFilter filter, WebURL url)` Code to canonicalize a URL. |
`protected boolean` | `WebcrawlerConnector.extractLinks(String documentIdentifier, IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Code to extract links from an already-fetched document. |
`protected String` | `WebcrawlerConnector.makeDocumentIdentifier(String parentIdentifier, String rawURL, WebcrawlerConnector.DocumentURLFilter filter)` Convert an absolute or relative URL to a document identifier. |
`protected void` | `WebcrawlerConnector.processDocument(IProcessActivity activities, String documentIdentifier, String versionString, boolean indexDocument, Map<String,Set<String>> metaHash, Map<String,Set<String>> metaHash2, String[] acls, WebcrawlerConnector.DocumentURLFilter filter)` |

Constructor and Description |
---|
`WebcrawlerConnector.ProcessActivityHTMLHandler(String documentIdentifier, IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Constructor. |
`WebcrawlerConnector.ProcessActivityLinkHandler(String documentIdentifier, IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter, String contextDescription, String linkType)` Constructor. |
`WebcrawlerConnector.ProcessActivityRedirectionHandler(String documentIdentifier, IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Constructor. |
`WebcrawlerConnector.ProcessActivityXMLHandler(String documentIdentifier, IProcessActivity activities, WebcrawlerConnector.DocumentURLFilter filter)` Constructor. |
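
The `makeDocumentIdentifier` and `doCanonicalization` entries are described only as converting an absolute or relative URL to a document identifier and canonicalizing a URL. The following is a minimal sketch of that general idea using only `java.net.URI`. It is not the connector's implementation: the class `DocumentIdentifierSketch`, the `UrlFilter` interface, and the specific canonicalization choices (lowercasing scheme and host, dropping the fragment, accepting only http/https) are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class DocumentIdentifierSketch {

    /** Hypothetical stand-in for a URL filter; not the DocumentURLFilter API. */
    interface UrlFilter {
        boolean accepts(String url);
    }

    /**
     * Resolve rawURL against parentIdentifier (if any), canonicalize the
     * result, and return it only if the filter accepts it. Returns null when
     * the URL is malformed, is not http/https, or is rejected by the filter.
     */
    static String makeIdentifier(String parentIdentifier, String rawURL, UrlFilter filter) {
        try {
            URI resolved = (parentIdentifier == null)
                ? new URI(rawURL)
                : new URI(parentIdentifier).resolve(rawURL);
            String canonical = canonicalize(resolved);
            if (canonical == null || !filter.accepts(canonical)) {
                return null;
            }
            return canonical;
        } catch (URISyntaxException | IllegalArgumentException e) {
            // new URI(...) throws URISyntaxException; resolve(String) throws
            // IllegalArgumentException when rawURL cannot be parsed.
            return null;
        }
    }

    /**
     * Assumed canonical form: lowercase scheme and host, dot segments
     * removed from the path, empty path replaced by "/", fragment dropped.
     * Raw (still percent-encoded) path and query are kept unchanged.
     */
    static String canonicalize(URI uri) {
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return null;
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        if (!scheme.equals("http") && !scheme.equals("https")) {
            return null;
        }
        String path = uri.normalize().getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host.toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        sb.append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }
}
```

In this sketch a `null` return plays the role of "no usable identifier", so a caller can skip the link. A real crawler would typically apply configurable canonicalization policies (for example, session-parameter removal) rather than the fixed rules assumed here.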