org.apache.james.smtpserver.urirbl
Class URIScanner

java.lang.Object
  extended by org.apache.james.smtpserver.urirbl.URIScanner

public class URIScanner
extends java.lang.Object


Constructor Summary
URIScanner()
           
 
Method Summary
protected static java.lang.String domainFromHost(java.lang.String host)
          Extracts and returns the registrar domain portion of a host string.
protected static java.lang.String hostFromUriStr(java.lang.String uriStr)
          Extracts and returns the host portion of a URI string.
static java.util.HashSet scanContentForDomains(java.util.HashSet domains, java.lang.CharSequence content)
          Scans a character sequence for URIs and adds the unique domains derived from them to the supplied HashSet.
protected static java.util.HashSet scanContentForHosts(java.lang.CharSequence content)
          Scans a character sequence for URIs and returns the unique hosts derived from them.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URIScanner

public URIScanner()
Method Detail

scanContentForDomains

public static java.util.HashSet scanContentForDomains(java.util.HashSet domains,
                                                      java.lang.CharSequence content)
Scans a character sequence for URIs, then adds all unique domain strings derived from the URIs found to the supplied HashSet.

This function calls scanContentForHosts() to grab all the host strings. Then it calls domainFromHost() on each host string found to distill them to their basic "registrar" domains.

Parameters:
domains - a HashSet to be populated with all domain strings found in the content
content - a character sequence to be scanned for URIs
Returns:
newDomains, the set of domains that were extracted
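
The two-step flow described above (collect hosts, then reduce each host to its registrar domain) can be sketched as follows. This is a minimal illustration and not the James source: the class name DomainScanSketch and the two function parameters are hypothetical stand-ins for scanContentForHosts() and domainFromHost(), and the choice to return every extracted domain (rather than only those not already in the supplied set) is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

public class DomainScanSketch {

    // Hypothetical sketch: hostScanner and domainExtractor stand in for
    // scanContentForHosts() and domainFromHost() respectively.
    static Set<String> scanContentForDomains(
            Set<String> domains,
            CharSequence content,
            Function<CharSequence, Set<String>> hostScanner,
            Function<String, String> domainExtractor) {
        Set<String> newDomains = new HashSet<>();
        for (String host : hostScanner.apply(content)) {
            String domain = domainExtractor.apply(host);
            if (domain != null) {
                domains.add(domain);     // populate the caller's set
                newDomains.add(domain);  // assumption: report every extracted domain
            }
        }
        return newDomains;
    }

    public static void main(String[] args) {
        Set<String> domains = new HashSet<>();
        // Stub functions keep the demo self-contained.
        Set<String> found = scanContentForDomains(
                domains,
                "content is ignored by the stub scanner",
                c -> new HashSet<>(Arrays.asList("a.example.com", "b.example.com")),
                h -> "example.com");
        System.out.println(found);
    }
}
```

Because both collections are sets, two hosts that reduce to the same registrar domain contribute a single entry.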

scanContentForHosts

protected static java.util.HashSet scanContentForHosts(java.lang.CharSequence content)
Scans a character sequence for URIs, then returns all unique host strings derived from the URIs found, in a HashSet.

Parameters:
content - a character sequence to be scanned for URIs
Returns:
a HashSet containing host strings
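
A minimal sketch of this kind of scan is shown below. The regular expression, the restriction to http, https and ftp schemes, and the class name HostScanSketch are all assumptions made for illustration; the pattern actually used by URIScanner is not documented on this page.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostScanSketch {

    // Assumed pattern: a scheme followed by a run of characters that are
    // not whitespace, angle brackets, or quotes.
    private static final Pattern URI_PATTERN =
            Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"']+");

    static Set<String> scanContentForHosts(CharSequence content) {
        Set<String> hosts = new HashSet<>();
        Matcher m = URI_PATTERN.matcher(content);
        while (m.find()) {
            try {
                String host = new URI(m.group()).getHost();
                if (host != null) {
                    hosts.add(host); // the set discards duplicates
                }
            } catch (URISyntaxException e) {
                // Skip candidates that are not valid URIs.
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        String body = "See http://www.example.com/a and https://mail.example.org/b "
                + "then http://www.example.com/c";
        System.out.println(scanContentForHosts(body));
    }
}
```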

hostFromUriStr

protected static java.lang.String hostFromUriStr(java.lang.String uriStr)
Extracts and returns the host portion of a URI string. This function uses java.net.URI.

Parameters:
uriStr - a string containing a URI
Returns:
the host portion of the supplied URI, null if no host string could be found
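
Since the method relies on java.net.URI, its behavior can be approximated with the sketch below. The class name HostSketch is hypothetical, and treating a malformed URI as "no host" is an assumption about the error handling rather than something this page states.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostSketch {

    // Hypothetical stand-in for hostFromUriStr(); not the James source.
    static String hostFromUriStr(String uriStr) {
        try {
            // URI.getHost() returns null when the URI has no host component,
            // for example an opaque URI such as mailto:user@example.com.
            return new URI(uriStr).getHost();
        } catch (URISyntaxException e) {
            // Assumption: malformed input is reported as "no host".
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(hostFromUriStr("http://www.example.com/path?q=1")); // www.example.com
        System.out.println(hostFromUriStr("mailto:user@example.com"));         // null
    }
}
```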

domainFromHost

protected static java.lang.String domainFromHost(java.lang.String host)
Extracts and returns the registrar domain portion of a host string. This function checks all known multi-part TLDs to make sure that the registrar domain is complete. For example, if the supplied host string is "subdomain.example.co.uk", the TLD is "co.uk" and not "uk". Therefore, the correct registrar domain is not "co.uk", but "example.co.uk". If the host string is an IP address, then the octets are returned in reverse order.

Parameters:
host - a string containing a host name
Returns:
the registrar domain portion of the supplied host string
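
The rules described above can be sketched as follows. This is an illustration under stated assumptions, not the James implementation: the class name DomainSketch is hypothetical, the multi-part TLD set holds only two sample entries rather than "all known" ones, and only dotted-quad IPv4 addresses are treated as IP addresses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DomainSketch {

    // Assumption: a tiny sample; a real list of multi-part TLDs is far longer.
    static final Set<String> MULTI_PART_TLDS =
            new HashSet<>(Arrays.asList("co.uk", "com.au"));

    static String domainFromHost(String host) {
        String[] parts = host.split("\\.");
        if (isIPv4(parts)) {
            // IP address: return the octets in reverse order.
            return parts[3] + "." + parts[2] + "." + parts[1] + "." + parts[0];
        }
        int n = parts.length;
        if (n >= 3) {
            String lastTwo = parts[n - 2] + "." + parts[n - 1];
            if (MULTI_PART_TLDS.contains(lastTwo)) {
                // Multi-part TLD: keep one more label in front of it.
                return parts[n - 3] + "." + lastTwo;
            }
        }
        if (n >= 2) {
            return parts[n - 2] + "." + parts[n - 1];
        }
        return host; // single label: nothing to strip
    }

    static boolean isIPv4(String[] parts) {
        if (parts.length != 4) {
            return false;
        }
        for (String p : parts) {
            if (!p.matches("\\d{1,3}") || Integer.parseInt(p) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(domainFromHost("subdomain.example.co.uk")); // example.co.uk
        System.out.println(domainFromHost("www.example.com"));         // example.com
        System.out.println(domainFromHost("192.168.0.1"));             // 1.0.168.192
    }
}
```

The reversed-octet form matches the way DNS block lists are conventionally queried, which is presumably why an IP host is returned that way for URI RBL lookups.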


Copyright © 2002-2009 The Apache Software Foundation. All Rights Reserved.