fingerprint package


Package Contract

The fingerprint package provides a framework for creating ferret and filter fingerprints based on features of representative pages in the space of pages.

The FingerprintID class provides a unique identifier for each ferret or filter fingerprint function created. It provides a method that compares two fingerprint IDs to check whether they are the same. The FingerprintFunction class holds an ID and declares methods that check whether a page is a candidate for the space of pages, based on the deciding function. These methods are implemented in subclasses of the FingerprintFunction class.

The FerretFingerprintFunction has a decidingFunction() method which, given a page, creates a Feature instance of the same type as the Feature with which this fingerprint is associated. The new Feature instance has its feature value computed for the new page. This newly created feature is then compared with the stored Feature this fingerprint is associated with. The similarity between the features is checked against the fingerprint's threshold. If it exceeds the threshold, the page is acceptable.

The FilterFingerprintFunction works similarly, except that it holds several features rather than one. Every type of Feature is created for each web page, and each is compared against the corresponding representative Feature stored with the fingerprint. If a majority of them have similarity measures greater than their thresholds, the page is acceptable.

In addition, the FilterFingerprintFunction also has a method for determining whether it is a suitable filter to handle a particular page. This is done to prevent all filters from working on the page, which would waste time. A filter is determined to be suitable if the DocumentNode which was used as a seed site for the new page is sufficiently similar to the set of representative pages on which this FilterFingerprintFunction is based.

Package-Level CRC

Collaborators:
Ferrets, filters, and their advisors use classes in the fingerprint package. The mapmaker and webpage database also use some Fingerprint information.

Responsibilities:
Provide a framework for different kinds of fingerprint functions.

Class-Level CRCs

The fingerprint package contains the following classes:

* FingerprintID
* FingerprintFunction
* FerretFingerprintFunction
* FilterFingerprintFunction

Class FingerprintID

* Responsibilities:
Provide a framework for defining an ID which uniquely identifies a particular instance of a fingerprint function.
* Collaborators:
All fingerprint function classes
* Variables and Methods:
int id;
For now, we define the fingerprint ID as an integer, but it is encapsulated in this class so its type can be changed in the future if needed.
static int lastIdUsed;
boolean CompareID (FingerprintID otherID)
Compares the given fingerprint ID to this one to see if they are the same. In this case, it will just return (id == otherID.id)
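The card above could be realized roughly as follows. This is a minimal sketch, not the package's actual source: the field and method names come from the card, while the constructor, the way lastIdUsed is advanced, and the null check are assumptions made for illustration.

```java
// Sketch of the FingerprintID card. The constructor and the synchronized
// counter are assumptions; the card only lists the two fields and CompareID.
final class FingerprintID {
    // Last integer handed out; shared by all instances.
    private static int lastIdUsed = 0;

    // The ID is an int for now, hidden here so its type can change later.
    private final int id;

    FingerprintID() {
        this.id = nextId();
    }

    // Assumption: IDs are made unique by incrementing a shared counter.
    private static synchronized int nextId() {
        return ++lastIdUsed;
    }

    // Method name kept exactly as written on the card.
    boolean CompareID(FingerprintID otherID) {
        return otherID != null && id == otherID.id;
    }
}
```

Because the integer is private, callers can only test IDs for equality through CompareID(), which is what lets the underlying type change later.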

Class FingerprintFunction

* Responsibilities:
Define basic data and methods which every fingerprint function must have.
* Collaborators:
None (this class is never instantiated)
* Variables and Methods:
FingerprintID fingerprintID;
int pagesApproved;
protected int decidingFunction(Object page)
This is the actual fingerprint function. It is applied to a page and returns an integer, the result of applying the function to that page. It is abstract in this class, so it cannot be private; each individual ferret or filter fingerprint function class implements it.
public int isPageGood(Object page)
This applies the decidingFunction() method and compares the return value to the threshold to find out whether a page is good or not. This is an abstract method.
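As a rough illustration of this card, the abstract base class might look like the sketch below. The two method signatures and the two fields follow the card; the FingerprintID stand-in, the LengthFingerprint subclass, and the point at which pagesApproved is incremented are all hypothetical and exist only to make the example compile and run.

```java
// Minimal stand-in for the FingerprintID class described above.
final class FingerprintID {
    private static int lastIdUsed = 0;
    private final int id = ++lastIdUsed;

    boolean CompareID(FingerprintID otherID) {
        return otherID != null && id == otherID.id;
    }
}

// Never instantiated directly; subclasses supply both methods.
abstract class FingerprintFunction {
    protected final FingerprintID fingerprintID = new FingerprintID();
    protected int pagesApproved = 0;

    // The fingerprint function itself: maps a page to an integer score.
    protected abstract int decidingFunction(Object page);

    // Returns a non-negative score for a good page, -1 otherwise.
    public abstract int isPageGood(Object page);
}

// Hypothetical subclass for illustration only: scores a page by text length.
class LengthFingerprint extends FingerprintFunction {
    private final int threshold;

    LengthFingerprint(int threshold) {
        this.threshold = threshold;
    }

    @Override
    protected int decidingFunction(Object page) {
        return String.valueOf(page).length();
    }

    @Override
    public int isPageGood(Object page) {
        int score = decidingFunction(page);
        if (score > threshold) {
            pagesApproved++; // assumption: counted when a page is accepted
            return score;
        }
        return -1;
    }
}
```

For example, `new LengthFingerprint(3).isPageGood("abcdef")` yields 6, while the same call with "ab" yields -1.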

Class FerretFingerprintFunction extends FingerprintFunction

* Responsibilities:
Provide a distinguishing feature for this fingerprint
* Collaborators:
Ferret Advisor, Ferret.
* Variables and Methods:
DocumentNode seedSite;
Feature distinguishingFeature;
int threshold;
int timesUsed;
This class also implements isPageGood() and decidingFunction(). The isPageGood() method compares the value returned by decidingFunction() to the threshold and returns the similarity value if it is greater, or -1 otherwise. The decidingFunction() method creates a feature of the same type as the feature this fingerprint holds, with its value computed for the web page the ferret has just retrieved from the web or the cache. The two feature values are then compared and a measure of similarity is returned.
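The ferret's decision logic can be sketched as below. The Feature class belongs to another package and its interface is not given here, so the Feature interface and its two methods (computeFor, similarityTo) are assumptions, as is the LengthFeature example and the choice to increment timesUsed on every call. The seedSite field is left out to keep the sketch self-contained.

```java
// Hypothetical Feature contract; the real Feature class is defined elsewhere.
interface Feature {
    Feature computeFor(Object page);   // same feature type, valued for this page
    int similarityTo(Feature other);   // larger means more similar
}

// Hypothetical feature for illustration: page length in characters.
final class LengthFeature implements Feature {
    private final int length;

    LengthFeature(int length) {
        this.length = length;
    }

    @Override
    public Feature computeFor(Object page) {
        return new LengthFeature(String.valueOf(page).length());
    }

    @Override
    public int similarityTo(Feature other) {
        int diff = Math.abs(length - ((LengthFeature) other).length);
        return Math.max(0, 100 - diff);
    }
}

// Base class reduced to the two methods named on its card.
abstract class FingerprintFunction {
    protected abstract int decidingFunction(Object page);
    public abstract int isPageGood(Object page);
}

class FerretFingerprintFunction extends FingerprintFunction {
    private final Feature distinguishingFeature;
    private final int threshold;
    private int timesUsed = 0;

    FerretFingerprintFunction(Feature distinguishingFeature, int threshold) {
        this.distinguishingFeature = distinguishingFeature;
        this.threshold = threshold;
    }

    // Builds a feature of the same type for the page, then compares the two.
    @Override
    protected int decidingFunction(Object page) {
        Feature pageFeature = distinguishingFeature.computeFor(page);
        return distinguishingFeature.similarityTo(pageFeature);
    }

    // Similarity if it exceeds the threshold, -1 otherwise.
    @Override
    public int isPageGood(Object page) {
        timesUsed++; // assumption: counts every evaluation
        int similarity = decidingFunction(page);
        return similarity > threshold ? similarity : -1;
    }
}
```

Returning the similarity itself, rather than a boolean, lets the caller rank accepted pages while still using -1 as the rejection signal.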

Class FilterFingerprintFunction extends FingerprintFunction

* Responsibilities:
Provide methods to check if a particular web page belongs to the particular part of the space of pages this filter is an expert on.
* Collaborators:
Filter Advisor, Filter.
* Variables and Methods:
private DocumentNode [] pagesDerivedFrom;
int reliability;
Feature [] featuresUsed;
int [] thresholds;
public boolean isFilterSuitable (DocumentNode seedSite)
Compares the given seed site with the pages from which this filter was derived to check whether this filter is suitable for deciding if the new page is accepted or rejected. It uses methods provided by the web page database to determine whether pages are similar.
This class also implements isPageGood() and decidingFunction(). The isPageGood() method calls the decidingFunction() method for each feature and compares the value returned to the threshold for that feature. If a majority of the features have similarity values above their thresholds, isPageGood() returns the similarity value; otherwise it returns -1. The decidingFunction() method is identical to that of the ferret fingerprint.
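The majority vote described above might be sketched as follows. Several points are assumptions rather than part of the card: the Feature interface and the CharCountFeature example are hypothetical, decidingFunction() is given a feature index so it can be called once per feature, and the "similarity value" returned on acceptance is taken to be the mean over all features, since the card does not say how the per-feature values are combined. The isFilterSuitable() method, the base class, and the remaining fields are omitted.

```java
// Hypothetical Feature contract; the real Feature class is defined elsewhere.
interface Feature {
    Feature computeFor(Object page);   // same feature type, valued for this page
    int similarityTo(Feature other);   // larger means more similar
}

// Hypothetical feature for illustration: occurrences of one character.
final class CharCountFeature implements Feature {
    private final char symbol;
    private final int count;

    CharCountFeature(char symbol, int count) {
        this.symbol = symbol;
        this.count = count;
    }

    @Override
    public Feature computeFor(Object page) {
        String text = String.valueOf(page);
        int n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == symbol) {
                n++;
            }
        }
        return new CharCountFeature(symbol, n);
    }

    @Override
    public int similarityTo(Feature other) {
        int diff = Math.abs(count - ((CharCountFeature) other).count);
        return Math.max(0, 100 - 10 * diff);
    }
}

class FilterFingerprintFunction {
    private final Feature[] featuresUsed;
    private final int[] thresholds;

    FilterFingerprintFunction(Feature[] featuresUsed, int[] thresholds) {
        if (featuresUsed.length != thresholds.length) {
            throw new IllegalArgumentException("one threshold per feature is required");
        }
        this.featuresUsed = featuresUsed.clone();
        this.thresholds = thresholds.clone();
    }

    // Assumption: the index selects which stored feature to compare against.
    private int decidingFunction(int index, Object page) {
        Feature stored = featuresUsed[index];
        return stored.similarityTo(stored.computeFor(page));
    }

    // Accepts the page when more than half of the features clear their thresholds.
    public int isPageGood(Object page) {
        int passed = 0;
        int total = 0;
        for (int i = 0; i < featuresUsed.length; i++) {
            int similarity = decidingFunction(i, page);
            total += similarity;
            if (similarity > thresholds[i]) {
                passed++;
            }
        }
        boolean majority = passed * 2 > featuresUsed.length;
        // Assumption: the reported value is the mean similarity (integer division).
        return majority ? total / featuresUsed.length : -1;
    }
}
```

With stored counts of two 'a', one 'b', and zero 'c' and a threshold of 80 for each, the page "aabccc" passes two of three features and is accepted, whereas "aabbbbcccc" passes only one and is rejected.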

Message Interactions

Message Interactions with other packages
* with Advisor
FerretAdvisors and FilterAdvisors will create instances of the FerretFingerprintFunction and FilterFingerprintFunction.
* with Ferret
The Ferret will call the isPageGood() method.
* with Filter
The Filter will call the isFilterSuitable() and isPageGood() methods.
Message Interactions between classes in the package
* FerretFingerprintFunction and FingerprintID
The ferret fingerprint function will create an instance of the fingerprint ID.
* FilterFingerprintFunction and FingerprintID
The filter fingerprint function will create an instance of the fingerprint ID.