edu.indiana.iucbrf.util
Class CaseDataFileReader

java.lang.Object
  extended by edu.indiana.iucbrf.util.CaseDataFileReader

public class CaseDataFileReader
extends java.lang.Object

This class reads a text file containing case data (such as a file obtained from the UCI machine learning database) and puts the data in a case base.


Field Summary
protected  java.lang.String unknownFeatureValue
          The String used for an unknown feature value.
 
Constructor Summary
CaseDataFileReader()
          Creates a new instance of CaseDataFileReader.
 
Method Summary
protected  void addFeatureToCollection(FeatureCollection fc, java.io.StreamTokenizer st, FeatureKey featureKey, Domain domain)
          Adds a feature value that was read from the file to the collection.
 java.lang.String getUnknownFeatureValue()
          Get the String used for an unknown feature value.
 void readDataIntoCB(java.lang.String filename, Domain domain, CaseBase cb, FeatureKey[] featureKeyOrder, int[] indicesToIgnore, int[] solutionIndices, int titleIndex, double problemCount)
          Reads the given file and adds the cases into the given case base.
 void readDataIntoCB(java.lang.String filename, Domain domain, CaseBase cb, FeatureKey[] featureKeyOrder, int[] indicesToIgnore, int[] solutionIndices, int titleIndex, double problemCount, int numCasesToRead)
          Reads the given file and adds the cases into the given case base.
 void setUnknownFeatureValue(java.lang.String value)
          Set the String used for an unknown feature value.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

unknownFeatureValue

protected java.lang.String unknownFeatureValue
The String used for an unknown feature value.

Constructor Detail

CaseDataFileReader

public CaseDataFileReader()
Creates a new instance of CaseDataFileReader.

Method Detail

getUnknownFeatureValue

public java.lang.String getUnknownFeatureValue()
Get the String used for an unknown feature value.


setUnknownFeatureValue

public void setUnknownFeatureValue(java.lang.String value)
Set the String used for an unknown feature value.

Parameters:
value - The new unknown feature value.

readDataIntoCB

public void readDataIntoCB(java.lang.String filename,
                           Domain domain,
                           CaseBase cb,
                           FeatureKey[] featureKeyOrder,
                           int[] indicesToIgnore,
                           int[] solutionIndices,
                           int titleIndex,
                           double problemCount)
Reads the given file and adds the cases into the given case base. Assumes cases are delimited by line, and features by comma. All indices are 0-based.

Parameters:
featureKeyOrder - The order of featureKeys corresponding to (non-ignored) columns in the file. These keys can include both problem and solution feature keys.
indicesToIgnore - The indices of columns in the file that the reader should ignore (not attempt to make into a feature). These indices must be in increasing order.
solutionIndices - The indices of columns in the file that refer to solution features. All other indices (that are not ignored) refer to problem features.
problemCount - The number of problems the system has seen so far. This can be obtained from PerformanceMonitor.
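
The index parameters above can be easier to follow with a concrete column layout. The following is a minimal, self-contained sketch and is not part of edu.indiana.iucbrf: the class ColumnRoles, its Role enum, and both helper methods are hypothetical names introduced only for illustration. It assumes that an ignored column takes precedence over a solution column, and that the k-th non-ignored column pairs with featureKeyOrder[k], which is how the parameter descriptions read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of edu.indiana.iucbrf. */
public class ColumnRoles {
    public enum Role { IGNORED, PROBLEM, SOLUTION }

    /** Classifies each 0-based file column as ignored, problem, or solution. */
    public static List<Role> classify(int columnCount, int[] indicesToIgnore, int[] solutionIndices) {
        List<Role> roles = new ArrayList<>();
        for (int col = 0; col < columnCount; col++) {
            if (contains(indicesToIgnore, col)) {
                roles.add(Role.IGNORED);      // assumption: ignoring wins over everything else
            } else if (contains(solutionIndices, col)) {
                roles.add(Role.SOLUTION);
            } else {
                roles.add(Role.PROBLEM);      // every remaining column is a problem feature
            }
        }
        return roles;
    }

    /** Returns, per file column, the assumed index into featureKeyOrder, or -1 if ignored. */
    public static int[] keyPositions(int columnCount, int[] indicesToIgnore) {
        int[] positions = new int[columnCount];
        int next = 0;
        for (int col = 0; col < columnCount; col++) {
            positions[col] = contains(indicesToIgnore, col) ? -1 : next++;
        }
        return positions;
    }

    private static boolean contains(int[] values, int target) {
        for (int v : values) {
            if (v == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Five columns: column 0 is an ID to skip, column 4 holds the class label.
        int[] indicesToIgnore = {0};
        int[] solutionIndices = {4};
        System.out.println(classify(5, indicesToIgnore, solutionIndices));
        // prints [IGNORED, PROBLEM, PROBLEM, PROBLEM, SOLUTION]
        System.out.println(Arrays.toString(keyPositions(5, indicesToIgnore)));
        // prints [-1, 0, 1, 2, 3]
    }
}
```

With this layout, featureKeyOrder would need four entries (three problem keys followed by one solution key), and solutionIndices still refers to the file column 4, not to a position in featureKeyOrder.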

readDataIntoCB

public void readDataIntoCB(java.lang.String filename,
                           Domain domain,
                           CaseBase cb,
                           FeatureKey[] featureKeyOrder,
                           int[] indicesToIgnore,
                           int[] solutionIndices,
                           int titleIndex,
                           double problemCount,
                           int numCasesToRead)
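Reads the given file and adds the cases into the given case base. Assumes cases are delimited by line, and features by comma. All indices are 0-based. Judging by its name, the additional numCasesToRead parameter limits how many cases are read from the file.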

Parameters:
featureKeyOrder - The order of featureKeys corresponding to (non-ignored) columns in the file. These keys can include both problem and solution feature keys.
indicesToIgnore - The indices of columns in the file that the reader should ignore (not attempt to make into a feature). These indices must be in increasing order.
solutionIndices - The indices of columns in the file that refer to solution features. All other indices (that are not ignored) refer to problem features.
problemCount - The number of problems the system has seen so far. This can be obtained from PerformanceMonitor.

addFeatureToCollection

protected void addFeatureToCollection(FeatureCollection fc,
                                      java.io.StreamTokenizer st,
                                      FeatureKey featureKey,
                                      Domain domain)
Adds a feature value that was read from the file to the collection.

Parameters:
fc - The feature collection to add the read-in feature value to.
st - The stream tokenizer containing the read-in value.
featureKey - The key corresponding to the read-in value.
domain - The domain in use.
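
To illustrate how a java.io.StreamTokenizer can deliver comma-delimited values and how an unknown-value marker might be recognized, here is a minimal sketch using only the standard library. It is not the implementation of addFeatureToCollection: the class TokenSketch and the method parseLine are hypothetical, the use of "?" as the marker is an assumption (it is the convention in many UCI data files), and representing an unknown value as null is a choice made only for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of edu.indiana.iucbrf. */
public class TokenSketch {

    /** Parses one comma-delimited line of numbers; unknown markers become null. */
    public static List<Double> parseLine(String line, String unknownFeatureValue) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(line));
        st.resetSyntax();              // drop the default number and comment handling
        st.wordChars(0x21, 0xFF);      // every printable character is part of a token
        st.whitespaceChars(0, ' ');    // control characters and spaces separate tokens
        st.whitespaceChars(',', ',');  // so does the comma (must come after wordChars)

        List<Double> values = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    // Assumption: an unknown feature is stored as null in this sketch.
                    values.add(st.sval.equals(unknownFeatureValue) ? null : Double.valueOf(st.sval));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("5.1,?,1.4", "?"));
        // prints [5.1, null, 1.4]
    }
}
```

Because the comma is treated as whitespace, consecutive commas collapse and empty fields are silently dropped; a reader that must preserve empty columns would need to treat the comma as an ordinary character instead.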