A list of my publications is here.
Below is a list of research projects in which I have been involved.

  • AFD Mining
    Approximate Functional Dependencies (AFDs) are a probabilistic extension of the traditional concept of Functional Dependencies. An AFD formalizes a rule that falls within some measured "closeness" of being a Functional Dependency.
    Mined AFDs are used in a wide variety of applications. They have been shown to be useful in bootstrapping Bayesian classifiers, query optimizers, and query answering, among others.
    • Framework Approach
      Heuristic Lozenge Search (HLS) is a framework developed for mining AFDs. A framework approach has an advantage over a static system: it is built from independent components which can be altered or replaced in order to tune the mining process. The goal of this flexibility is to create a mining system which can meet the needs of an application, instead of the application having to meet the limits of a static mining system. HLS can provide improved efficiency and effectiveness over state-of-the-art algorithms.
    • Algorithm
      The Lozenge Search algorithm is a template search algorithm. It provides an overall search strategy and allows traditional search algorithms, such as breadth-first or depth-first search, to be plugged in to determine how each lozenge is navigated.
      Lozenge Search is an iterative algorithm which searches one lozenge in each iteration. A lozenge is the space of rules that results from adding a previously unseen attribute to an already searched space.
    • Effectiveness
      HLS not only provides new levels of efficiency, but also incorporates new abilities directly in the mining process. The two most prominent abilities are dynamic pruning of attributes from the search space and the ability to order the results.
    • Decisions
      HLS achieves its tunability by exposing points in Lozenge Search at which decisions can be made between different variations.
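
    The ideas above can be illustrated with a small sketch. It is not the HLS implementation. It assumes the well-known g3 error (the fraction of rows that would have to be removed for a dependency to hold exactly) as the "closeness" measure, and it reads a lozenge as the set of rules that involve the newly added attribute. The names g3_error, lozenge, and mine_afds, the max_lhs limit, and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def g3_error(rows, lhs, rhs):
    """Fraction of rows that must be removed for lhs -> rhs to hold exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # In each LHS group keep the most common RHS value; the rest are violations.
    kept = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return 1 - kept / len(rows)


def lozenge(seen, new_attr, max_lhs=2):
    """Yield candidate rules (lhs, rhs) that involve new_attr (assumed reading)."""
    attrs = list(seen) + [new_attr]
    for size in range(1, max_lhs + 1):
        for lhs in combinations(attrs, size):
            for rhs in attrs:
                if rhs not in lhs and (new_attr in lhs or rhs == new_attr):
                    yield lhs, rhs


def mine_afds(rows, attributes, max_error=0.25, max_lhs=2):
    """Search one lozenge per iteration, adding one unseen attribute each time."""
    found = []
    seen = []
    for attr in attributes:
        for lhs, rhs in lozenge(seen, attr, max_lhs):
            err = g3_error(rows, lhs, rhs)
            if err <= max_error:
                found.append((lhs, rhs, err))
        seen.append(attr)
    return found


# Hypothetical sample data: one row has a misspelled city.
rows = [
    {"zip": "47401", "city": "Bloomington", "state": "IN"},
    {"zip": "47401", "city": "Bloomington", "state": "IN"},
    {"zip": "47401", "city": "Blomington", "state": "IN"},
    {"zip": "46201", "city": "Indianapolis", "state": "IN"},
]
attributes = ["zip", "city", "state"]

for lhs, rhs, err in mine_afds(rows, attributes):
    print(lhs, "->", rhs, "error:", err)
```

    In this sketch every candidate rule is visited in exactly one lozenge, so earlier work is never repeated. The simple nested loops inside lozenge stand in for whichever traversal (breadth-first, depth-first, or other) is plugged in.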

  • Schema Matching
    I am working with cognitive psychologist Rob Goldstone to adapt his ABSURDIST system to the schema matching problem. ABSURDIST is a concept mapping system which uses graph matching to incorporate inter- and intra-system information for concept mapping. The project poses a number of questions. The first is how information about the similarity of two concepts/attributes can be mined, weighted, aggregated, and incorporated into ABSURDIST. The second is what information can be mined and then used to transform data in various formats into graphs. The last is how, and to what extent, these questions can be solved in a fully automated manner.

  • Machine Learning
    Multiple Instance Learning in Cheminformatics

    Over the summer of 2007, I did an RAship with Rajarshi Guha. I implemented the EM algorithm, an iterative global optimization algorithm which uses linear regression, and worked on improvements. We implemented two new features. The first was seeding the algorithm with a model based on aggregate data values, which gives more consistent results than seeding with a model based on random values. The second was using noise to help avoid local minima.
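
    The two features can be sketched roughly as follows. This is not the code from the project. It assumes a common formulation of multiple instance regression in which each bag has one instance that explains its label, a single numeric descriptor per instance, a seed model fitted to per-bag means as the "aggregate" values, and Gaussian noise on the model coefficients that decays each iteration. All function names, parameters, and sample bags are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def pick_instances(bags, slope, intercept):
    """E-step: in each bag, pick the instance the current line predicts best."""
    return [min(xs, key=lambda x: abs(slope * x + intercept - y)) for xs, y in bags]


def total_error(bags, slope, intercept):
    """Sum of squared errors over the best-fitting instance of each bag."""
    chosen = pick_instances(bags, slope, intercept)
    return sum((slope * x + intercept - y) ** 2 for x, (_, y) in zip(chosen, bags))


def mi_regression(bags, iterations=20, noise=0.5, decay=0.7, rng_seed=0):
    rng = random.Random(rng_seed)
    labels = [y for _, y in bags]
    # Seed with a line fitted to per-bag means (aggregate values), not random values.
    slope, intercept = fit_line([sum(xs) / len(xs) for xs, _ in bags], labels)
    best = (total_error(bags, slope, intercept), slope, intercept)
    for _ in range(iterations):
        # Perturb the model before the E-step; the perturbation shrinks each round.
        noisy_slope = slope + rng.gauss(0, noise)
        noisy_intercept = intercept + rng.gauss(0, noise)
        chosen = pick_instances(bags, noisy_slope, noisy_intercept)
        slope, intercept = fit_line(chosen, labels)  # M-step: refit on chosen instances
        err = total_error(bags, slope, intercept)
        if err < best[0]:
            best = (err, slope, intercept)
        noise *= decay
    return best


# Hypothetical bags: (instance descriptors, bag label).
bags = [
    ([1.0, 5.0, -3.0], 3.0),
    ([2.0, 9.0], 5.0),
    ([3.0, -1.0, 0.5], 7.0),
    ([4.0, 10.0], 9.0),
]
error, slope, intercept = mi_regression(bags)
print(error, slope, intercept)
```

    Because the best model seen so far is kept, the result is never worse than the seed model, and the fixed rng_seed makes runs repeatable.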



Contact: Email me