Indiana University Database Group


Publications


Data Mapping & Integration/Metadata
  • Data Mapping as Search. George H.L. Fletcher and Catharine M. Wyss. 10th Int. Conf. on Extending Database Technology (EDBT), Munich, Germany, 26-30 March 2006. Springer LNCS, to appear.
  • A Calculus for Data Mapping. George H.L. Fletcher, Catharine M. Wyss, Edward L. Robertson, and Dirk Van Gucht. Int. Workshop on Database Interoperability (InterDB), at the 7th Int. Conf. on Coordination Models and Languages (COORDINATION), Namur, Belgium, 23 April 2005. Elsevier ENTCS, to appear.
  • MIQIS: Modular Integration of Queryable Information Sources. Catharine M. Wyss, George H.L. Fletcher, Fulya Erdinc, and Jeremy T. Engle. Workshop on Information Integration on the Web (IIWeb), at the 30th Int. Conf. on Very Large Data Bases (VLDB), Toronto, Canada, 30 August 2004, pp. 136-140.
  • Triadic Relations: an Algebra for the Semantic Web. Edward L. Robertson. Semantic Web and Databases, C. Bussler, V. Tannen, and I. Fundulaki (eds), LNCS 3372, Springer Verlag, 2004, pp. 91-108. An updated, expanded version is available as An Algebra for Triadic Relations, Indiana University, Computer Science Department Technical Report 606.
Data & Web Mining, Network Analysis, Information Retrieval
  • On Approximation Measures for Functional Dependencies, C. Giannella and E. Robertson, Information Systems, to appear, 2004.
  • Mining Frequent Itemsets Over Arbitrary Time Intervals in Data Streams, C. Giannella, J. Han, E. Robertson, C. Liu. Computer Science Department Technical Report 587, Indiana University, Nov 2003. An older version: Mining Frequent Patterns in Data Streams at Multiple Time Granularities, C. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu. Data Mining: Next Generation Challenges and Future Directions, AAAI/MIT Press, H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), 2003.
  • A Note on Approximation Measures for Multi-valued Dependencies in Relational Databases, C. Giannella and E. Robertson, Information Processing Letters, Volume 85, Issue 3, pp. 153-158, 2003.
  • Topical Web Crawlers: Evaluating Adaptive Algorithms, F. Menczer, G. Pant, P. Srinivasan. To appear in ACM Trans. on Internet Technologies.
  • Search engine-crawler symbiosis: Adapting to Community Interests, G. Pant, S. Bradshaw, F. Menczer. Proc. ECDL 2003.
  • Topical crawling for business intelligence, G. Pant, F. Menczer. Proc. ECDL 2003.
  • Defining Evaluation Methodologies for Topical Crawlers, P. Srinivasan, F. Menczer, G. Pant. Position paper, SIGIR 2003 Workshop on Defining Evaluation Methodologies for Terabyte-Scale Collections.
  • Complementing Search Engines with Online Web Mining Agents, F. Menczer. Decision Support Systems 35(2): 195-212, 2003.
  • Crawling the Web, G. Pant, P. Srinivasan, F. Menczer. To appear in M. Levene and A. Poulovassilis, eds.: Web Dynamics, Springer, 2003.
  • Feature Selection in Data Mining, Y.S. Kim, N. Street, F. Menczer. In J. Wang, ed.: Data Mining: Opportunities and Challenges, Idea Group Publishing, pp. 80-105, 2003.
  • Growing and Navigating the Small World Web by Local Content, F. Menczer. Proc. Natl. Acad. Sci. USA 99(22): 14014-14019, 2002.
  • Adaptive Assistants for Customized E-Shopping, F. Menczer, A. Monge, N. Street. IEEE Intelligent Systems 17(6): 12-19, Nov-Dec 2002.
  • MySpiders: Evolve your own intelligent Web crawlers, G. Pant, F. Menczer. Autonomous Agents and Multi-Agent Systems 5(2): 221-229, 2002.
  • Evolutionary model selection in unsupervised learning, Y.S. Kim, N. Street, F. Menczer. Intelligent Data Analysis 6(6): 531-556, 2002.
  • IntelliShopper: A Proactive, Personal, Private Shopping Assistant, F. Menczer, N. Street, N. Vishwakarma, A. Monge, M. Jakobsson. Proc. 1st ACM Int. Joint Conf. on Autonomous Agents and MultiAgent Systems (AAMAS 2002), pp. 1001-1008.
  • Web Crawling Agents for Retrieving Biomedical Information, P. Srinivasan, J. Mitchell, O. Bodenreider, G. Pant, F. Menczer. Proc. Int. Workshop on Agents in Bioinformatics (NETTAB 2002).
  • Meta-Evolutionary Ensembles, Y.S. Kim, N. Street, F. Menczer. Proc. IEEE Intl. Joint Conf. on Neural Networks (IJCNN'02).
  • Exploration versus Exploitation in Topic Driven Crawlers, G. Pant, P. Srinivasan, F. Menczer. Proc. WWW 2002 Workshop on Web Dynamics.
  • An Axiomatic Approach to Defining Approximation Measures for Functional Dependencies, Chris Giannella. Lecture Notes in Computer Science vol. 2435, pp. 37-51 (Proceedings of the 6th East-European Conference on Advances in Databases and Information Systems), 2002.
  • Discovering Frequent Itemsets in the Presence of Highly Frequent Items, Dennis P. Groth and Edward L. Robertson. Workshop on Rule Based Data Mining, in Conjunction with the 14th International Conference On Applications of Prolog, 2001.
  • FastFDs: A Heuristic-Driven Depth-First Algorithm for Mining Functional Dependencies from Relation Instances. Cathy Wyss, Chris Giannella, and Edward Robertson, Proceedings of the 3rd International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2001), Munich, Germany, September 2001. Published in Lecture Notes in Computer Science 2112.
  • On an Information Theoretic Approximation Measure for Functional Dependencies. Chris Giannella and Edward Robertson, Indiana University, Computer Science Department Technical Report 555, Aug 2001.
  • Information Dependencies, Mehmet Dalkilic and Edward Robertson. Indiana University, Computer Science Department Technical Report 531, Nov 1999, also in ACM PODS, 2000.
  • Average Case Performance of the Apriori Algorithm, Paul Purdom and Dirk Van Gucht. Indiana University, Computer Science Department Technical Report 529, Oct 1999.
  • CE: The Classifier-Estimator Framework for Data Mining, Mehmet Dalkilic, Edward Robertson, and Dirk Van Gucht, Proceedings 7th IFIP 2.6 Working Conference on Database Semantics, Chapman & Hall, 1998. Full version available as Computer Science Department Technical Report 480, Indiana University, May 1997.
Query Languages and Processing, Spatial Database Theory
  • On Adding a Connectedness Operator to FO+poly (linear), Chris Giannella and Dirk Van Gucht, Acta Informatica 38(9), pages 621-648, 2002. An earlier (less polished) version appears as Computer Science Department Technical Report 530, Indiana University, 2000.
  • Adding a Connectedness Operator to FO+poly -- Extended Abstract, Chris Giannella, Proceedings of the 2000 student session of the European summer school in logic, language, and information (ESSLLI2000) in Birmingham, England.
  • Complete geometrical query languages, M. Gyssens, J. Van den Bussche, D. Van Gucht. Journal of Computer and System Sciences, vol 58, no 3, pages 483-511, 1999. (A preliminary version was presented at PODS'97.)
  • An Expressive Language for Linear Spatial Database Queries, L. Vandeurzen, M. Gyssens, D. Van Gucht, PODS'98
  • Genericity in Spatial Databases, Bart Kuijpers, Dirk Van Gucht, to appear in Constraint Databases (eds. G. Kuper, L. Libkin, and J. Paredaens), 1998
  • Towards a Theory of Movie Database Queries, Bart Kuijpers, Jan Paredaens, and Dirk Van Gucht, Technical Report University of Antwerp 98-02, 1997
  • On the Decidability of Semi-Linearity for Semi-Algebraic Sets and its Implications for Spatial Databases, F. Dumortier, M. Gyssens, L. Vandeurzen, D. Van Gucht, PODS'97
  • On Query Languages for Linear Queries Definable with Polynomial Constraints, L. Vandeurzen, M. Gyssens, and D. Van Gucht, Lecture Notes in Computer Science (Proceedings of Second International Conference on Principles and Practice of Constraint Programming, Cambridge, Massachusetts, USA, August 19-22, 1996), vol. 1118, Springer, 1996, pp. 468-481.
  • On the Desirability and Limitations of Linear Spatial Database Models, L. Vandeurzen, M. Gyssens, and D. Van Gucht, Lecture Notes in Computer Science (Proceedings of the 4th International Symposium on Large Spatial Databases (SSD'95)), M.J. Egenhofer and J.R. Herring, eds., vol. 951, Springer, 1995, pp. 14-28.
  • First-order queries on finite structures over the reals, Jan Paredaens, Jan Van den Bussche, Dirk Van Gucht, Logic in Computer Science, 79-89, 1995
  • Towards a Theory of Spatial Database Queries, Jan Paredaens, Jan Van den Bussche, Dirk Van Gucht, PODS'94, 279-288, 1994
Metadata in Databases
  • Optimal Tuple Merge is NP-Complete. Edward L. Robertson and Catharine M. Wyss.
  • A Relational Algebra for Data/Metadata Integration in a Federated Database System. Cathy Wyss and Dirk Van Gucht, CIKM 2001, Atlanta, Georgia.
  • Augmenting SQL with Dynamic Typing to Support Interoperability in a Relational Federation. Cathy Wyss, Felix Wyss, and Dirk Van Gucht, EFIS 2001, Berlin, Germany.
  • MD-SQL: A Language for Meta-Data Queries over Relational Databases, C. M. Rood, D. Van Gucht and F. I. Wyss. Indiana University, Computer Science Department Technical Report 528, Jul 1999
  • Typed query languages for databases containing queries, F. Neven, J. Van den Bussche, D. Van Gucht, G. Vossen. Information Systems, vol 24, no 7, pages 569-595, 1999. (A preliminary version was presented at PODS'98.)
  • Design and Implementation of Reflective SQL, Mehmet M. Dalkilic, Manoj Jain, Dirk Van Gucht, and Anurag Mendhekar. Indiana University, Computer Science Department, Technical Report 451, Feb 1996
  • Reflective Programming in the Relational Algebra, Jan Van den Bussche, Dirk Van Gucht, Gottfried Vossen, ACM PODS 1993
Complex Object Databases
  • On the completeness of object-creating database transformation languages, J. Van den Bussche, D. Van Gucht, M. Andries, M. Gyssens. Journal of the ACM, vol 44, no 2, pages 272-319, 1997. (A preliminary version was presented at FOCS'92.)
  • A Polynomial-Time Query Language for Hierarchically Structured Documents, Arijit Sengupta and Dirk Van Gucht, DRAFT, in preparation.
  • Query By Templates: A Generalized Approach for Visual Query Formulation for Text Dominated Databases, Arijit Sengupta and Andrew Dillon, Conference on Advanced Digital Libraries (ADL'97), 1996.
  • Standardizing the Querying Process with SGML: The SQL DTD (PostScript version), Arijit Sengupta. Tommie Usdin and Debbie Lapeyre, editors, Proceedings of the SGML'96 Conference. Graphic Communications Association, 1996. An SGML version is also available (of course), for those with SGML viewers.
  • Extending SGML to Accommodate Database Functions: A Methodological Overview, Arijit Sengupta and Andrew Dillon, Journal of the American Society for Information Science (JASIS), special issue on structured information/standards for document architectures. August, 1996.
  • Demand More from Your SGML Database! Bringing SQL Under the SGML Limelight, Arijit Sengupta, <TAG>, April 1996.
  • Structured Document Databases, Arijit Sengupta, September 1996. Arijit's thesis proposal and project summary.
  • The expressive power of cardinality-bounded set values in object-based data models, J. Van den Bussche, D. Van Gucht. Theoretical Computer Science, vol 149, no 1, pages 49-66, 1995. (A preliminary version was presented at ICDT'92.)
  • Expressiveness of efficient semi-deterministic choice constructs, M. Gyssens, J. Van den Bussche, D. Van Gucht. Automata, Languages and Programming - ICALP'94 (S. Abiteboul, E. Shamir, editors), Lecture Notes in Computer Science, vol 820, pages 106-117. Springer, 1994. (A full version presenting polynomial-time semi-deterministic choice constructs that are more general than swap-choice is in preparation.)
  • Non-deterministic aspects of database transformations involving object creation, J. Van den Bussche, D. Van Gucht. Modeling Database Dynamics (U. Lipeck, B. Thalheim, editors), Workshops in Computing, pages 3-16. Springer, 1993.
Modeling Information and Information Systems, Visualization, XML & Semi-structured Data Management
  • ACXESS - Access Control for XML with Enhanced Security Specification. Sriram Mohan, Jonathan Klinginsmith, Arijit Sengupta, Yuqing Wu. Demo at ICDE 2006.
  • Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs. Surajit Chaudhuri, Zhiyuan Chen, Kyuseok Shim, Yuqing Wu. IEEE Transactions on Knowledge and Data Engineering. Dec. 2005. pp. 1595-1609.
  • Access Control for XML - A Dynamic Query Rewriting Approach. Sriram Mohan, Arijit Sengupta, Yuqing Wu. ACM Conference on Information and Knowledge Management, 2005.
  • Conceptual Modeling for XML - A Myth or Reality. Sriram Mohan, Arijit Sengupta. In Zongmin Ma, ed. Database Modeling for Industrial Data Management: Emerging Technologies and Applications. Idea Group Inc. 2005.
  • DocBase - The INEX Evaluation Experience. Sriram Mohan, Arijit Sengupta. Initiative for the Evaluation of XML Retrieval (INEX) 2005. Glasgow, Scotland, Springer LNCS.
  • Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs. Zhiyuan Chen, Surajit Chaudhuri, Kyuseok Shim, Yuqing Wu. ICDE 2004
  • Tree Logical Classes for Efficient Evaluation of XQuery. Stelios Paparizos, Yuqing Wu, Laks V.S. Lakshmanan, H.V. Jagadish. SIGMOD 2004.
  • TIMBER: A Native System for Querying XML. Stelios Paparizos, Shurug Al-Khalifa, Adriane Chapman, H.V. Jagadish, Laks V.S. Lakshmanan, Andrew Nierman, Jignesh M. Patel, Divesh Srivastava, Nuwee Wiwatwattana, Yuqing Wu and Cong Yu. SIGMOD (demo) 2003.
  • Structural Join Order Selection for XML Query Optimization. Yuqing Wu, Jignesh Patel and H.V. Jagadish, ICDE 2003.
  • XER - Extensible Entity Relationship Modeling. Arijit Sengupta, Sriram Mohan and Rahul Doshi in J. Harnad et al. Eds. Proceedings of the XML 2003 Conference. Philadelphia, PA, USA. December 8-12 2003
  • Conceptual Modeling for XML using XER - A Prototype Demonstration. Sriram Mohan, Arijit Sengupta. Workshop of Information Technology and Systems (WITS) 2003. Seattle, Washington
  • Using Histograms to Estimate Answer Size for XML Queries. Yuqing Wu, Jignesh Patel, H. V. Jagadish. Information Systems 28 (1-2): 33-59 (2003) -- Special Issue: Best Papers from EDBT 2002.
  • TIMBER: A Native XML Database. H. V. Jagadish, Shurug Al-Khalifa, Adriane Chapman, Laks V.S. Lakshmanan, Andrew Nierman, Stelios Paparizos, Jignesh M. Patel, Divesh Srivastava, Nuwee Wiwatwattana, Yuqing Wu and Cong Yu. VLDB Journal, Vol. 11, Issue 4 (2002).
  • COMMIX: Towards Effective Web Information Extraction, Integration and Query Answering. Tengjiao Wang, Shiwei Tang, Dongqing Yang, Jun Gao, Yuqing Wu, Jian Pei. SIGMOD (demo) 2002.
  • Estimating Answer Sizes for XML Queries. Yuqing Wu, Jignesh M. Patel and H.V. Jagadish. EDBT 2002.
  • Grouping in XML. Stelios Paparizos, Shurug Al-Khalifa, H. V. Jagadish, Laks V.S. Lakshmanan, Andrew Nierman, Divesh Srivastava and Yuqing Wu. EDBT Workshop on XML Data Management (XMLDM'02), published in Springer-Verlag Lecture Notes in Computer Science Vol. 2490, 2002.
  • Structural Joins: A Primitive for Efficient XML Query Pattern Matching. Shurug Al-Khalifa, H. V. Jagadish, Nick Koudas, Jignesh M. Patel, Divesh Srivastava and Yuqing Wu. ICDE 2002.
Miscellaneous (Algorithms, Logic Programming, Logic in Database Theory)
  • Useful Transformations in Answer Set Programming. J.C. Nieves Sanchez, M. Osorio, and C. Giannella, Workshop on Answer Set Programming as part of AAAI 2001 Spring Symposium Series, March 26-28, Stanford CA, Technical Report SS-01-01, pg 146-152.
  • Polynomially orderable classes of structures, J. Van den Bussche, D. Van Gucht. Unpublished DRAFT
  • An Empirical Study of the 4-Valued Kripke Kleene Semantics and 4-Valued Well-Founded Semantics in Random Propositional Logic Programs, C. Giannella, J. Schlipf, Annals of Mathematics and Artificial Intelligence 25 (1999) 3,4, pg 275-309, ed. J. Dix, J. Lobo
  • An Empirical Study of the 3-Valued Kripke Kleene Semantics in Random Propositional Logic Programs, C. Giannella, J. Schlipf, Proceedings of the Logic Programming Track of the 7th International Workshop on Non-Monotonic Reasoning 1998, pg. 41-50, ed. J. Dix, J. Lobo
  • On the Complexity of Partitioning Sparse Matrix Representations, Jóhann P. Malmquist and Edward Robertson, BIT 1982. This deals with the partitioning of network databases in order to minimize interpage links. This is the paper that got Ed Robertson into databases.
Pedagogy
Computer Science on-line Technical Report collection.