Technical Report TR677:
Building a Concept Hierarchy Using Frequent Tag Sequences

Jon Klinginsmith (IUB), Malika Mahoui (IUPUI), Josette Jones (IUPUI), Melanie Wu (IUB)
(Jun 2009), 7
[This paper has been submitted to the CIKM 2009 conference. Acceptance notifications are not expected until late July.]
Web sites that allow collaborative tagging of resources have become commonplace. As part of the second generation of applications available on the Web, these sites provide a tremendous amount of user-generated taxonomic information. However, information seekers are hindered by the lack of organization within these tags. To address this issue, several methods have been proposed for creating an organizational structure from the tags. Despite their benefits, the current methods do not directly represent an organization of concepts, because a concept is often composed of more than one tag. In this paper, we propose a new approach to generating a concept hierarchy from user-generated tags. Exploiting the fact that users often express a concept over a set of sequential tags, we propose a two-step approach. We first discover concepts by finding tag sequences with sufficient support. Using these concepts, we then calculate conditional probabilities to discover the hierarchical relationships among them. The key benefit of the hierarchy produced by our approach is that it is topic-based, whereas existing related work produces only hierarchies of individual tags. Our findings are illustrated on a domain-specific dataset of tags supplied by a popular collaborative tagging Web site.
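The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: concepts are taken to be contiguous tag n-grams, support is counted as the number of posts containing a sequence, and a concept is treated as a parent of another when the conditional probability of the parent given the child meets a threshold and exceeds the probability in the other direction. The function names, the `min_support`, `max_len`, and `threshold` parameters, and their default values are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def frequent_tag_sequences(posts, min_support=2, max_len=3):
    """Step 1 (assumed form): map each contiguous tag sequence to the set of
    post indices containing it, keeping sequences with enough support."""
    occurrences = defaultdict(set)
    for post_id, tags in enumerate(posts):
        for n in range(1, max_len + 1):
            for i in range(len(tags) - n + 1):
                occurrences[tuple(tags[i:i + n])].add(post_id)
    return {seq: ids for seq, ids in occurrences.items()
            if len(ids) >= min_support}


def concept_hierarchy(concepts, threshold=0.8):
    """Step 2 (assumed rule): return (parent, child) pairs where
    P(parent | child) >= threshold and P(child | parent) is strictly lower,
    so the parent is the more general concept."""
    edges = []
    for parent, child in permutations(concepts, 2):
        both = len(concepts[parent] & concepts[child])
        p_parent_given_child = both / len(concepts[child])
        p_child_given_parent = both / len(concepts[parent])
        if (p_parent_given_child >= threshold
                and p_child_given_parent < p_parent_given_child):
            edges.append((parent, child))
    return edges


# Toy data: each inner list is the ordered tags of one post (made up here).
posts = [
    ["health", "heart", "disease"],
    ["health", "heart", "disease"],
    ["health", "diet"],
    ["health", "diet"],
]
concepts = frequent_tag_sequences(posts, min_support=2, max_len=2)
edges = concept_hierarchy(concepts, threshold=0.8)
```

In this toy data the single-tag concept `("health",)` ends up as the parent of the multi-tag concept `("heart", "disease")`, while `("heart",)` and `("heart", "disease")` always co-occur and are therefore not ordered by this rule.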

