Session 9: Data Modeling, Analysis, and Visualization

Katy Börner, Assistant Professor, SLIS, IUB <katy@indiana.edu>
Shashikant Penumarthy, SLIS, IUB <sprao@indiana.edu>

This session introduces participants to data modeling, analysis, and visualization using the InfoVis Cyberinfrastructure (IVC) Software Framework. The algorithms are primarily network-based, i.e., they deal with modeling, analysis, and visualization of data that represents entities (e.g., people) and their relationships (e.g., social relations among people). A brief overview of the software framework will be given; it allows diverse analysis, modeling, and visualization algorithms to be plugged in with minimal work. The framework facilitates the sharing of knowledge among algorithm developers, lay users, and educators alike.


What is the IVC?

IVC stands for the InfoVis Cyberinfrastructure (http://iv.slis.indiana.edu), a one-of-a-kind effort that aims to provide analysis, modeling, and visualization algorithms, learning modules, large datasets, and compute resources via a website. The IVC is aimed at both researchers and educators. It enables researchers to experiment with new kinds of algorithms on diverse datasets, review each other's work at a fine-grained level (using the actual code), and share knowledge. It aims to serve educators by providing an easy-to-use software framework and datasets that they can use to teach analysis algorithms, modeling methods, visualization techniques, and even diverse ways of interacting with data. Everything in the IVC is free for use by the public. In addition, all software made available through the IVC is open source, i.e., anyone may use the existing software, modify it, and distribute it freely.

Architecture of the IVC Software Framework

Figure: Architecture of the IVC Software Framework.

The IVC has a plug-in-based architecture: diverse software components can be plugged in and unplugged as needed without significant effort. The idea is similar to using electrical appliances at home: any appliance (such as an iron, a hair dryer, or a television) works once it is plugged into a power socket, because every appliance is built for that same socket. The power socket offers a standard interface to every appliance, which makes plugging them in simple. In the same manner, the IVC core (shown as a grey circle in the center of the figure) presents a common interface to different types of algorithms. So long as an algorithm knows how to talk to the IVC core, it can be plugged into the IVC and made to work. All the algorithms shown on the left side of the figure are plugged in this way.
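The power-socket idea can be sketched in a few lines of Python. This is only an illustration of the plug-in pattern, not the IVC's actual code: the names `Algorithm`, `Core`, `plug_in`, `DegreeCount`, and `EdgeList` are all hypothetical.

```python
from abc import ABC, abstractmethod


class Algorithm(ABC):
    """Hypothetical common interface (the 'power socket') every plug-in implements."""

    name: str

    @abstractmethod
    def run(self, data):
        """Take a data model and return a result."""


class Core:
    """Hypothetical stand-in for the IVC core: it only knows the Algorithm interface."""

    def __init__(self):
        self._plugins = {}

    def plug_in(self, algorithm):
        self._plugins[algorithm.name] = algorithm

    def unplug(self, name):
        self._plugins.pop(name, None)

    def available(self):
        return sorted(self._plugins)

    def run(self, name, data):
        return self._plugins[name].run(data)


class DegreeCount(Algorithm):
    """Example analysis plug-in: number of neighbours per node."""

    name = "degree-count"

    def run(self, data):
        return {node: len(neighbours) for node, neighbours in data.items()}


class EdgeList(Algorithm):
    """Example plug-in: the sorted list of undirected links."""

    name = "edge-list"

    def run(self, data):
        return sorted({tuple(sorted((u, v))) for u, nbrs in data.items() for v in nbrs})


core = Core()
core.plug_in(DegreeCount())
core.plug_in(EdgeList())

graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
degrees = core.run("degree-count", graph)
edges = core.run("edge-list", graph)
```

The core never inspects what a plug-in does internally; it only calls `run`, which is why adding or removing an algorithm requires no change to the core.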

Any algorithm that analyzes or visualizes data needs to be able to work with that data in a meaningful manner. The IVC provides algorithms with a variety of ways to look at data. Data can be stored as a simple text file on one's own computer, in a database sitting on a server, or on a remote computer on the other side of the country. Again, the IVC core presents a standard interface that algorithms can use to read data from a variety of sources, so working with many different file formats becomes simple. All the persisters shown on the right side of the figure exist for this purpose.
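A minimal sketch of the same idea on the data side, assuming a persister is a component that hides where the data lives. The interface `Persister`, its method `read_edges`, and both implementations are hypothetical names chosen for illustration; they are not taken from the IVC code base.

```python
import io
from abc import ABC, abstractmethod


class Persister(ABC):
    """Hypothetical interface: every data source yields (source, target) pairs."""

    @abstractmethod
    def read_edges(self):
        ...


class TextPersister(Persister):
    """Reads whitespace-separated 'source target' lines from any text stream."""

    def __init__(self, stream):
        self._stream = stream

    def read_edges(self):
        for line in self._stream:
            parts = line.split()
            if len(parts) == 2:  # skip blank or malformed lines
                yield parts[0], parts[1]


class MemoryPersister(Persister):
    """Serves links held in memory, standing in for a database or remote source."""

    def __init__(self, rows):
        self._rows = list(rows)

    def read_edges(self):
        yield from self._rows


def count_nodes(persister):
    """An 'algorithm' that never needs to know where the data came from."""
    nodes = set()
    for source, target in persister.read_edges():
        nodes.update((source, target))
    return len(nodes)


from_text = count_nodes(TextPersister(io.StringIO("ann bob\nbob cat\n\n")))
from_memory = count_nodes(MemoryPersister([("ann", "bob"), ("bob", "cat")]))
```

Both calls give the same answer because `count_nodes` depends only on the shared interface, not on the storage format.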

All this complexity is hidden underneath an easy-to-use Graphical User Interface or GUI (shown below the IVC core in the figure), which provides a point-and-click way of using these algorithms. Thus, the end-user is freed from worrying about the particulars of the software.


Using the IVC

Starting up the IVC

  1. Browse to the C:\ivc folder and start the IVC by double-clicking on the file named ivc.jar.
  2. The user interface of the IVC appears (see figure below):



Modeling a Small-World Network

  1. In the menubar, click Modeling->General Networks->Watts Strogatz Small World.
  2. In the form that appears, type in the following values:
    1. Number of Nodes: 20
    2. Rewiring Probability: 0.5
    3. Degree of Nodes: 10
  3. Then click the button labeled 'Generate Graph'. (See figure below):


  4. A blue button labeled 'Unnamed Model 1' appears in the pane on the right. Right-click the button and rename it to 'Network'.
  5. Click Visualization->Spring Layout.
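For readers curious about what the modeling step computes, here is a minimal sketch of the Watts-Strogatz procedure in plain Python. It assumes that the form's 'Degree of Nodes' corresponds to the number of nearest neighbours `k` in the starting ring and that 'Rewiring Probability' is `p`; the function name `watts_strogatz` and the `seed` parameter are illustrative and not part of the IVC.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Return an adjacency dict {node: set(neighbours)} for a small-world graph."""
    if k % 2 or not 0 < k < n:
        raise ValueError("k must be even and satisfy 0 < k < n")
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}

    # Step 1: ring lattice, each node linked to its k/2 nearest neighbours per side.
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)

    # Step 2: with probability p, rewire each lattice link (u, u+j) to a new target.
    for j in range(1, k // 2 + 1):
        for u in range(n):
            if rng.random() >= p:
                continue  # keep this link as it is
            v = (u + j) % n
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue  # u is already linked to every other node
            w = rng.choice(candidates)
            adj[u].remove(v)
            adj[v].remove(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj


# Same values as typed into the form above.
network = watts_strogatz(n=20, k=10, p=0.5, seed=42)
edge_count = sum(len(nbrs) for nbrs in network.values()) // 2
```

Rewiring moves links but never adds or deletes any, so the graph keeps n * k / 2 links regardless of `p`. With `p = 0` the result is the regular ring; larger values introduce more random shortcuts.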

Analysis of the Small-World Network

  1. Click Analysis->Clustering->Betweenness Centrality.
  2. In the box next to the text that says 'Threshold', type in 0.01.
  3. Click the checkbox that says 'Normalize'.
  4. Click the button that says 'Cluster'.
  5. Another blue button labeled 'Unnamed Model 2' appears in the pane on the right. (See figure below.)


  6. Click Visualization->Spring Layout.

You can see that many links of the generated graph have been removed. The analysis algorithm you just used (Betweenness Centrality) removes the links of highest betweenness one by one until a certain threshold is reached. The betweenness of a link measures how many shortest paths between pairs of nodes run through it, so links with high betweenness tend to be the bridges that connect otherwise separate groups. By the time the threshold is reached, these bridging links have been removed. The nodes and remaining links that form small groups are called clusters. You can see the difference in the structure of the network before and after this analysis by comparing the two visualizations.
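The following sketch shows one way such a clustering can work. It is an assumption-laden illustration, not the IVC implementation: the stopping rule (stop once no link's betweenness exceeds the threshold), the normalization (dividing by the number of node pairs), and all function names are choices made here for clarity.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style link betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit among links.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every undirected path was counted once from each endpoint.
    return {edge: total / 2.0 for edge, total in score.items()}


def cluster_by_betweenness(adj, threshold, normalize=True):
    """Remove the highest-betweenness link until none exceeds the threshold."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    n = len(adj)
    scale = 2.0 / (n * (n - 1)) if normalize and n > 1 else 1.0
    while True:
        scores = edge_betweenness(adj)
        if not scores:
            break
        edge, top = max(scores.items(), key=lambda item: item[1])
        if top * scale <= threshold:
            break
        u, v = tuple(edge)
        adj[u].remove(v)
        adj[v].remove(u)
    return adj


def components(adj):
    """Connected groups of nodes, i.e. the clusters left after link removal."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in group:
                group.add(v)
                stack.extend(adj[v] - group)
        seen |= group
        groups.append(group)
    return groups


# Two triangles joined by a single bridge (2-3).
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bridge_score = edge_betweenness(toy)[frozenset((2, 3))]
clusters = components(cluster_by_betweenness(toy, threshold=0.5))
```

In the toy graph every shortest path between the two triangles crosses the bridge, so the bridge has the highest betweenness and is the only link removed, leaving the two triangles as clusters.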

Radial Graph, Balloon Graph & TreeMap

Radial Graph

  1. Click Visualization->Radial Graph.
  2. A dialog box will pop up asking for an input file. Browse to the sampledata/prefuse folder in C:\ivc and select 'terror.xml', then click 'Open'.
  3. A window opens showing the network of terrorists involved in the 9/11 attacks. (See figure below.)


Such a visualization makes the connections between people easy to see. A similar kind of social network analysis was reportedly used by U.S. forces to locate Saddam Hussein by following his social connections.

TreeMap Demo

  1. Click Visualization->TreeMap Demo.
  2. In the same manner as before, browse to the sampledata/prefuse folder in C:\ivc and select chitest.hdir, then click 'Open'.
  3. A window opens showing the different categories of entities (See figure below).
  4. At first glance this is really hard to make sense of. Move the mouse pointer around and you'll see that groups of things have been clumped together.


Now we will use a different visualization on the same data.

Balloon Graph

  1. Click Visualization->Balloon Graph.
  2. Open the same file as before (C:\ivc\sampledata\prefuse\chitest.hdir).
  3. A window opens showing a Balloon Graph of the same data.
  4. Click the balloons in the network to expand them and show subcategories.


The Balloon Graph is easier to read than the TreeMap. The TreeMap, however, gives you a global overview of everything at once, which is something you cannot get from the Balloon Graph.
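To see why a treemap can show an entire hierarchy at once, consider the simplest treemap layout, often called slice-and-dice. The sketch below is an illustration under assumptions: the TreeMap Demo may use a different layout, the `.hdir` file format is not reproduced here, and the sample hierarchy and function names are made up.

```python
def size(node):
    """Total size of a node: its own value for a leaf, else the sum of its children."""
    if "children" in node:
        return sum(size(child) for child in node["children"])
    return node["size"]


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give each leaf a rectangle (x, y, w, h) with area proportional to its size."""
    if out is None:
        out = {}
    children = node.get("children")
    if not children:
        out[node["name"]] = (x, y, w, h)
        return out
    total = size(node)  # assumed to be positive
    offset = 0.0
    for child in children:
        share = size(child) / total
        if depth % 2 == 0:  # even depth: split the width
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:  # odd depth: split the height
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Made-up hierarchy: two categories, one of which has two subcategories.
tree = {
    "name": "root",
    "children": [
        {"name": "fruit", "children": [
            {"name": "apple", "size": 3},
            {"name": "pear", "size": 1},
        ]},
        {"name": "tools", "size": 4},
    ],
}

rects = slice_and_dice(tree, 0, 0, 8, 4)
areas = {name: r[2] * r[3] for name, r in rects.items()}
```

Every leaf receives a rectangle and the rectangles fill the whole canvas, which is what gives the global overview. The cost is that nesting is shown only by containment, which is harder to read than the explicit parent-child links of a Balloon Graph.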


IVC Learning Modules http://iv.slis.indiana.edu/lm/

The learning modules in the IVC aim to encourage the exploration, application, evaluation, and comparison of algorithms.




Shashikant Penumarthy & Katy Börner, SLIS, Indiana University
Last Modified November 4th, 2004