Laboratory Assignment 3

In this laboratory assignment, you will implement force directed layout using a focus and context (F+C).  The goal is to visualize large ÒgraphsÓ for a specific application.  There are two parts to this assignment. Allow me to describe the assignment.

Part A Ð The Plan: Due Friday, March 11, 2016

1.     Choose an application area. It could be business, or biology, social sciences, engineering, etc.

2.     Find a data source. We list four repositories below and look in there to find a dataset that you think will serve you well.

a.      The University of Florida Sparse Matrix Collection, is a large and actively growing set of sparse matrices that arise in real applications. Its matrices cover a wide spectrum of domains, include those arising from problems with underlying 2D or 3D geometry (as structural engineering, computational fluid dynamics, model reduction, electromagnetics, semiconductor devices, thermodynamics, materials, acoustics, computer graphics/vision, robotics/kinematics, and other discretizations) and those that typically do not have such geometry (optimization, circuit simulation, economic and financial modeling, theoretical and quantum chemistry, chemical process simulation, mathematics and statistics, power networks, and other networks and graphs).  Please visit the accompanying pages of that repository to learn of the various programs to extract the data and underlying graphs.

b.     The Stanford Network Analysis Project (SNAP) hosts the Stanford Large Network Dataset Collection of more than 50 large network datasets from tens of thousands of nodes and edges to tens of millions of nodes and edges. In includes social networks, web graphs, road, internet, citations, collaboration, and communication networks. Once again, please visit the accompanying pages to learn of the various programs to extract the data and underlying graphs.

c.     The GraphViz data repository at AT&T is another marvelous source. Please find many examples there.

d.     Please visit this data & resources pages made available by Prof. Tamara Munzner at University of British Columbia.

e.     And many others. You can essentially use the GEO (Gene Enrichment Ontology) for any of the systems biology datasets captured as micro-array experiments.

3.     When you choose a data set, choose one that can allow you to create graphs  of various data sizes.  For instance, I would like to see you tackle sizes of graphs that vary in numbers of nodes from 100(102), 1000(103), 10000(104), 1000000 (105) and so on. It is not the actual sizes but the scaling that I am really interested. I am certainly interested in ways you can deal with ÒhairballsÓ.

4.     Define tasks on the data, for instance,

a.     What do you wish to see in the data?

b.     Are you looking for ÒgroupsÓ, communities, or modules of interacting entities?

c.     Are you looking for central entities which interact or connected to all?

d.     Do you plan to inspect ÒpoorlyÓ connected nodes?

e.     And so on É.

5.     Choose the representation from the following list

a.     Explicit node-link

b.     Matrix

c.     Containment, spatial subdivision

6.      For any of the representation, do explain your rationale. Further, if there is a further sub-type (force directed layout, Sujiyama, etc.), explain why you chose that node-link layout (if you indeed choose the explicit node-link representation).

7.     Chart out the visual encoding; describe how will you map the data to

a.     Constants employed in models and algorithms (e.g., force directed layout)

b.     Mapping attributes to node channels like sizes/shapes and other visual attributes

c.     Mapping Òconnective strengthsÓ (correlations) to edge channels like thickness, color, etc.

d.     More thoughts - For instance in a dense graph, it will be useful to highlight nodes and edges in some regions.  How will you do that without increasing clutter? Think of a scheme which increases the sizes or opacity of nodes in the selected region while making other nodes smaller or less opaque. Also, use methods to encode "node density" appropriately to create effective visuals.   What optimizations can be realized for sparse graphs? I am not looking for very fancy solutions. One method is really enough.

8.     Now consider the Sarkar and Brown method to realize the fisheye as applied to graphs; describe how you will apply the method to your matrix representation in the form of node-link, etc. representation.  The paper by Sarkar and Brown published circa 1994 still one of the easier and better one to implement.  Please visit here.

9.     Given the fish-eye technique, explain how you will ÒrealizeÓ the tasks listed in Item 4 above. What data scraping and graph/matrix mining will you resort to? What modifications will you make to the lens of the Sarkar and Brown method.

10.  Describe the division of labor among all the team members. Describe what each team member will accomplish.

11.  Write an intermediate report.

Part B Ð The Implementaton: Due Monday, March 28, 2016

In this part, you will implement the tasks listed in Item 4 and realize the application that you set out to complete in Item 1. For some code examples, find them here. This segment will get updated soon. The code examples are:

  1. A D3 source code offering from on force directed layout from Michael  Bostock can be found here
  2. Please peruse the offering  here  from Hao Ding. It is a processing sketch that allows you to create responsive layouts. Place your cursor and the graph layout changes. The cursor "repulses" the nodes and the force directed layout algorithm rearranges the graph locally. Hao has a complete sketch for some special graphs (trees).  Your job is to make it work for real useful graphs. Processing is a another scripting language based on Java.
  3. A D3, Java script from Hao Ding can also be found here.
  4. Find example Processing sketch from Peter Lager at the OpenProcessing forum. Also, find this sketch at local repository here. Remember this is for images.
  5. The Actual Task 1 - Select small, medium and large and complex graphs as describe above.   Now the size should be determined by a combination of a measure that is a function of the number of nodes and the complexity or density of connections. Please use the suggestions and sources discussed above.
  6. The Actual Task 2 - Use the code snippets provided by Ding (or from Bostock) to construct a "responsive" script that lays out the general graph. Now, you should realize that there are few interactive tools that you can actually use.
  7. The Actual Task3 - Task 2 required the use of encoding techniques on edges and nodes. How about changing the lens? For inspiration look at the marvelous examples from Jeff Bostock using the D3 platform (example). The key is to use focus and context techniques of which the circular fisheye and cartesian distortion techniques belong to. Please Implement one of the techniques; either fisheye or Cartesian.
  8. A Bonus TASK -  Emulate this sketch which uses the Flare concept; this is an example of using summary/aggregation measures.
  9. Complete the report in item 11 of Part A.