An Information-Theoretic Framework for Large Scale Data Analysis and Visualization


NSF Project 1017635/1017935


The growing power of supercomputers significantly advances scientists' ability to simulate more complex problems at greater fidelity, leading to high-impact scientific and engineering breakthroughs.  To fully understand the vast amounts of data produced, scientists need scalable solutions that can perform detailed analysis at different levels of detail.  Over the years, visualization has become an important method for analyzing data generated by a variety of computationally intensive applications. The selection of visualization parameters and the identification of important features, however, are mostly done in an ad hoc manner.  To enable users to explore data systematically and effectively, in this collaborative research effort involving the Ohio State University and Michigan Technological University, the PIs will develop an information-theoretic framework to evaluate the quality of visualization and guide the selection of algorithm parameters.

The PIs plan to develop a four-tier analysis framework based on information theory.  The bottom tier of the framework consists of information-measurement components in which data are modeled as probability distributions.  Building on these components, the second tier evaluates and optimizes the most common visualization algorithms, including isosurface extraction and flowline generation, so that they reveal the maximum amount of information in the data.  The PIs will also investigate issues related to information measurement in image space and optimize direct volume rendering results. The third tier focuses on the analysis of time-varying and multivariate data sets. Methods will be developed to identify important spatio-temporal regions in time-varying data sets, and to measure the information flow in multivariate data sets in order to identify causal relationships among variables.  In the fourth tier, information theory is used to assess the quality of different levels of detail in multiresolution volumes and images, and to select the level of detail that optimizes visualization quality while satisfying the underlying performance constraints.
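The bottom-tier idea of modeling data as a probability distribution and measuring its information content can be illustrated with a minimal sketch. The helper below (a hypothetical `shannon_entropy`, not part of the project's actual framework) bins a scalar field into a histogram and computes its Shannon entropy in bits; the bin count of 16 is an arbitrary choice for illustration:

```python
import math
from collections import Counter

def shannon_entropy(samples, num_bins=16):
    """Estimate Shannon entropy (in bits) of a scalar field by binning
    its values into a histogram-based probability distribution."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins or 1.0  # guard against a constant field
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in samples)
    n = len(samples)
    # H = sum_i p_i * log2(1 / p_i), with p_i = count_i / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# A uniformly distributed field carries more information than a constant one.
uniform = [i / 99 for i in range(100)]
constant = [0.5] * 100
print(shannon_entropy(uniform))   # close to log2(16) = 4 bits
print(shannon_entropy(constant))  # 0 bits
```

In this view, a data block whose values spread across many histogram bins has high entropy (high information content), while a nearly constant block has entropy close to zero; comparing such scores is one way regions or parameter settings could be ranked for visualization.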

Broader Impact: The key accomplishment of this project will be the development of a rigorous, information-theory-based solution to assist scientists in comprehending the vast amounts of data generated by large-scale simulations.  The four-tier information-theoretic framework will be implemented using the Visualization Toolkit (VTK) and released to general users.  To target the research at real-world applications, the PIs are collaborating with combustion scientists at Sandia National Laboratories, who are at the forefront of their field in employing extreme-scale computing to solve the most challenging problems.  This project provides training to graduate, undergraduate, and underrepresented students in computational science and large-scale data analysis and visualization. New algorithms and techniques developed in the project will be disseminated through tutorials at the annual visualization and application-specific conferences in which the PIs actively participate. The dissemination plan will also reach general audiences through news stories and presentations to enhance their understanding and appreciation of the value of visualization.


National Science Foundation

Project PIs:

Han-Wei Shen (The Ohio State University)

Chaoli Wang (Michigan Technological University)


Students:

Abon Chaudhuri (OSU)

Yi Gu (Michigan Tech)

Teng-Yok Lee (OSU) - graduated 2011

This material is based upon work supported by the National Science Foundation under Grant Nos. 1017635 and 1017935.

Any opinions, findings, and conclusions or recommendations expressed in this web site are those of the PIs of this project and do not necessarily reflect the views of the National Science Foundation.