Component-Level Dataflow Analysis

Problem

Interprocedural dataflow analysis is a form of static program analysis that has been investigated and used widely in various software tools. However, its widespread use in real-world tools is hindered by several serious challenges. One of the central problems is the underlying model of whole-program analysis for a homogeneous program. This model rests on the implicit assumption that it is appropriate and desirable to analyze the source code of the entire program as a single unit. Modern software systems present critical challenges for this traditional model: in particular, they often incorporate reusable components, and for such systems the assumption is clearly violated.

The essence of the problem is the following: whole-program interprocedural dataflow analysis is often impossible or inefficient for software systems that employ reusable components. As a result, the real-world usefulness of hundreds of existing analyses remains questionable. In many cases these analyses cannot be used at all. Even when they are possible, they have to be relatively approximate and imprecise in order to scale to industrial-sized software containing hundreds of thousands of lines of code spread across multiple components.

Solution

Our work proposes an alternative conceptual model of dataflow analysis, which we refer to as component-level analysis (CLA). While a whole-program analysis takes as input the source code of a complete program, a component-level analysis processes the source code of a single program component, given some information about the environment of that component. All of the deficiencies of whole-program analysis listed above can be eliminated in this new model. This approach raises two major open questions.

This effort is funded by an NSF CAREER award and an IBM Eclipse Innovation award.
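To make the contrast concrete, the following is a minimal sketch of the component-level idea for a toy taint analysis. All names (the toy IR, `ENV_SUMMARY`, `analyze_component`) are illustrative assumptions, not part of the actual CLA work: the key point is that calls to code outside the component are resolved through caller-supplied environment summaries rather than by analyzing the external source code.

```python
# Hypothetical sketch of component-level analysis (CLA).
# A component is a set of functions in a toy IR; calls to functions
# outside the component are resolved via environment summaries.

COMPONENT = {
    # function name -> list of (op, callee) statements
    "process": [("call", "sanitize"), ("sink", None)],
}

# Environment summary for external functions: True means the function
# propagates taint from input to output, False means it removes taint.
ENV_SUMMARY = {
    "sanitize": False,  # assumption: sanitize clears taint
}

def analyze_component(component, env_summary):
    """Return the component functions whose sink may receive tainted data."""
    warnings = []
    for fn, stmts in component.items():
        tainted = True  # conservatively assume tainted input
        for op, callee in stmts:
            if op == "call":
                if callee in component:
                    pass  # a real analysis would analyze component code here
                else:
                    # Component-level step: consult the summary instead of
                    # the (unavailable) external source code; unknown
                    # externals are conservatively assumed to propagate taint.
                    tainted = tainted and env_summary.get(callee, True)
            elif op == "sink" and tainted:
                warnings.append(fn)
    return warnings
```

With the summary above, `analyze_component(COMPONENT, ENV_SUMMARY)` reports no warnings, because the summary records that `sanitize` removes taint; with an empty summary, the analysis conservatively flags `process`. A whole-program analysis would instead need the source code of `sanitize` itself.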
