Component-Level Dataflow Analysis
Problem
Interprocedural dataflow analysis is a form of static program
analysis that has been investigated and used widely in various software tools.
However, its widespread use in real-world tools is hindered by
several serious challenges. One of the central problems is the
underlying model of whole-program analysis for a homogeneous
program. Modern software systems present critical challenges for
this traditional model. In particular, systems often incorporate
reusable components. Whole-program analysis is based on the
implicit assumption that it is appropriate and desirable to analyze
the source code of the entire program as a single unit. However, this
assumption is clearly violated for software systems that are built
with reusable components:
-
Some program components may be available only in binary form, without
source code.
-
Analysis results
may be needed even when a whole program simply does not exist. For
example, when the developer of a component wants to use a dataflow
analysis for understanding, restructuring, or testing of her code, she
often does this without having all other components that will
eventually be combined to construct a complete application.
- For analysis running time and memory
usage, it is highly undesirable to reanalyze a component every time
this component is used as part of a new system. For example, a popular
library may be used in thousands of different applications, and
whole-program analysis requires reanalysis of this library from
scratch as part of each such application.
- Treating the
program as a homogeneous entity means that code changes in one
component typically require complete reanalysis of the entire
application.
- The running time of whole-program analysis
is often dominated by the analysis of the underlying large library
components. To achieve practical running times, analyses often
must employ approximations which typically reduce the precision and
usefulness of the analysis solution.
The essence of the problem is the following: whole-program
interprocedural dataflow analysis is often
impossible or inefficient for software systems that
employ reusable components. Thus, the real-world usefulness of
hundreds of existing analyses remains questionable. In many case these
analyses cannot be used at all. Even if they are possible, they have
to be relatively approximate and imprecise in order to scale for
industrial-sized software containing hundreds of thousands lines of
code in multiple components.
Solution
Our work proposes an alternative conceptual model of dataflow analysis
which we refer to as component-level analysis (CLA). While a
whole-program analysis takes as input the source code of a complete
program, a component-level analysis processes the source code of a
single program component, given some information about the environment
of this component. All of the deficiencies of whole-program analysis
that are listed above can be eliminated in this new model. Given this
approach, there are two major open questions:
-
What is the appropriate theoretical framework for designing CLA? In
particular, how should the traditional mathematical machinery used in
the design of whole-program analysis be adapted for CLA? We
defined an initial proposal for such an adaptation
[PhD02, FSE99]
followed by a more powerful approach
[CC08, PhD07, CC06, CBSE05].
This new formal framework provides the foundation for
CLA versions of many widely-used dataflow analyses.
- Given the theoretical model for CLA, how should specific
whole-program analyses be modified to fit the CLA model? For example,
how should researchers and tool builders define CLA versions of
popular analyses such as pointer analysis, side-effect analysis, data
dependence analysis, and constant propagation analysis? We
first defined CLA versions of points-to analysis and side-effect
analysis for C [CC01],
object reference analysis for Java [TSE04,
ICSE03],
and side-effect analysis for Java [ICSM04].
Our subsequent work
considered call graph construction, alias analysis,
and dependence analysis for Java programs [SOAP12,
ISSTA11,
CC08, PhD07].
This effort is funded by
an NSF CAREER award and IBM Eclipse Innovation award.
main page