Areas of Research:
Data Mining, Parallel and Distributed Computing, and Systems
Broad Categorization of Projects:
Novel Data Mining and Data Preprocessing Algorithms:
- Structure Mining: The development of novel algorithms for frequent pattern
mining, particularly in the context of mining structured data
(e.g. graphs, 3D structures, and XML data).
- Anomaly Detection: The design and development of novel anomaly detection algorithms.
- Data Preprocessing: The development of novel data preprocessing strategies that
address issues such as how to effectively use sampling in the context of data mining,
how to handle missing data, and how to discretize continuous attributes in an effective manner.
- We plan to design and evaluate a framework for mining cause-and-effect patterns, drawing
inspiration from work in temporal logic. If successful, this framework could have a significant
impact in bioinformatics, security, and other areas.
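
As an illustration of the discretization question raised under Data Preprocessing above, the following is a minimal sketch of equal-frequency binning, one common textbook strategy. It is not the group's own method; the function names `equal_frequency_cut_points` and `discretize` are hypothetical and chosen only for this example.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points that split the sorted values
    into bins holding roughly the same number of items.

    Illustrative helper (hypothetical name), not taken from the source.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    n = len(ordered)
    # The k-th cut point is the value at position k * n / n_bins.
    return [ordered[min((k * n) // n_bins, n - 1)] for k in range(1, n_bins)]


def discretize(value, cut_points):
    """Map a continuous value to a bin index in 0..len(cut_points)."""
    # A value equal to a cut point is placed in the bin to its right.
    return bisect_right(cut_points, value)


ages = [23, 35, 41, 52, 58, 64, 70, 77]
cuts = equal_frequency_cut_points(ages, 4)
bins = [discretize(a, cuts) for a in ages]
```

Equal-frequency binning keeps bin sizes balanced on skewed data, at the cost of uneven bin widths; equal-width or supervised schemes may suit other datasets better.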
Bioinformatics:
- Structure Analysis, within the context of problems such as protein structure
analysis and drug motif discovery.
- Shape modeling and mining, in the context of eye disease detection.
- Modeling temporal, mixed-attribute datasets, in the context of identifying hepatotoxicity
patterns as a function of drug intake.
- We plan to examine the use of graph mining techniques in the context of protein-protein
interaction graphs and examine the use of probabilistic and deterministic models for rational
design problems such as protein crystallization.
Scientific Data Analysis:
- Discovery and Visual Exploration of Scientific Data: Developing novel techniques that will enable us to
visualize and mine data produced by molecular dynamics simulations. We have examined the use
of spatial frequent structure analysis techniques, feature mining and classification algorithms
in this context.
- We plan to extend this work to track evolving patterns. We also
plan to examine and develop a similar set of techniques (as done for MD) for mining Computational
Fluid Dynamics datasets.
Scalable Data Mining Algorithms:
- Development of parallel, distributed and incremental algorithms for various mining tasks
including several of the ones listed above.
- Development of high performance mining algorithms on modern processors.
- We plan to extend this work to new domains and to evaluate the use of such techniques in the
context of adaptive mining of data streams. This work has applications in areas such as intrusion
detection and mining large-scale simulation data in vivo.
Parallel and Distributed Systems:
- Identify systems solutions for next generation data analysis centers that run adaptive
parallel mining algorithms operating on dynamic data. Specific issues that we plan to look at
include: storage services, particularly support for disk-based sampling and data placement strategies;
caching services, which target re-use of data and previously mined information; and scheduling
services, which target job admission and scheduling on a tightly coupled parallel cluster.
- We plan to validate the proposed framework by drawing on the applications listed above.
Funding Acknowledgements:
- Our research is currently supported by several grants from the NSF
(CAREER, ITR/NGS, NGS, ACR/Software), a grant from the DOE (ECPI),
and a grant from Pfizer Inc. Other sources of support
include an NIH center-scale grant and a research infrastructure
grant from the NSF.
- Past research has been supported by the NSF (DBI) and Ameritech/SBC.