Department of Computer Science and Engineering

CSE 5243: Introduction to Data Mining

Autumn 2015, TTH 2:20-3:40, Journalism 270

Introduction to the knowledge discovery process, key data mining techniques, efficient high-performance mining algorithms, and exposure to applications of data mining (bioinformatics and intrusion detection).

- U/G 3

- Introduction to Databases, Introduction to Algorithms, or grad standing or permission of instructor

Office Hours and Locations:

- Mastery of the knowledge discovery process.
- Mastery of key data mining techniques.
- Familiarity with underlying data structures and scalable implementations.
- Familiarity with applying these techniques to practical domains (e.g. bioinformatics and intrusion detection).

- Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.
- Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2006.
- Data Mining and Analysis: Fundamental Concepts and Algorithms, M. Zaki and W. Meira (the authors have kindly made an online version available): http://www.dataminingbook.info/uploads/book.pdf
- Mining of Massive Datasets, J. Leskovec, A. Rajaraman and J. Ullman: http://infolab.stanford.edu/~ullman/mmds/book.pdf
- Data Mining, Charu Aggarwal, Springer, 2015. Should be available online via SpringerLink (for access within OSU).

- Introduction to the KDD process and basic statistics
- Frequent Pattern Algorithms: Association Rule Mining, Sequential Pattern Mining, Mining Frequent Structures
- Classification Algorithms: Decision Tree Classification, Naive Bayesian Classification, a brief introduction to other classifiers
- Clustering Algorithms: methods to cluster continuous data, methods to cluster categorical data
- Scalable Data Mining Algorithms and Systems Support: Parallel Algorithms, Database Integration, Data Locality Issues (embedded topic, i.e. will be covered where appropriate)
- Graph and Network Algorithms
- Anomaly Detection
- Applications: Bioinformatics, Intrusion Detection (a brief overview)

Homework/Labs: 60%

Midterm I: October 20th (in class), 20%

Final: December 11, 6-7:45 PM in DL 0113, 20%

· Introduction and Basic Statistical Concepts

· Data and Data Preprocessing (some additional informal notes)

· Sample Midterm from Autumn 2013 (will be worked out in class; solutions here)

· Minwise Hashing (adapted from authors of MMD book)

· Class Notes (November 20th) (covered the Partition I/O algorithm and sampling basics)

· A Gentle Introduction to Graphs and PageRank

· Graph Sparsification Lecture, adapted from the following paper:

o Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan. "Local Graph Sparsification for Scalable Clustering", in the Proceedings of SIGMOD '11.

o (Alternate link: http://web.cse.ohio-state.edu/~ruan/papers/satuluri_sigmod11.pdf)

August 2015