The Ohio State University
Department of Computer Science and Engineering
CSE 5243: Introduction to Data Mining
Autumn 2015, TTH 2:20-3:40, Journalism 270
http://www.cse.ohio-state.edu/~srini/674
Introduction to the knowledge discovery process, key data mining techniques, and efficient high-performance mining algorithms, with exposure to applications of data mining (bioinformatics and intrusion detection).
Level and Credits
Prerequisites
- Introduction to Databases, Introduction to Algorithms, or grad standing or permission of instructor
Instructors:
Dr. Srinivasan Parthasarathy, DL 691, srini@cse.ohio-state.edu;
Teaching Assistant: Arjun Bakshi, BO 111, bakshi.11@osu.edu
Office Hours and Locations:
Srinivasan Parthasarathy: TTH 4-5 @ DL 691;
David Fuhry (instructor of other section): MW 3:30-4:30 @ DL 674;
Arjun Bakshi: MWF 11-12PM @ BO 111 or by appointment
Objectives
- Mastery of the knowledge discovery process
- Mastery of key data mining techniques
- Familiarity with underlying data structures and scalable implementations
- Familiarity with applying these techniques to practical domains (e.g. bioinformatics and intrusion detection)
Texts (for reading, several free for OSU students)
- Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.
- Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2006.
- Data Mining and Analysis: Fundamental Concepts and Algorithms, M. Zaki and W. Meira (the authors have kindly made an online version available): http://www.dataminingbook.info/uploads/book.pdf
- Mining of Massive Datasets, J. Leskovec, A. Rajaraman and J. Ullman: http://infolab.stanford.edu/~ullman/mmds/book.pdf
- Data Mining, Charu Aggarwal, Springer, 2015. Should be available online through SpringerLink (for access within OSU).
Topics
- Introduction to the KDD process and basic statistics
- Frequent Pattern algorithms: Association Rule Mining, Sequential Pattern Mining, Mining frequent structures
- Classification Algorithms: Decision Tree Classification, Naive Bayesian Classification, a brief introduction to other classifiers
- Clustering Algorithms: Methods to cluster continuous data, Methods to cluster categorical data
- Scalable Data Mining algorithms and systems support: Parallel Algorithms, Database Integration, Data Locality Issues (Embedded Topic, i.e. will be covered where appropriate)
- Graph and Network Algorithms
- Anomaly Detection
- Applications: Bioinformatics, Intrusion Detection (a brief overview)
Tentative Grading Plan (Subject to revision)
Homework/Labs | 60%
Midterm I: October 20th (in class) | 20%
Final: December 11, 6-7:45PM in DL 0113 | 20%
Lecture Notes (note I will be using the blackboard liberally)
· Introduction and Basic Statistical Concepts
· Data and Data Preprocessing (some additional informal notes)
· Classification
· Sample Midterm from Autumn 2013 (will be worked out in class; Solutions here)
· Clustering
· Minwise Hashing (adapted from authors of MMD book)
· Frequent Pattern Mining
· Class Notes (November 20th) (covered the Partition I/O algorithm and Sampling basics)
· A Gentle Introduction to Graphs and PageRank
· Graph Sparsification Lecture, adapted from the following paper:
o Venu Satuluri, Srinivasan Parthasarathy and Yiye Ruan. "Local Graph Sparsification for Scalable Clustering", in the Proceedings of SIGMOD '11.
o (Alternate link: http://web.cse.ohio-state.edu/~ruan/papers/satuluri_sigmod11.pdf)
· Outlier Detection Tutorial
Homework and Lab Assignments: (to be added during the quarter). Given the hands-on problems assigned for this course, project grading will be based on effort, novelty of approach, and clarity of analysis. Reports should be concise, to the point, and free of spelling and grammatical errors. A site of general interest for this class is kdnuggets.com. You can use any publicly available software for these assignments or choose to implement your own.
Lab assignment 5: Due date November 20th, 2015. (Find the referenced BayesLSH code here, and the paper at the "PDF" link here.)
Assignment 6: Due date December 9th, 2015 by 5PM. Please note: no late submissions will be accepted. Assignments must be handed in to the TA.
S. Parthasarathy
August 2015