The Ohio State University
Department of Computer Science and Engineering

CSE 5243: Introduction to Data Mining
Autumn 2015, TTH 2:20-3:40, Journalism 270

http://www.cse.ohio-state.edu/~srini/674


Description

Introduction to the knowledge discovery process, key data mining techniques, and efficient high-performance mining algorithms, with exposure to applications of data mining (bioinformatics and intrusion detection).

Level and Credits

Prerequisites

Instructors: Dr. Srinivasan Parthasarathy, DL 691, srini@cse.ohio-state.edu

Teaching Assistant: Arjun Bakshi, BO 111, bakshi.11@osu.edu  


Office Hours and Locations: 

Srinivasan Parthasarathy: TTH 4-5 @ DL 691

David Fuhry: MW 3:30-4:30 @ DL 674 (instructor of other section)

Arjun Bakshi: MWF 11-12PM @ BO 111, or by appointment

Objectives

Texts (for reading, several free for OSU students)

Approximate Syllabus

Tentative Grading Plan (Subject to revision)

Homework/Labs: 60%

Midterm I (October 20th, in class): 20%

Final (December 11, 6-7:45PM in DL 0113): 20%

Lecture Notes (note: I will be using the blackboard liberally)

· Introduction and Basic Statistical Concepts

· Data and Data Preprocessing (some additional informal notes)

· Classification

· Sample Midterm from Autumn 2013 (will be worked out in class – Solutions here)

· Clustering

· Minwise Hashing (adapted from authors of MMD book)

· Frequent Pattern Mining

· Class Notes (November 20th) (covered the Partition I/O algorithm and sampling basics)

· A Gentle Introduction to Graphs and PageRank

· Graph Sparsification Lecture, adapted from the following paper:

    o Venu Satuluri, Srinivasan Parthasarathy, and Yiye Ruan. "Local Graph Sparsification for Scalable Clustering", in the Proceedings of SIGMOD '11.

    o (Alternate link: http://web.cse.ohio-state.edu/~ruan/papers/satuluri_sigmod11.pdf)

· Outlier Detection Tutorial

 

Homework and Lab Assignments: (to be added during the quarter). Given the hands-on nature of the problems assigned for this course, grading will be based on effort, novelty of approach, and clarity of analysis. Reports should be concise, to the point, and free of spelling and grammatical errors. A site of general interest for this class is kdnuggets.com. You can use any publicly available software for these assignments or choose to implement your own.

Lab assignment 1: Due date September 13, 2015. (Some additional tips and some answers to frequently asked questions (FAQ) by the TA and instructors.)

Lab assignment 2: Due date October 1, 2015.

Assignment 3: Due date October 14, 2015.

Lab assignment 4: Due date November 5, 2015.

Lab assignment 5: Due date November 20, 2015. (Find referenced BayesLSH code here, and paper at the "PDF" link here.)

Assignment 6: Due date December 9, 2015, by 5PM. Please note: no late submissions will be accepted. Assignments must be handed in to the TA.

 

S. Parthasarathy

August 2015