
CSE 5243 – Data Mining

CSE 5243 is offered by the Department of Computer Science and Engineering at The Ohio State University. It is an elective course intended for anyone interested in data mining and data analytics.

 

Course Description

Knowledge discovery, data mining, data preprocessing, data transformations; clustering, classification, frequent pattern mining, anomaly detection, graph and network analysis; applications.

 

Time/Venue:

Class Number | Location | Time
35059        | JR0270   | TR 3:55-5:15 PM
35060 (M)    | JR0270   | TR 3:55-5:15 PM

 

Course Contents

Topic

Introduction to the Knowledge Discovery Process and Background

Elements of Data Preprocessing and Data Transformations

Data Clustering

Data Classification

Frequent Pattern and Association Mining

Analyzing Graphs and Networks

Anomaly Detection

Applications (Bioinformatics, Social Networks)

 

Prerequisites

CSE 3241 or 5241, and CSE 2331, 5331, Stat 3301, or ISE 3200.

Coursework in Numerical Methods/Linear Algebra/Statistics and in Data Structures, plus programming fluency, is required.

Academic Integrity Policy

Academic integrity is essential to maintaining an environment that fosters excellence in teaching, research, and other educational and scholarly activities. Thus, The Ohio State University and the Committee on Academic Misconduct (COAM) expect that all students have read and understand the University’s Code of Student Conduct, and that all students will complete all academic and scholarly assignments with fairness and honesty. Students must recognize that failure to follow the rules and guidelines established in the University’s Code of Student Conduct and this syllabus may constitute “Academic Misconduct.”

 

Textbooks

- Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, 2019.

- Learning Data Mining with Python (Safari), Robert Layton, 2017.

- Jupyter for Data Science, Dan Toomey, 2017.

 

Reference Textbooks

- Data Mining: Concepts and Techniques (Safari), Jiawei Han, Micheline Kamber, and Jian Pei, 2011.

- Data Mining and Analysis: Fundamental Concepts and Algorithms, Mohammed J. Zaki and Wagner Meira, Jr., online version.

- Mining of Massive Datasets, Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman, online version.

- Machine Learning, Tom Mitchell, 1997.

- Pattern Recognition and Machine Learning, Christopher M. Bishop, 2006.

- Neural Networks and Deep Learning, Michael Nielsen, online version.

- Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, 2016.

- An Introduction to Statistical Learning: with Applications in R, Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, 2014.

 

Other Reference Material

- Data Analysis with Open Source Tools, Philipp K. Janert, O’Reilly, 2010.

- Think Stats, Allen B. Downey, O’Reilly, 2014.

- Visualization Analysis and Design, Tamara Munzner, CRC Press, 2014.

 

Data

Dataquest Data Repository List - https://www.dataquest.io/blog/free-datasets-for-projects/

KDnuggets -

1. https://www.kdnuggets.com/datasets/index.html

2. https://www.kdnuggets.com/faq/datasets-for-data-mining.html

DrivenData - https://www.drivendata.org/

data.world - https://data.world/community/open-community/data-partners/

University of Edinburgh Data Sets - http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html

 

Instructor

Raghu Machiraju, Ph.D.

Professor, Departments of Bioinformatics, Computer Science & Engineering, and Pathology.

Principal Data Scientist, Translational Data Analytics Institute.

 

Grading Assistant

Chaitanya Kulkarni, BS, MS.

Department of Computer Science and Engineering, kulkarni dot 132 at buckeyemail dot osu dot edu.

 

 

Office Hours

Instructor: TR 1:00-2:00 PM, DL 779.

Grader (Chaitanya Kulkarni): Mon 12:00-1:00 PM, BE406.

 

Grade Distribution:

Participation: 5%; Laboratory Assignments: 40%; Quizzes: 10%; Midterm: 20%; Final Project: 25%.
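As a rough illustration of how these weights combine, the sketch below computes an overall percentage from per-component scores. It assumes each component is scored on a 0-100 scale and that the components are combined as a simple weighted sum; the syllabus does not state how scores are scaled or curved, and the function name and example scores are hypothetical.

```python
# Weights in percent, copied from the grade distribution above.
WEIGHTS = {
    "participation": 5,
    "labs": 40,
    "quizzes": 10,
    "midterm": 20,
    "final_project": 25,
}


def weighted_grade(scores):
    """Return the overall percentage for per-component scores on a 0-100 scale.

    Assumption: a plain weighted sum with no curving or dropped scores.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example_scores = {
    "participation": 100,
    "labs": 90,
    "quizzes": 80,
    "midterm": 85,
    "final_project": 95,
}
overall = weighted_grade(example_scores)
print(overall)
```

Because laboratory assignments carry 40% of the total, a 10-point change in the lab average moves the overall percentage by 4 points under this assumption.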

 

Class Help/Watering Hole:

https://piazza.com/osu/autumn2016/cse5544/home


The Schedule & Lectures

Week    | Date | Topic                         | Date | Topic                          | Reading / Notes
Week 1  | 1/7  | Rubrics, Case Studies         | 1/9  | Introduction                   | Text, Ch. 1
Week 2  | 1/14 | Data: Types & Characteristics | 1/16 | Data: Statistics & Math        | Text, Ch. 2
Week 3  | 1/21 | Data: Preprocessing           | 1/23 | Data: Visualization; Workflows | Text, Ch. 2
Week 4  | 1/28 | Classification: Basic         | 1/30 | Classification: Basic          | Text, Ch. 3
Week 5  | 2/4  | Classification: Basic         | 2/6  | Classification: Advanced       | Text, Ch. 3-4
Week 6  | 2/11 | Classification: Advanced      | 2/13 | Classification: Advanced       | Text, Ch. 4
Week 7  | 2/18 | Classification: Advanced      | 2/20 | Classification: Advanced       | Text, Ch. 4
Week 8  | 2/25 | Clustering: Basic             | 2/27 | Clustering: Basic              | Text, Ch. 7
Week 9  | 3/3  | Clustering: Basic             | 3/5  | Midterm                        | Text, Ch. 7
Week 10 | 3/10 | Spring Break                  | 3/12 | Spring Break                   | Head South
Week 11 | 3/17 | Clustering: Advanced          | 3/19 | Clustering: Advanced           | Text, Ch. 8
Week 12 | 3/24 | Association Mining: Basic     | 3/26 | Association Mining: Basic      | Text, Ch. 5
Week 13 | 3/31 | Association Mining: Basic     | 4/2  | Association Mining: Basic      | 4/5, Lab 4 due
Week 14 | 4/7  | Association Mining: Advanced  | 4/9  | Graphs and Networks            | Project Proposal
Week 15 | 4/14 | Graphs and Networks           | 4/16 | Anomaly Detection              | Last Week


Labs:

- Laboratory 1: Due XX,YY: Data Preprocessing

- Laboratory 2: Due XX,YY: Clustering

- Laboratory 3: Due XX,YY: Classification

- Laboratory 4: Due XX,YY: Frequent Pattern Mining

 

Midterm:

 TBA

Final Project:

 TBA