CSE 5249: Data Analytics Seminar: Spring 2017
Instructor: Srinivasan Parthasarathy
Office: DL 693
Class Hours: TR 11:30-12:30
Office Hours: W by appointment
With the unprecedented rate at which data is being collected today in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. Data mining is the process of automatic discovery of patterns, changes, associations, sequences and anomalies in massive databases. This research seminar will survey the main topics in data mining and knowledge discovery as they relate to data stored in the form of graphs and networks. Topics will be drawn from among: mining structured and semi-structured data; adaptation of classical techniques (classification, clustering, association rules, sequence similarity) to such problems; high-performance implementation issues (parallel/distributed data mining); visualization of network and graph data; and application domains such as web mining, scientific simulations, e-commerce and bioinformatics.
There are no formal prerequisites. However, students should ideally have had experience with at least one of the following courses: database systems (CIS 670), statistics (a 500/600-level course), or parallel computing (CIS 720). The ability to program in C/C++ and/or with statistical programming packages, and to work on semester-long (team or individual) projects, is expected.
· Not Applicable (will focus this semester on paper readings)
Class Format and Requirements
The class will be a mix of student presentations, paper discussions and a research-oriented project. Generally, one of you will introduce a topic, and then we'll discuss some of the latest work on that topic. You will have to explain and defend what the paper says, as well as present its weaknesses and shortcomings as you see fit. The rest of the class will be expected to contribute to the discussion as well, and some points will be assigned for class participation. Ideally, criticism should be constructive and should suggest possible remedies.
You will be expected to compile an annotated bibliography covering all the papers discussed during the semester and submit it to me by the end of the semester. The best time to write each entry is as soon as possible after the paper has been discussed in class, while all the points covered are still fresh.
Presentation order for the first few weeks is now available. I have deliberately picked some returning students for the first few presentations so that students who are new to this form of course can get an idea of what to expect. Feedback forms can be downloaded here. A sample critique (very extensive; I will not hold you to such a high standard) is available here.
Each of you will be expected to focus on a research-oriented project. The research component is stressed, as evidenced by the fact that many of the projects started by students taking this course over the years have resulted in publications in prestigious conferences and workshops. Projects may be done individually or in groups of two (in a group, each member's tasks will be non-overlapping). Project topics, along with relevant references, will be discussed individually with each of you based on your interests. I will try to meet with each one of you during the first two weeks to help determine projects.
The project is expected to culminate in a presentation during the last week of class and a report on the experimental results obtained.
The final grade will be determined as follows:
25% Class Participation and Presentations
25% Annotated Bibliography
Last Updated: Jan 2017