Course Group Status Report


Database Group

 

Course no. | Title | Credit Hours | Reqd - Core (R) / Reqd - Option (O) / Elective (E) | Capstone?
CSE 670 | Introduction to Database Systems I | | |
CSE 671 | Introduction to Database Systems II | | E; O for Info. Sys. |
CSE 674 | Introduction to Data Mining | 3 | E |
CSE 770 | Database System Implementation | 3 | E for undergrads/grads, R for grad major in Systems |
CSE 772 | Information Systems Project | 4 | O | Yes



1. Summary

Database systems is an important and popular area of Computer Science and plays a crucial role in any CSE program. As databases increasingly integrate various types of new information, the field continues to evolve at a rapid pace to address the new challenges.

 

CSE 670, CSE 671 and CSE 770 provide a three-quarter database sequence. CSE 670 is required of all undergraduate majors: BS-CSE, BS-CIS, BA-CIS, and BSBA (IS). It is also a popular course for non-CSE graduate students who have discovered database problems in their own research. CSE 671 is required of all information systems option undergraduates in BS-CSE and BS-CIS. CSE 770 is the basic graduate course and is required for a track in the Systems Major PhD area. CSE 674 is a new course on data mining, and CSE 772 is a popular capstone design course. As explained in detail in Section 2.3, courses in this group help us meet a number of ABET objectives.

 

CSE 670 takes the user's view of database systems. As such, it emphasizes database design, analysis, and query writing. CSE 671 continues the user's view, emphasizing modern database systems and applications: object-relational databases, data warehousing (including online analytical processing (OLAP) and data mining from the user's perspective), multimedia databases, and temporal and active databases. It finishes with the start of database internals, namely disk characteristics, file organizations, and indexing strategies. CSE 770 takes the internals view, emphasizing query processing and transaction processing, including concurrency control and backup and recovery. CSE 674 expands the discussion of data mining and presents the back-end implementation of current algorithms. In CSE 772, students are required to work in teams to design and implement a large-scale database application.



2. Detailed Analysis

Section 2.1 describes the individual courses in the group. Section 2.2 explains how the group is related to the rest of the program. Section 2.3 explains how the group helps meet a range of ABET outcomes. Section 2.4 provides information on the feedback we have received from students, recruiters, etc. about the courses in the group. Section 2.5 summarizes the changes we are considering in the various courses.

2.1 Summary of the courses

CSE 670 Introduction to Database Systems I: This course introduces the basic concepts of database design. The first half of the course focuses on the basic relational data model and its underpinnings. Topics covered include the basic structure of ER diagrams, relational tables, constraints, and the SQL language, as well as more abstract (yet core) topics such as relational algebra and relational calculus. The latter half of the course stresses advanced topics such as functional dependencies, normalization, graphical user interfaces (GUIs) for SQL, and embedded SQL. The course involves a number of homework-style assignments as well as lab projects, including a major database design project. The course is offered frequently, sometimes with multiple sections per quarter.

 

One criticism of the course content, received from students and from faculty in other departments, concerns the coverage of theoretical topics such as relational algebra and, in particular, relational calculus. Both of these topics are a little harder to understand, but they provide a sound formal background for database query design. Moreover, both topics directly address some of the ABET criteria (especially those relating to the mathematical and logical foundations of computer science).

 

Another criticism concerns the SQL system currently in use. We anticipate that this issue will be resolved after the department moves to a new Linux platform this summer.

 

CSE 671 Introduction to Database Systems II: This is the second course in the database sequence. It focuses on object-relational database systems, data warehousing and analysis, and an introduction to data and file storage. We now cover more on index structures and on query processing algorithms over these structures. Previously, the first two weeks of the course reviewed 670 material for students who had either i) not taken 670, ii) taken 670 a long time ago, or iii) taken 670 before the course structure was last changed. We have now reduced that review to the first week or even less. We enhanced the course to cover more material on disk storage and I/O systems, file organization, and the indexing and data storage mechanisms developed particularly for storing large datasets. Data warehouses, an overview of data mining, and data analysis (OLAP) systems are discussed. Finally, more complex systems such as multimedia databases and database triggers are described in the context of monitoring evolving databases. Students seem to enjoy the contents of the course. Several lab projects related to indexing, query processing, data analysis, and warehousing are assigned. Such design projects are typically well received.

 

CSE 674 Introduction to Data Mining: This is a new course, first introduced 4 years ago. Topics covered include the knowledge discovery process, various data preprocessing strategies that aid the analysis of data, and the key data mining tasks (frequent pattern algorithms, clustering, and classification) along with scalable realizations of each. For each topic, examples are drawn from real-world problems in the areas of intrusion detection and bioinformatics. The course is hands-on, with many team project assignments. Feedback from students has been positive thus far.

 

CSE 650 Information Storage and Retrieval: This course has not been offered lately. The course does have an interesting title, one that at least a couple of faculty members are interested in keeping in the system. However, there are no immediate plans to offer this course. We decided that, for the time being, we would not recommend eliminating it, since we may want to start offering a revised version of the course in the not too distant future.

 

CSE 770 Database System Implementation: This course covers the internals of database systems. It starts with an in-depth discussion of disk performance and file structures, including advanced index structures. These are used in the study of query processing and performance algorithms. Then the problems of crash recovery and concurrency control in database systems are covered. As a 700-level, mainly graduate course, the material is taught at a more rigorous level than in 670 and 671. It is also possible to incorporate more research and development projects in this course. We plan to revise 770 so that it better provides a brief introduction to the areas of database systems research. The main problem with 770 is that, due to a lack of database faculty members, it has not been taught since Winter 2001. Given the importance of this course, we suggest that it be offered at least once every other year.


CSE 772 Information System Project: This course teaches information system design and development principles, i.e., requirement analysis, database design methods and tools, process design, application development tools, testing, evaluation and documentation. It is a capstone design course focusing on information systems projects. It is appropriate for undergraduates in the information systems or software systems options in either ENG or ASC. A capstone design course is required for ENG students. Other software systems students may count this course as the "software lab elective". Other information systems students may count it as a technical elective. The course is also appropriate for graduate students interested in software engineering and/or database design, and several such students have taken it. Besides the technical content, the course also helps students improve their skills in individual and group time management, project scheduling, professionalism, communication, and teamwork. The course had traditionally focused on relational database design. Besides the relational database projects, projects related to modern database applications have now been introduced. The instructor offers a list of well-defined projects, some with outside partners. Some of these projects are on relational database design and implementation; others are on more recent areas such as biomedical databases and large-scale image, audio, and genome databases. The new organization of the course is very well received by the students. Because the projects are revised or changed every year, the students are kept up to date with new developments in information technology and gain significant experience by applying them to today's important problems.

 

Recently, in order to improve students' life-long learning skills, a new component was added to the course requiring the students to explore a new tool, technology, or process and write a three or four page paper on it. Further, in order to help us better assess the degree of achievement of a number of program outcomes related to soft-skills such as team-working and communication skills, a number of rubrics have been developed for use in CSE 772 (and in other capstone design courses).

 

Previously, 616 was part of the Database Course Group. We have now removed it from the DB group, since it is more of a software engineering course than a database course. The last time a DB faculty member (Kerr) taught 616 was in 1998.

 

2.2 Relation to rest of the program

 

CSE 670: Prerequisites: 314 or 222 or 230 or 502; Math 366

Prerequisite for 616, 772.

 

Some programming background is assumed. However the most important prerequisite is mathematical maturity. Logic, covered in Math 366, is important for the study of query languages, since the problem is to translate an English language query into a formal language equivalent to predicate calculus, e.g., SQL. Mathematical maturity is needed for the material on normalization, which involves a mathematical theory on how to avoid redundancy in a database design.
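As an illustration of the kind of reasoning normalization requires, the following is a minimal sketch of the standard attribute-closure computation over functional dependencies. It is not taken from the course materials; the schema, the dependencies, and the function names are hypothetical and chosen only for this example.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A functional dependency X -> Y, stored as (attributes of X, attributes of Y).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: Iterable[FD]) -> Set[str]:
    """Return every attribute functionally determined by `attrs` under `fds`."""
    fds = list(fds)
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If all of lhs is already determined, rhs is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs: Iterable[str], schema: Iterable[str], fds: Iterable[FD]) -> bool:
    """A set of attributes is a superkey if its closure covers the whole schema."""
    return closure(attrs, fds) >= set(schema)


# Hypothetical schema and dependencies (assumptions for illustration only).
SCHEMA = {"student", "course", "instructor", "office"}
FDS = [
    (frozenset({"course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"office"})),
]

print(sorted(closure({"course"}, FDS)))                  # ['course', 'instructor', 'office']
print(is_superkey({"student", "course"}, SCHEMA, FDS))   # True
print(is_superkey({"course"}, SCHEMA, FDS))              # False
```

In this hypothetical single-table design, the instructor and office are determined by the course alone, yet they are repeated for every student enrolled in that course. This is the sort of redundancy that normalization theory detects and removes by decomposing the table.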

 

CSE 671: Prerequisites: 670

Prerequisite for 770 for undergraduates.

 

671 is a continuation of 670, so 670 is a prerequisite. 671 covers file structures, a topic important for 770, and provides additional maturity. Thus it is a prerequisite for 770 for undergraduates.

 

CSE 674: Prerequisites: 670 and 680 or grad standing or permission of instructor.

Not a prerequisite for anything.

 

CSE 770: Prerequisites: 660; 670; 671 or grad standing in CSE.

Not a prerequisite for anything.

 

Since 770 is concerned with the implementation of database systems, it assumes an existing knowledge of the use of database systems as provided by 670. Since 770 deals with database internals, and database internals are closely related to operating systems, an operating systems prerequisite is necessary; 660 provides sufficient background. However, there has been some overlap between 760 and 770 in the transaction processing area. We plan to reduce the material on transaction processing and focus on newer developments in database internals. Since 770 is taught at a higher level than 670, and since file structures, which are important in query optimization, are covered in 671, 671 is a prerequisite for 770 for undergraduates. However, it is assumed that CSE graduate students have the maturity to take 770 without 671. Also, most of the material in 671, i.e., everything except file structures, is not necessary for 770.

 

CSE 772: Prerequisites are 560; 616 or 757; 601; 670.

Not a prerequisite for anything.

 

Since 772 is a capstone design course, students are expected to have taken 616 or 757, and 560, for the necessary background in requirement analysis and software engineering. The projects in 772 are in the field of database systems; therefore 670 is required.

2.3 Relation to BS-CSE Program Outcomes

The courses in the Database group play a crucial role in achieving certain of the BS-CSE program outcomes. The BS-CSE program is expected to demonstrate that its graduates have:

                     a. an ability to apply knowledge of mathematics, science, and engineering 

                     b. an ability to design and conduct experiments, as well as to analyze and interpret data 

                     c. an ability to design a system, component, or process to meet desired needs

                     d. an ability to function on multi-disciplinary teams

                     e. an ability to identify, formulate, and solve engineering problems

                     f. an understanding of professional and ethical responsibility

                     g. an ability to communicate effectively

                     h. the broad education necessary to understand the impact of engineering solutions in a global and societal context

                     i. a recognition of the need for, and an ability to engage in life-long learning

                     j. a knowledge of contemporary issues

                     k. an ability to use the techniques, skills, and modern engineering tools necessary for practice as a CSE professional. 

The table below summarizes the contributions the various courses in the group make toward achieving the various BS-CSE outcomes.

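BS-CSE Program Outcomes

CSE Course | a | b | c | d | e | f | g | h | i | j | k
650 | XXX | X | XXX | | XX | X | | X | X | | XXX
670 | XXX | X | XXX | | XX | X | | X | X | X | XXX
671 | XXX | X | XXX | XX | XXX | X | X | X | X | X | XXX
674 | XXX | X | XXX | XX | XXX | XX | X | X | X | X | XXX
770 | XXX | XX | XXX | XX | XXX | XX | XX | X | X | X | XXX
772 | XXX | XX | XXX | XXX | XXX | XX | XXX | X | XXX | X | XXX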

 

2.4 Feedback

The student evaluations have been generally very positive. Many students have also told faculty members, in writing or verbally, how useful some of the courses have been. The demand for the capstone design course 772 has been high, and the student feedback has been positive. Some students summarize their experience as being both exciting and practical.

 

671 is being improved with more material on database performance as well as on newer advanced database applications. We introduced several new hands-on projects in which students implement parts of a database management system, giving them a better understanding of its internals. During the ABET interviews, one of the visitors pointed to the added material on database systems performance as a great advantage, since he felt that most new CS graduates are not knowledgeable about systems performance and scalability issues.

 

For some sections of the course, the students have been provided with an auto-grader system with which they can test and tune their projects during development, rather than finding out about their errors after submission. This was generally well received, but it also required the students to follow the expected input and output formats more strictly. We have not changed the instructional database system. Most of the projects in DB courses are now done with open source software.

2.5 Possible changes

Major problem: The main problem in the database area is the shortage of tenure-track faculty; there are only 2 FTEs. With a required undergraduate course and several other DB courses offered, more faculty are needed in this area. With the planned hiring of a new faculty member in DB, we hope this will become less of a concern.

 

670: 670 covers standard textbook material and can be taught satisfactorily by many non-tenure-track faculty. Since the course is by nature highly application-oriented, having people with industry background teach the course would make it more useful and appealing. However, to maintain its quality, we should keep some tenure-track faculty involved with the course.

 

671: 671 covers much less standard material, including techniques that are just now coming into commercial use and into standard database texts. Thus 671 must be taught by very knowledgeable and up-to-date instructors.

 

As suggested in our earlier report (2003), we reduced the material on OODBs and expanded the part relating to various indexing and data storage mechanisms for large databases. So far we have focused on the basics of such mechanisms; we now plan to cover more on storage and indexing technologies for modern databases. Feedback from students also indicates that this section of the course tends to be more interesting.

 

In previous years, there have been two weeks of overview of 670 material. We reduced this amount to one week to allocate more time to the core 671 content. This could be cut even more, or eliminated altogether.

 

Another possible change is to cover more on XML and data integration in our DB courses. We may start doing this with 671, but some of this material can also be discussed in 670 and 770.

 

674: We plan to move to a new textbook by Tan, Steinbach and Kumar, which is a better book than the one currently in use.

 

770: The main problem with 770 is that, due to a lack of database faculty members, it has not been taught since Winter 2001. As 770 is the core graduate database course, it is essential to schedule it on a regular basis. Again, having a new faculty member in DB would help with this problem. We suggest that 770 be offered at least once every other year.

 

770 needs some changes to remain up-to-date with the field. It will include more discussion on the new DB applications and the new challenges they introduce, and less on the details of recovery and concurrency control systems.

 

772: Every year we plan to add new project options that follow developments in database technology. Also, this year we are experimenting with a new track on research and development projects. The track involves the design and implementation of a proof of concept, or a performance evaluation of various design principles, for a timely research problem. We expected this to be appealing and useful for graduate students. We offered around 10 such projects, each involving the development of a tool based on recent techniques in the literature with some novel aspects added. 4 graduate students and 1 undergraduate student chose this option. The research option enriched the course and was useful not only for the students who chose it, but also for others who were exposed to a variety of research projects. Based on the feedback we get at the end of the quarter, we plan to keep and/or revise this option in following years.



3. Conclusions

The Database Group courses play an important role in the CSE programs and help us achieve a number of the published outcomes of the BS-CSE program. The courses, as they stand, are doing well; students are generally satisfied with the courses. 



 

Course | Coordinator | Recent Instructors
CSE 650 | - | Has not been taught recently
CSE 670 | Eitan Gurari | Chaabouni, Krishnasamy, Gurari
CSE 671 | Hakan Ferhatosmanoglu | Hakan, Srini
CSE 674 | Srinivasan Parthasarathy | Srini
CSE 770 | Hakan Ferhatosmanoglu | Has not been taught recently
CSE 772 | Hakan Ferhatosmanoglu | Hakan

People: Ferhatosmanoglu, Gurari, Parthasarathy.

Date of report: Feb 2007.


Hakan Ferhatosmanoglu

Feb 28, 2007.