CIS 671, Winter 2002
Homework 4: OLAP*

 

  1. An on-line seller of computers wishes to maintain data about orders.  Customers can order their PC with any of several processors, a selected amount of main memory, any of several disk units, and several CD or DVD readers.  The fact table for such a database might be:
  2. Orders(cust, date, proc, memory, disk, cd, quant, price)

    where cust is the customer ID and a foreign key for a dimension table about customers, and proc (processor), disk and cd are similar.  For example, a disk ID might be elaborated in a dimension table giving the manufacturer and several disk characteristics.  The memory attribute is simply an integer: the number of megabytes of memory ordered.  The quant attribute is the number of machines of this type ordered by this customer, and price is the total cost of each machine ordered. 

    1. Which are the dimension attributes, and which are the dependent attributes?
    2. For some of the dimensions attributes, a dimension table is likely to be needed.  Suggest appropriate schemas for these dimension tables.
  3. Suppose that we want to examine the data above to find trends and thus to predict which components the company should order more of.  Describe a series of drill-down and roll-up queries that would lead to the conclusion that customers are beginning to prefer a DVD drive to a CD drive.
  4. To apply the CUBE operator to the example above, we might find it convenient to break several dimensions more finely.  For example, instead of one processor dimension, we might have one dimension for the type (e.g., AMD K-6 or Pentium-III), and another dimension for the speed.  Suggest a set of dimensions and dependent attributes that will allow us to obtain answers to a variety of useful aggregation queries.  In particular, what role does the customer play?  Also the price above referred to the price of one machine, while several identical machines could be ordered in a single tuple.  What should the dependent attribute(s) be?
  5. What tuples of the cube above would you use to answer the following queries?
    1. Find, for each processor speed, the total number of computers ordered in each month of the year 2000.
    2. List for each type of hard disk (e.g., SCSI or IDE) and each processor type the number of computers ordered.
    3. Find the average price of computers with 400 megahertz processors for each month from January 1999.

  6. The computers described above do not include monitors.  What dimensions would you suggest to represent monitors?  You may assume the price of the monitor is included in the price of the computer.
  7. The design above is suitable for the CUBE operator.  However, some of the dimensions could also be given a nontrivial lattice structure.  In particular, the processor type could also be organized by manufacturer (e.g., Sun Intel, AMD, Motorola), series (e.g., Sun UltraSparc, Intel Pentium or Celeron, AMD K-series, or Motorola G-series), and model (e.g., Pentium-III or AMD K-6).
    1. Design the lattice of processor types following the examples described above.
    2. Define a view that groups processors by series, hard disks by type, and CD’s by speed, aggregating everything else.
    3. Define a view that groups processors by manufacturer, hard disks by speed, and aggregates everything else except memory size.
    4. Give examples of queries that can be answered from the view of (b) only, the view of (c) only, both, and neither.

 

 

(*) Adapted from Garcia-Molina, Ullman & Widom, Database System Implementation, Prentice-Hall, 2000, pp. 612-632.