- An
on-line seller of computers wishes to maintain data about orders. Customers can order their PC with any
of several processors, a selected amount of main memory, any of several
disk units, and several CD or DVD readers. The fact table for such a database might be:
Orders(cust, date, proc, memory, disk, cd, quant, price)
where
cust is the customer ID and a foreign
key for a dimension table about customers, and proc (processor), disk and cd are similar. For example, a disk ID might be elaborated
in a dimension table giving the manufacturer and several disk
characteristics. The memory attribute is
simply an integer: the number of megabytes of memory ordered. The quant attribute is the number of machines of
this type ordered by this customer, and price is the total cost of each machine
ordered.
- Which
are the dimension attributes, and which are the dependent attributes?
- For
some of the dimensions attributes, a dimension table is likely to be
needed. Suggest appropriate
schemas for these dimension tables.
- Suppose that we want to examine the data above to
find trends and thus to predict which components the company should order
more of. Describe a series of
drill-down and roll-up queries that would lead to the conclusion that
customers are beginning to prefer a DVD drive to a CD drive.
- To
apply the CUBE operator to the example above, we might find it convenient
to break several dimensions more finely.
For example, instead of one processor dimension, we might have one
dimension for the type (e.g., AMD K-6 or Pentium-III), and another
dimension for the speed. Suggest a
set of dimensions and dependent attributes that will allow us to obtain
answers to a variety of useful aggregation queries. In particular, what role does the
customer play? Also the price
above referred to the price of one machine, while several identical
machines could be ordered in a single tuple. What should the dependent attribute(s) be?
- What
tuples of the cube above would you use to answer the following queries?
- Find,
for each processor speed, the total number of computers ordered in each
month of the year 2000.
- List
for each type of hard disk (e.g., SCSI or IDE) and each processor type
the number of computers ordered.
- Find
the average price of computers with 400 megahertz processors for each
month from January 1999.
- The
computers described above do not include monitors. What dimensions would you suggest to
represent monitors? You may assume
the price of the monitor is included in the price of the computer.
- The
design above is suitable for the CUBE operator. However, some of the dimensions could also be given a
nontrivial lattice structure. In
particular, the processor type could also be organized by manufacturer
(e.g., Sun Intel, AMD, Motorola), series (e.g., Sun UltraSparc, Intel
Pentium or Celeron, AMD K-series, or Motorola G-series), and model (e.g.,
Pentium-III or AMD K-6).
- Design
the lattice of processor types following the examples described above.
- Define
a view that groups processors by series, hard disks by type, and CD’s by
speed, aggregating everything else.
- Define
a view that groups processors by manufacturer, hard disks by speed, and
aggregates everything else except memory size.
- Give
examples of queries that can be answered from the view of (b) only, the
view of (c) only, both, and neither.
(*) Adapted from Garcia-Molina,
Ullman & Widom, Database System Implementation, Prentice-Hall, 2000,
pp. 612-632.