COMPH Database Analysis


Allan has asked that the data that was used to populate 22 tables in a recent training grant proposal be uploaded into the IGBP database. This proposal was submitted by the College of Medicine and Public Health (COMPH) to NIH. Upon examination of these tables, it appears that a new database will be necessary to accommodate all the fields in the tables. A new Switchboard and all forms will also have to be redesigned and implemented to accommodate data input.

Project Effort

One-time Resources

We estimate that it will take approximately one-year of a 50% GRA's time to get the COMPH IS up and running including all design, implementation, testing, data-quality efforts (validation and error handling), and meeting times.

During the course of these efforts, we anticipate several meetings to 1) clarify certain fields and concepts from the 22 tables, 2) demo prototype systems to the potential users, and 3) train those who will be doing the input and running the queries. We will also document all design and implementation activities in an ongoing manner.

To accommodate the new data, a database will be required that is approximately twice the size of the current IGBP IS. Even with the incomplete analysis below, we see that the number of tables will approximately double from about 30 to over 50. Many new fields will have to be added while other fields need to be redefined. (Time estimate: one quarter)

The new Switchboard and input forms must be designed and implemented. (Time estimate: one quarter)

Data will have to be loaded into the new COMPH database from the current IGBP database. Also, the current COMPH data stored in a Word document will have to loaded. (Time estimate: one quarter)

At least 22 queries/reports will then have to be written, one for each of the tables, some of them very complex. (Time estimate: one quarter)

Ongoing Resource Requirements

Input personel must be identified for each degree granting unit, Center and Program.

Finally, the maintenance of this IS will be an ongoing effort and someone who is fluent in Access must be hired for this purpose.

Analysis of Tables

Below we present an analysis of the new tables and fields that will be needed to accommodate data in each of the tables. We also list issues that must be resolved before design decisions can be completed.

November 8, 2000: All updates have been added to ER diagrams and documented online by Mamrak

November 14, 2000: All outstanding issues listed below were resolved in a meeting today with Allan, Angie and Christine. The resolution is noted after the respective issue.

Outstanding Issues

Issues Relating to NIH Requirements for Tables

Mamrak has looked at the official NIH requirements for the various tables and several more issues have arisen. These are listed below. My approach is to consider all the text and to explicitly account for the table descriptions in order, starting at '9. Research Training Program Plan' on p. V-3 of the 'Application for a Public Health Service Grant PHS 398.' I identify the MD/PhD Program table corresponding to the NIH table and the outstanding issues.

I am able to account for only Tables 1, 2, 3, 8, 9, 10, 13, 21, and 22 based on the requirements I read in the 'Application. . .'. Tables 11, 14 and 20 are questionable. Of the tables I could match to the requirements, some have missing data fields.

As of November 15, all outstanding issues are resolved. The new design of the database accommodates all required and all extraneous tables.

In a discussion with Allan we decided which of the extraneous tables should be retained. We decided to keep LaboratoryRotations and ResidencyMatch and to add fields Academic Quarter and AcademicCreditHours to the former. We decided to discard USMLES, HealthRelatedCenters, HRCFunding, ClinicalPrograms, CPPIs, CPFunding, USMLEStats, and RelatedWebSites.

We are now ready to move onto the db and switchboard implementation phases

Issues Relating to Data Upload from Word Tables

After the implementation of the extended database and switchboard is complete, we will upload the current IGBP data into the new database. Then we will try to automate the upload data into the database from the Word tables, to the extent possible.

Below we analyze the upload task.

In summary, we can now target for full or partial upload data from Tables 3, 8, 9, 10, 11, and 18. We now need to look at the Word files that contain these data to explore the automation issue more fully.

Details of the Data Upload


Sandy Mamrak
Last modified: June 5, 2000