COMPH Database Analysis
Allan has asked that the data that was used to populate 22 tables
in a recent training grant proposal be uploaded into the IGBP
database. This proposal was submitted by the College of Medicine and
Public Health (COMPH) to NIH. Upon examination of these tables, it
appears that a new database will be necessary to accommodate all the
fields in the tables. A new Switchboard and all forms will also have
to be redesigned and implemented to accommodate data input.
Project Effort
One-time Resources
We estimate that it will take approximately one-year of a 50% GRA's
time to get the COMPH IS up and running including all design,
implementation, testing, data-quality efforts (validation and error
handling), and meeting times.
During the course of these efforts, we anticipate several meetings
to 1) clarify certain fields and concepts from the 22 tables, 2) demo
prototype systems to the potential users, and 3) train those who will
be doing the input and running the queries. We will also document all
design and implementation activities in an ongoing manner.
To accommodate the new data, a database will be required that is
approximately twice the size of the current IGBP IS. Even with the
incomplete analysis below, we see that the number of tables will
approximately double from about 30 to over 50. Many new fields will
have to be added while other fields need to be redefined. (Time
estimate: one quarter)
The new Switchboard and input forms must be designed and
implemented. (Time estimate: one quarter)
Data will have to be loaded into the new COMPH database from the
current IGBP database. Also, the current COMPH data stored in a Word
document will have to loaded. (Time estimate: one quarter)
At least 22 queries/reports will then have to be written, one for
each of the tables, some of them very complex. (Time estimate: one
quarter)
Ongoing Resource Requirements
Input personel must be identified for each degree granting unit,
Center and Program.
Finally, the maintenance of this IS will be an ongoing effort and
someone who is fluent in Access must be hired for this purpose.
Analysis of Tables
Below we present an analysis of the new tables and fields that will
be needed to accommodate data in each of the tables. We also list
issues that must be resolved before design decisions can be completed.
- Overall, change all fields named 'Department' to degree granting
unit id, DGUnitID.
- Table 1. Add a DateHired field (data type: date) to the Faculty
table.
- Tables 2 and 3. Add a lookup table for degree granting programs,
DGPrograms, with fields DGProgramID and DGProgramName with initial
values Interdisciplinary, Medical School, Biological Sciences, Vet
School and Pharmacy.
Add a lookup table for degree granting units, DGUnits, with fields
DGUnitID, DGUnitName, FacultyToBeRecruited (data type: number), and
DGProgramID, with initial values from Table 2 tied to the correct
DGProgramID's.
Add a lookup table to describe training grants, TrainingGrants,
with fields TGID, Title, Director, GrantNumber, StartDate, EndDate.
Add tables PostDocTG, DirectorTG, FacultyTG, StudentTG, and to
record participation in various training grants with fields
PostDocPersonID, TGID, RoleID, PercentEffort, and similarly for
faculty and students. (the last two fields are for table 8)
Change PostDoc table to have fields PersonID, DGUnitID, Degree
(with values PhD, MD, DSc, PsyD, DVM),
Institution, DegreeYear, StartDate, EndDate. Relate PostDoc table to
People (as well as Students).
Add a table FacultyAdvisor with PostDocPersonID, FacultyPersonID,
StartDate and EndDate.
- Table 4. Add lookup table FundingAgencies with FAID and FAName,
with initial values, NIH, NSF, DoD, DoE, Industry, Foundations, Health
Associations, State Agencies, Royalties, NCI, NCRR, NEI, NIAID.
Add table HealthRelatedCenters with fields DirectorID, FAID,
CenterName, ScientificEmpasis (text field) and MajorFacilities (text
field).
- Table 5. Add table ClinicalPrograms with fields
PrincipalInvestigator (FacultyPersonID), Title, FAID.
Add new lookup table Role with RoleID and RoleName. Initial values
are Core Faculty, Co-Director, Operating Committee and all
roles from Table 7.
- Table 8. Add a lookup table for both Faculty and Students,
DegreesReceived, with fields PersonId, DegreeId, DegreeYear.
Add a lookup table Degrees with DegreeId, DegreeDescription.
- Table 9. Add Grants table for Faculty. Add field GrantTitle to
the table.
- Table 10. In PreAdmission table add fields
UndergradInstitution, Degree (with values BS, BA, BS/BA, MBBS, MS),
DegreeYear.
- Table 11. Nothing new required.
- Table 13. Add MCAT table with fields Verbal, Physiology (?),
Biology (?) and WR (?).
Add Honors, TeachingExper, PriorResearchExper,
ResearchInterests fields (all text) to Students table.
Add OtherMSTApplications table with fields StudentPersonID,
Institution (text), Outcome (text).
Add LaboratoryRotations table with fields StudentPersonID,
FacultyPersonID, OtherInstitution, OtherFaculty (text).
- Table 14. Add ResearchInterests field to Student table.
- Table 15. Nothing new required.
- Table 16. Add field External Fellowship (y/n) to
Support table. If yes, the Comment field can contain the relevant info.
- Table 17. Add USMLEScore and Step1Date to Students table.
Add table USMLE with values Step1Date, MinimumPass, OSUMean,
NationalMean. (might be able to calculate OSUMean)
- Table 18. Add table ResidencyMatch with fields StudentPersonID,
StartDate, EndDate, SpecialtyArea, Program Location.
- Table 19. Nothing new required.
- Table 20. For this table, you need to choose a representative
program. For the IGBP we have all this data (except there are some
strange data inthe GRE Scores column).
- Table 21. Need Ethnicity lookup table. Add Ethinicity field to
Student table.
November 8, 2000: All updates have
been added to ER diagrams and documented online by Mamrak
November 14, 2000: All outstanding issues listed below were
resolved in a meeting today with Allan, Angie and Christine. The
resolution is noted after the respective issue.
Outstanding Issues
- In general: For uploading existing data from Word format,
the issue of publication fields is important. On what fields do we
anticipate searching? We decided to keep all the fields in the
current publications table. We'll do the upload manually if need be.
- Table 6. How do these Departments/Programs relate to those
in Table 2. Do we just need a new table with web sites of
interest? They do not relate. We will create a new table to hold
web sites of interest. Done 11/15/2000.
- Table 7. What is an MD/PhD Operating Committee? Are
primary and secondary appointments to DGUnits? What does Role mean in
this table? What does %Effort mean? The operating committee plays
a role equivalent to Graduate Studies committees in
departments. Primary and possibly multiple secondary appointments are
to DGUnits. Role is the role on the operating committee and percent
effort is percent of a 40 hour work week spent on work for this
committee. Added a WeeklyPercentEffort to Committees table,
11/15/2000.
We also decided that the 'Program' concept as newly implemented
should change to 'College.' So, colleges have DGUnits instead of
programs. Programs can be degree granting. Done 11/15/2000.
- Table 8. Does the 'Primary (Graduate & Secondary)
Appointment(s)' field here have identical semantics to the similar
table in Table 7? Yes. No change required.
- Table 9. Should we assume these are all grants other than
training grants? Yes. No change required.
- Table 12. If the Seminar Series presentations are the same
as those identified in the MiniSeries table, then nothing new is
required. Create a new llokup table for for SeminarTypes with
fields SeminarTypeID and SeminarDescription. Add MiniPresentations,
MD/PhD Program Seminar and Seminar Series IGBP 797. Delete
MiniPresentations table and replace with new SeminarPresentations
table with fields FacultyID, SeminarTypeID, Title, Date. Done
11/15/2000.
- Table 13. What notation is being used for the GRE column?
Is the Preceptor the Advisor? Is the Graduate Program the degree
granting unit? What are the areas of the MCAT exam? Only MCAT
scores are reported. The sections of the MCAT and order of reporting
are Verbal (type number), Physical Sciences (type number), Writing
(type text), and Biological Sciences (type number). The perceptor is
the advisor. The graduate program is the degree granting unit for the
student. Clean up MCAT table. Done 11/15/2000.
- Table 14. Can the 'Year in Program' value be derived?
No. Create a new field YearInProgram in the Student table. Done
11/15/2000.
For 'Couses Taken' we store only the core courses in the IBGP
database. Do we want to store all courses? No.
- Table 17. Again, can the first column be derived? No.
- Table 21. What values should we use to populate the
Ethnicity lookup table? Can we determine if a student is a URM from
this table? Create Ethnicity table with values 1: Black,
non-Hispanic; 2: American Indian or Alaskan native; 3: Asian or Pacific
Islander; 4: Hispanic; and 5: White, non-Hispanic. A URM is any
student in 1, 2, or 4. Done 11/15/2000.
- Table 22. Need to resolve format of GRE Scores column.
See above. Done 11/15/2000.
- NIH bio's. Need to resolve inclusion of them in the db or
not. We decided not to generate the bio's from out database for a
variety of reasons.
Issues Relating to NIH Requirements for Tables
Mamrak has looked at the official NIH requirements for the various
tables and several more issues have arisen. These are listed below.
My approach is to consider all the text and to explicitly account for
the table descriptions in order, starting at '9. Research
Training Program Plan' on p. V-3 of the 'Application for a Public
Health Service Grant PHS 398.' I identify the MD/PhD Program table
corresponding to the NIH table and the outstanding issues.
I am able to account for only Tables 1, 2, 3, 8, 9, 10, 13, 21, and
22 based on the requirements I read in the 'Application. . .'. Tables
11, 14 and 20 are questionable. Of the tables I could match to the
requirements, some have missing data fields.
- under 'a. Background' 'Give the current number of faculty
members in each unit and department...' Table 2. ok.
- under 'a. Background' 'List current and pending training
support. . .' Table 3. Does not include amount of award or percent
effort in the grant. Add AwardAmount field to TrainGrants table.
PercentEffort is already in the FacultyTG and DirectorTG tables. Done
11/15/2000.
- under 'b.(2) Program Faculty' 'List each training faculty
member. . .' Table 8. ok.
- under 'b.(2) Program Faculty' 'Descibe each faculty member's
research . . .and indicate how trainees wil participate in the
research.' Table 11? Does not include trainee particpation component.
- under 'b.(2) Program Faculty' 'Describe each faculty member's
research. . .' Table 9. ok.
- under 'b.(2) Program Faculty' '. . .for each faculty member
list all current and past students. . .' Table 10. Does not include
title of the research project. Add FacultyID to StudentTG table.
Title is already included in TrainingGrants table. Done 11/15/2000.
- under 'b.(2) Program Faculty' 'For new applications, list
representative recent publications. . .' Table. 15. ok.
- under 'b.(4) Trainee Candidates' 'Describe recruitment
plans. . . Table 1. ok.
- under 'b.(4) Trainee Candidates' 'Give the qualifications of
prospective trainees. . .' Table 14?
- under 'b.(4) Trainee Candidates' 'The size of the applicant and
trainee pool . . .' can't find an existing table that provides data on
size of the applicant and trainee pool. Create new field in
PreAdmissions table called AdmissionTypeID. Create new AdmissionTypes
lookup table with values Graduate School, Trainee: Predoctoral,
Trainee: Postdoctoral, Fellow: Postdoctoral. Done 11/15/2000.
- under 'b.(4) Trainee Candidates' 'The qualification of
prospective predoctoral. . .' Table 22. Does not include whether
applicants were U.S. citizens or had permanent resident status. Table
20 is also included to satisfy this requirement, giving data for
current students, but such data is not required by NIH. We have
U.S. citizen/permanent resident fields in the PreAdmissions table.
- under 'b.(4) Trainee Candidates' 'The qualification of
prospective postdoctoral trainees. . .' Table 13. Does not include
thesis research topic, whether applicants were or were not offered
admission, which applicants entered the program and whether applicants
were U.S. citizens or had permanent resident status. Added ThesisTile
field to PostDocAdvisor table. All other data is available in the
PreAdmissions table.
- under 'c.' '. . .information relate to students from racial and
ethnic groups' Table 21. ok.
As of November 15, all outstanding issues are resolved. The new
design of the database accommodates all required and all extraneous
tables.
In a discussion with Allan we decided which of the extraneous
tables should be retained. We decided to keep LaboratoryRotations and
ResidencyMatch and to add fields Academic Quarter and
AcademicCreditHours to the former. We decided to discard USMLES,
HealthRelatedCenters, HRCFunding, ClinicalPrograms, CPPIs, CPFunding,
USMLEStats, and RelatedWebSites.
We are now ready to move onto the db and switchboard implementation
phases
Issues Relating to Data Upload from Word Tables
After the implementation of the extended database and switchboard
is complete, we will upload the current IGBP data into the new
database. Then we will try to automate the upload data into the
database from the Word tables, to the extent possible.
Below we analyze the upload task.
- Some of the tables are not required for an NIH training grant
proposal and we have decided not to store data from them. These
tables are 4, 5, 6, 7, 14, and 17. We will no longer consider these.
- Some of the tables are not required by for an NIH training
grant proposal, but we do have the data stored. These tables have a
potential for upload. These tables are 12, 16, 18, 19 and 20.
- Table 12. Md/PhD Seminar Series. We can only upload these data
if records already exist for the individual faculty (speakers).
Because these records are not likely to exist and because there are
only 16 entries in the table, we recommend this table for manual
upload. Manual creation of the faculty records with additional data
would be a good start to populating the new database.
- Table 16. Again, these records cannot be uploaded unless the
student records already exist. Because these records are not likely
to exist and because there are only 12 entries in the table, we
recommend this table for manual upload. I suggest that instead of the
'Discipline' data being entered, that the degree granting unit be
entered. This would be more specific data.
- Table 18. Likely we can automate the upload of data in this
table. We can create new student records with last name, first name,
a corresponding record in DegreeGrantingUnits and a corresponding
record in ResidencyMatch.
- Table 19. Because the years of matriculation for these
students range from 1967 to 1995, these 'alums' will definitely not
have student entries in the database. Further, the 'Present Position'
and 'Present Funding' text strings will impossible to parse for
automated upload. We recommend this data for manual upload.
- Table 20. The students to whom this data apply are not
identified by name, so the data is not appropriate for upload.
- The remainder of the tables contain data that are required for
training grants. These tables are 1, 2, 3, 8, 9, 10, 11, 13,
15, 21 and 22.
- Table 1. These data are derived from tables and not stored
directly so it doesn't make any sense to upload it.
- Table 2. These data are derived from tables and not stored
directly so it doesn't make any sense to upload it.
- Table 3. The first four columns in the table can be uploaded
into the TrainingGrants table. Faculty records would have to be
created first.
- Table 8. We can automate the upload of data in this table.
- Table 9. We can automate the upload of data in this table.
- Table 10. We can upload some data from this table: faculty and
student names, student advisor, and prior academic degrees.
- Table 11. We can upload this data if the research interest
text does not exceed the length allowed in our database; most faculty
use at least one half page of text.
- Table 13. The students are not identified, so this data cannot
be uploaded.
- Table 15. Because the publications are text strings and because
we need these strings broken down into their component parts for
upload, we cannot automatically upload the data from this table.
- Table 21. These data are all derived, and thus are not
appropriate for upload.
- Table 22. Students are not identified, so this data cannot be
uploaded.
In summary, we can now target for full or partial upload data from
Tables 3, 8, 9, 10, 11, and 18. We now need to look at the Word files
that contain these data to explore the automation issue more fully.
Details of the Data Upload
- Table 3. Create people, faculty records if none exist; create
records in TrainingGrants table; then populate FacultyTG table.
- Table 8. Create people, faculty records if none exist; add Rank
column to TitleID field in Faculty table and to ResearchInterests
field; create entries in DegreeGraningUnits table.
- Table 9. Create people, faculty records if none exist; add to
Grants table.
- Table 10. Create people, student records if none exist; add to
DegreesReceived table; create people, faculty records if none exist;
add to student AcademicAdvisor table. Ignor 'Pre or Post,' 'Training
Period' and 'Current Position' columns.
- Table 11. Create people, faculty records if none exist; add to
ResearchInterests field only if null (i.e., not filled in from Table
8); truncate when necessary
- Table 18. Create people, student records if none exist; add to
ResidencyMatch table.
Sandy Mamrak
Last modified: June 5, 2000