BMI7830 - Advanced Bioinformatics for Human Diseases
CSE5559 Special Topics in Data Analysis and Visualization

With the rapid development of high-throughput technologies such as microarrays and next-generation sequencing (NGS), bioinformatics has become an essential part of biomedical research on human diseases. Analysis of the large volumes of high-throughput data has become the new bottleneck in many research projects. The goal of this course is to familiarize students with commonly used bioinformatics data analysis tools through hands-on training and discussion of both classical and state-of-the-art literature. Topics include analysis and visualization of both microarray and NGS data for genotyping, epigenomics, and transcriptome studies in human diseases, as well as advanced methods based on gene network inference and analysis.


Course Contents: R, Bioconductor, Microarrays, RNA-sequence data, bioinformatics, genotyping, epigenomics, transcriptome, visualization, co-expression network

The Highlights: R, Bioconductor, machine learning, visualization, transcriptomics, microarrays, RNA-seq

Text Books and References

  1. The Text
    1. Statistics and Data Analysis for Microarrays Using R and Bioconductor, Sorin Draghici, 2nd Edition, Chapman & Hall, 2011.
  2. Very Useful Tomes
    1. An Introduction to Bioinformatics Algorithms, Neil C. Jones and Pavel Pevzner, MIT Press, 2004.
    2. Bioinformatics, David W. Mount, Cold Spring Harbor Press, Second Edition, 2013.
    3. Bioinformatics and Functional Genomics, Jonathan Pevsner, Wiley-Blackwell, 2nd Edition, 2009.
  3. R
    1. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Robert Gentleman, Vincent Carey, Wolfgang Huber and Rafael Irizarry, 2005.
    2. R Cookbook, Paul Teetor, O'Reilly, 2011.
  4. Molecular Biology and Narratives/Editorials
    1. Molecular Biology of the Cell, Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter, 4th Edition.
    2. Genomic Imperfections, Ramesh Hariharan, Strand Life Sciences (draft version).
    3. PLOS Computational Biology: Translational Bioinformatics.
    4. The Processes of Life, Lawrence Hunter, MIT Press, 2009.


Instructors

  1. Kun Huang, Biomedical Informatics, The Ohio State University
  2. Raghu Machiraju, Department of Computer Science and Engineering, The Ohio State University

Grading Assistant: Instructors

Time: TR 1:00-1:55 PM; Lincoln Tower 240

Office Hours: Kun Huang: By appointment. Please contact him at kun.huang dot osumc dot edu

                         Raghu Machiraju: DL779: M 2:00-3:00 PM, W 3:00-4:00 PM; LT Third Floor: T, R 2:00-3:00 PM (after class). He can also be reached at machiraju dot 1 at osu dot edu.

Grade Distribution: Assignments: 40%, Quizzes: 20%, Final Project: 40%

Class Help/Watering Hole: Piazza

The Schedule & Lectures

Chapters below refer to the text by Sorin Draghici.

Week 1

8/25: Overview of high throughput technologies, online resources, and public data repositories (pdf)

8/27: Review of biology / basic bioinformatics techniques (pdf)

Chapter 2 from text; Biology Primer (Lander)

Week 2

9/1: Use of high throughput gene expression data in biomedical research (pdf)

9/3: Laboratory techniques for measuring gene expression - Prof. Jeff Parvin, Biomedical Informatics (pdf)

Chapter 3, Papers discussed in slide decks.

Week 3

9/8: Introduction to R and Bioconductor I  (pdf)

9/10: R and Bioconductor II (pdf)

Computing Basics for Bioinformatics - Self-paced Learning, Chapters 3, 4/6, 7

Week 4

9/15: Normalization of microarray data (pdf)

9/17: Genetics and Translational Research - Prof. Chris Bartlett, Nationwide Children's Hospital (pdf)

Chapter 20

Week 5

9/22: Normalization (pdf)

9/24: Comparative Analysis: HT (pdf)

Chapter 11, Chapter 12, Chapter 16, Chapter 20

Week 6

9/29:  Comparative Analysis - multiple test comparison (pdf)

10/1: Unsupervised learning in bioinformatics (pdf)

Chapter 16, Chapter 20, Chapter 18

Week 7

10/6:  Supervised learning in bioinformatics (pdf)

10/8: Visualization (pdf - Original),
Correlation Analysis. Slides presented by Prof. Huang (pdf).

Chapter 18, Chapter 29, Chapter 17

Week 8

10/13: Gene network analysis (pdf)

10/15: Holiday

Lab2 Announced on 10/13.

Week 9

10/20: Gene Ontology and Pathway Analysis (pdf) - Guest Lecture,  Dr. Jianying Zhang, OSUMC

10/22: Small Sample Size Analysis (pdf) - LIMMA, Sample Size Estimation - Linbao Yu

Lab2 Due - 10/22

Week 10

10/27: Data Forensics - Prof. Kevin Coombes

10/29: Introduction to NGS (pdf) - Gulcin Ozer

Project Proposal Due - 10/29.

Week 11

11/3: Sequence alignment of NGS (pdf) - Selen Yilmaz, Gulcin Ozer

11/5: Sequence alignment for RNA-seq data (pdf)

Lab3 announced on 11/3.

Week 12

11/10: Sequence alignment for RNA-seq data (pdf)

11/12: Comparative analysis of RNA-seq data (pdf)

Lab3 due on 11/13.

Week 13

11/17: Review/De novo analysis of RNA-seq data (pdf)

11/19: Project Presentations

Dwell on projects.
Lab 4 is announced on 11/19

Week 14

11/24: TCGA (pdf), Quiz 3

11/26: Thanksgiving Holiday - have a good one.

Quiz 3 on 11/24.
And dwell on turkeys ...

Week 15

12/1: Networking and Visualization (pdf)

12/3: Networking and Visualization (pdf)

Lab 4 Due on 12/5

Week 16

12/8: Networking and Visualization (pdf), Quiz 4

 --- DONE ---

Quiz 4 on 12/8

The Final Presentation: Friday, Dec 11, 2:00-4:00 PM
Quizzes: 9/10, 10/20, 11/19, 12/4

Final Project Proposal - Integrative Fishing in TCGA waters
Due: October 29, 2015

Do the following:

  1. Exception: If you have your own ideas, you need to talk to us soon. Still follow steps 2, 6, and 7 as much as possible.
  2. Form teams of 2 peers in class.
  3. If you pick leukemia, breast, ovarian, or lung cancer, there is local help available.
  4. Your project will be on integrative genomics.
  5. You will use transcriptomics data in addition to other data (clinical data for sure).
  6. Pen a proposal that needs to be submitted to the instructors by October 29, 2015.
  7. Please use Carmen to submit. Dropboxes will be set up.
  8. The proposal needs to have the following:
    1. Team members
    2. Cancer - pick one for the project.
    3. Pick datasets for integrative fishing
      1. Transcriptomics - RNA-seq, microarray
      2. Clinical outcome - a subset of tumor stage, histology, ER/PR/HER2 status
      3. Any other data - Copy number variation (CNV)
    4. Salient papers - list them
      1. Russ Altman: His review of papers at AMIA TBI conferences can help you select projects
    5. Salient Methods - you can discuss with us
    6. Intended Goals -
      1. Subtyping/Clustering - dividing patients
      2. Biomarkers - picking genes which will discriminate normal from cancer
      3. Classification (Semi-supervised) Learning
      4. Network Characterization, Gene Co-expression Networks
    7. Format of final report - Please complete a report with the following sections, like a regular paper
      1. Title
      2. Author List
      3. Abstract - a few sentences for motivation, a few stating the significance of the proposed work, a few on the cancer, disease, or experimental conditions, a few on methods, and a few on results. "A few" here means one or two sentences.
      4. Introduction - Motivate the problem, state why you chose the proposed integration and why it is significant, describe the data briefly, and highlight results. Then provide a roadmap.
      5. Data - describe the data
      6. Methods - describe the methods/workflows used and explain why you chose them. You can use sub-sections if needed.
      7. Results - make your point with tables, figures, etc.
      8. Discussions, Summary and Conclusions
      9. Citations


  1. Lab I - Announced: 9/17, Due: 10/8 - Preprocessing/Exploratory Visualization of Microarray Data for Leukemia/Lung Cancer
  2. Lab II - Announced: 10/13, Due: 10/29 - Analysis and Visualization of Microarray Data with GenePattern for Lab I data
  3. Lab III - Announced: 11/3, Due: 11/13 - Galaxy Workflows and Variant Calling
  4. Lab IV - Announced: 11/20, Due: 12/5 - Differential analysis with RNA-seq data

Useful Links
Russ Altman: His review of papers at AMIA TBI conferences can help you select projects.

The venerable p-value
It is used ubiquitously by the bioinformatics community to assess the validity of hypotheses and results. However, it has run into a maelstrom of criticism, as noted in the commentaries found in the first two links. The other two in the list below are more detailed manuscripts that dwell on this controversy.
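
One concrete reason raw p-values draw criticism in high-throughput work is multiple testing: when thousands of genes are tested at once, a 0.05 cutoff admits many false positives by chance alone. The following is a minimal sketch, not taken from the course materials, of the Benjamini-Hochberg adjustment in plain Python. The function name `bh_adjust` and the toy p-values are illustrative assumptions.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes (toy data, not real results).
toy_pvalues = [0.01, 0.04, 0.03, 0.005, 0.2]
toy_adjusted = bh_adjust(toy_pvalues)
```

With these toy values, the raw p-values 0.03 and 0.04 both adjust to about 0.05, so they no longer pass a strict 0.05 cutoff once the number of tests is taken into account.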


Lior Pachter Blog:  Do read his musings on Network Nonsense


  1. Biology primer/review by Eric Lander
  2. Gene expression microarray (and other high throughput technology)
  3. Next generation sequencing (NGS)
  4. Microarray Analysis in Bioconductor/R
  5. Using NCBI Gene Expression Omnibus (GEO)
    1. To use GEO data in R, use the GEOquery package
  6. Other important bioinformatics tools
    1. UCSC Genome Browser
    2. Annotations can also be downloaded at
    3. Pathway and network databases: collects more than 500 such databases.
    4. Network visualization: Cytoscape
    5. Cytoscape
    6. Gene Enrichment
      1. DAVID
      2. BiNGO (a Cytoscape plugin)
      3. GSEA
  7. Gene co-expression network analysis (Steve Horvath)
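
As a rough illustration of what a gene co-expression network is, the sketch below links two genes when the absolute Pearson correlation of their expression profiles meets a cutoff. This is a hard-threshold simplification written for this page, not the WGCNA method, which uses soft thresholding among other steps. The function names, the 0.8 cutoff, and the toy expression values are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (gene, gene, r) for pairs whose |r| meets the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Hypothetical expression values for three genes across five samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear trend
}
toy_edges = coexpression_edges(toy_expression)
```

On this toy data only geneA and geneB are connected. In practice the resulting edge list would be handed to a network tool such as Cytoscape for visualization.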