COSS_data_vis_R

Data Visualization in Bioinformatics (R) for Compute Ontario Summer School


Data Visualization in Bioinformatics (R)

Plotting and data visualization are essential for effectively communicating bioinformatics findings, yet they are often treated as trivial tasks. In this course, we will showcase the power of a well-designed plot! We will cover key principles of effective visualization, work through examples ranging from basic to complex, and conclude with a hands-on workshop. By the end of the course, you will be able to create publication- or presentation-ready plots for your own research using R and ggplot2.

Course Outline

RStudio Setup

Using R in the Cloud

https://posit.cloud/

  1. Go to Posit Cloud and create a free account using your Google, GitHub, or email login.
  2. Once you’re logged in, click “New Project” → “From Git Repository”.
  3. In the “Git Repository URL” field, paste the following URL: https://github.com/redgar598/COSS_data_vis_R.git
  4. This will create a cloud-based copy of the project, including all the materials you’ll need for the hands-on portion of the workshop.

Using R Locally with RStudio

  1. Open RStudio on your computer.
  2. Go to File → New Project → Version Control → Git.
  3. In the “Repository URL” field, paste: https://github.com/redgar598/COSS_data_vis_R.git
  4. Choose a local folder where you want to save the project, then click Create Project.
  5. RStudio will clone the GitHub repository to your computer, and you’ll be ready to work locally.

Required Packages

install.packages(c("reshape2", "ggplot2", "cowplot", "dplyr", "scales", "RColorBrewer", "gridExtra"))

## optional, used in some of the example plots
install.packages("ggstream")
install.packages("palmerpenguins")
install.packages("dslabs")

## entirely optional, for GIFs only; takes a while to install
install.packages("gganimate") 
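
Before the workshop, it can help to confirm that the required packages are actually available. The following is a minimal sketch, not part of the course materials: it only reports what is missing and installs nothing. The variable names (`required`, `installed`, `missing_pkgs`) are illustrative.

```r
# Required packages, copied from the install.packages() call above
required <- c("reshape2", "ggplot2", "cowplot", "dplyr", "scales",
              "RColorBrewer", "gridExtra")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If anything is reported as missing, rerun `install.packages()` for just those packages.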

Data We Will Be Using for ggplot Introduction

We will be looking at gene expression data from mouse photoreceptors. The samples come from five developmental stages (E16, P2, P6, P10, and 4 weeks) and two mouse lines: a wildtype (wt) and a knockout of a rod-cell-specific transcription factor (NrlKO). The gene expression and sample information data were collected from the Gene Expression Omnibus (GEO), under study ID GSE4051.
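
To get a feel for this experimental design before opening the real files, here is a small sketch that builds a toy data frame with the same stages and genotypes and plots it with ggplot2. The expression values are simulated, and the column names (`dev_stage`, `genotype`, `replicate`, `expression`) and the number of replicates are assumptions made for illustration; they may not match the course data.

```r
library(ggplot2)

set.seed(1)  # reproducible simulated values

# SIMULATED data: values and column names are made up for illustration.
# Only the stage and genotype labels come from the description above.
stages <- c("E16", "P2", "P6", "P10", "4_weeks")
toy <- expand.grid(
  dev_stage = factor(stages, levels = stages),  # keep developmental order
  genotype  = c("wt", "NrlKO"),
  replicate = 1:4                               # hypothetical replicate count
)
toy$expression <- rnorm(nrow(toy), mean = 8, sd = 1)

# One point per sample, coloured by genotype, jittered horizontally only
p <- ggplot(toy, aes(x = dev_stage, y = expression, colour = genotype)) +
  geom_point(position = position_jitter(width = 0.1, height = 0)) +
  labs(x = "Developmental stage", y = "Expression (simulated)",
       colour = "Genotype") +
  theme_bw()

print(p)
```

Storing the stage as an ordered factor keeps the x-axis in developmental order rather than alphabetical order.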

For more information on the actual paper see the associated publication.

cookbook

www.scientificanimations.com [CC BY-SA 4.0], via Wikimedia Commons


More ggplot examples to work through

Below are several examples of complex plots. Feel free to work through them on your own to see some techniques for developing presentation-ready plots.

Differential expression

A common plot used in computational biology to visualize the differential expression of a gene between conditions.
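
The text above does not name the plot type. As one widely used option for differential expression results, the sketch below draws a volcano plot. Everything in it is an assumption for illustration: the results are simulated, and the column names (`log2fc`, `pvalue`, `status`) and cutoffs are hypothetical rather than taken from the course example.

```r
library(ggplot2)

set.seed(42)

# SIMULATED results for 2000 hypothetical genes (not the course data)
n_genes <- 2000
de <- data.frame(
  gene   = paste0("gene_", seq_len(n_genes)),
  log2fc = rnorm(n_genes, mean = 0, sd = 1.2),
  pvalue = runif(n_genes)
)

# Illustrative cutoffs; choose thresholds appropriate to your own analysis
fc_cut <- 1
p_cut  <- 0.05
de$status <- "Not significant"
de$status[de$pvalue < p_cut & de$log2fc >=  fc_cut] <- "Up"
de$status[de$pvalue < p_cut & de$log2fc <= -fc_cut] <- "Down"

volcano <- ggplot(de, aes(x = log2fc, y = -log10(pvalue), colour = status)) +
  geom_point(alpha = 0.6, size = 1) +
  # dashed guides mark the fold-change and p-value cutoffs
  geom_vline(xintercept = c(-fc_cut, fc_cut), linetype = "dashed") +
  geom_hline(yintercept = -log10(p_cut), linetype = "dashed") +
  scale_colour_manual(values = c("Down" = "blue",
                                 "Not significant" = "grey60",
                                 "Up" = "red")) +
  labs(x = "log2 fold change", y = "-log10(p-value)", colour = NULL) +
  theme_bw()

print(volcano)
```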

Vaccination efficacy

This example is taken from the Simply Statistics blog.

Patient mutations

The following example shows patient mutation data in relation to clinical factors. The provided code (taken from Stack Overflow) generates the data used to make the plot.

Crops over time

A stream plot of the farming data we were using.

Bigfoot Sightings

Plot the locations of Bigfoot sightings on a map.

Single-Cell UMAP

A Uniform Manifold Approximation and Projection (UMAP) plot of single-cell expression data, with a fancy black outline around all points!
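
One way to get an outline like this in ggplot2 is to draw the points twice: a slightly larger black layer underneath, then a smaller coloured layer on top. The sketch below assumes this layering approach, which may differ from the course code. It uses simulated 2-D coordinates in place of a real UMAP embedding, and the column names (`UMAP_1`, `UMAP_2`, `cluster`) are hypothetical.

```r
library(ggplot2)

set.seed(7)

# SIMULATED coordinates standing in for a UMAP embedding (three toy clusters)
umap_toy <- data.frame(
  UMAP_1  = c(rnorm(300, -3, 0.8), rnorm(300, 3, 0.8), rnorm(300, 0, 0.8)),
  UMAP_2  = c(rnorm(300,  0, 0.8), rnorm(300, 1, 0.8), rnorm(300, 4, 0.8)),
  cluster = rep(c("A", "B", "C"), each = 300)
)

outlined <- ggplot(umap_toy, aes(x = UMAP_1, y = UMAP_2)) +
  # bottom layer: larger black points that will show as a rim
  geom_point(colour = "black", size = 2.2) +
  # top layer: smaller coloured points covering everything but the rim
  geom_point(aes(colour = cluster), size = 1.2) +
  labs(colour = "Cluster") +
  theme_classic()

print(outlined)
```

Because all black points are drawn before any coloured ones, the black rim remains visible mainly around the edge of each dense group of points rather than around every individual point.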

Gif plots

This is the same penguin data, shown as a GIF animated over time.

Additional Resources

Effective Visual Communication for the Quantitative Scientist
ggplot cheat sheet
Points of View columns on data visualization
From Data to Viz

Additional packages

install.packages("devtools") 
library(devtools)
install.packages("animation")
install.packages("gganimate-0.1.1.tar.gz", repos = NULL, type="source")
install.packages("gapminder")