Sunday, April 18, 2010

GWAS Manhattan Plots and QQ plots using ggplot2 in R

From: Getting Genetics Done Blog

Will posted earlier this week about how to produce manhattan plots of GWAS results using Stata, so I thought I'd share how I do this in R using ggplot2.

First, if you've never used ggplot2, you'll need to add it to your R installation by typing:


install.packages("ggplot2")

Once you've done that, copy and paste this command to download the functions I wrote necessary to produce these plots.  If you'd like to see the source code yourself, copy the URL into your web browser.

source("http://dl.dropbox.com/u/66281/0_Permanent/qqman.r")

Next, read in the PLINK results file to a data frame. Substitute plink.qassoc with your own results filename.

mydata=read.table("plink.qassoc", header=TRUE)

Finally, run this function which will produce and save in the current directory both a QQ-plot and a manhattan plot of your results:

qqman(mydata)


A few notes:  First, if you're doing this on a linux machine from your Windows computer, you'll need to be running the previously mentioned XMing server on your Windows computer for the plot to save correctly.  Second, the qqman() function calls the manhattan() function, which is extremely slow and memory-intensive. It may take about 3 minutes to run for each dataset. The memory issue isn't a problem on 64-bit machines, but you may run out of memory on 32-bit machines if you're doing this with GWAS data. I'm going to try to improve this in the future. Finally, using that source command you also downloaded a function I wrote called qqmanall(), which does just what it sounds like - if you run it on a linux machine with no arguments it reads in ALL of the plink GWAS results stored in the current directory, and creates QQ and manhattan plots for all of them with a common upper limit for the y-axis corresponding to the most significant result. Enjoy.

...

Update Thursday, January 21, 2010: I neglected to mention yesterday the format of the plink.assoc or plink.qassoc files, in case you want to produce the same plots using results from another software other than plink. When you load your .assoc files in a data frame, the relevant columns are named "CHR", "BP", and "P". You can use this plotting function as long as you have these three columns in your data frame, regardless of whether you use PLINK or not.


2 comments:

  1. HI. I'm a complete noob to R and I was trying your code above. It didn't work. Any help/advice would be appreciated.

    Here is the output from up to date RStudio:


    R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
    Copyright (C) 2013 The R Foundation for Statistical Computing
    Platform: x86_64-apple-darwin10.8.0 (64-bit)
    .
    .
    .
    .
    .


    U9n4/downloaded_packages
    > source("http://dl.dropbox.com/u/66281/0_Permanent/qqman.r")
    > mydata=read.table("~/gwas_p/h_pthlh/wd/clean_standard/pthlh_qc_hide-covar.assoc.logistic", header=TRUE)
    > qqman(mydata)
    Error: could not find function "qqman"
    > gg.qqman(mydata)
    Error in gg.qqman(mydata) : could not find function "ggqq"
    >

    ReplyDelete
  2. Please continue this great work and I look forward to more of your awesome blog posts. domino online

    ReplyDelete