Tuesday, May 11, 2010

List Serve for Ecology R-Users

For those of you in EEB/Forestry/Ag departments, check out this list-serve: https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

Monday, April 26, 2010

How to upgrade R on windows – another strategy (and the R code to do it)

This comes from: http://www.r-statistics.com/2010/04/changing-your-r-upgrading-strategy-and-the-r-code-to-do-it-on-windows/
Update: At the end of the post I added simple step-by-step instructions on how to move to the new system. I STRONGLY suggest using the code only after you have read the entire post.

BACKGROUND

In case you haven’t heard, R 2.11.0 is out with a bunch of new features.
After Andrew Gelman recently lamented the lack of an easy upgrade process for R, a Stack Overflow thread (by JD Long) invited R users to share their strategies for easily upgrading R.

STRATEGY

In that thread, Dirk Eddelbuettel suggested another idea for upgrading R. His idea is to use a folder for R’s packages that sits outside the standard directory tree of the installation (a different strategy than the one offered in the R FAQ).
The idea of this upgrading strategy is to save us steps in upgrading. So when you wish to upgrade R, instead of doing the following three steps:
1) download new R and install
2) copy the “library” content from the old R to the new R
3) upgrade all of the packages (in the library folder) to the new version of R.
You could instead just have steps 1 and 3, and skip step 2.
For example, under windows, you might have R installed on:
C:\Program Files\R\R-2.11.0\
But (in this alternative model for upgrading) you will have your packages library on a “global library folder” (global in the sense of independent of a specific R version):
C:\Program Files\R\library
So in order to use this strategy, you will need to do the following steps:
  1. In the OLD R installation (only the first time you move to the new system of managing upgrades):
    1. Create a new global library folder (if it doesn’t exist)
    2. Copy to the new “global library folder” all of your packages from the old R installation
    3. Once you have moved to this system, steps 1 and 2 do not need to be repeated (hence the advantage).
  2. In the NEW R installation:
    1. Create a new global library folder (if it doesn’t exist – in case this is your first R installation)
    2. Permanently point to the Global library folder whenever R starts
    3. Delete from the “Global library folder” all the packages that already exist in the local library folder of the new R install (no need to have doubles)
    4. Update all packages. (Make sure you pick a mirror where the packages are up to date; you may sometimes need to choose another mirror.)
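Step 2.3 above (dropping duplicates from the global folder) can be pictured in isolation with a small sketch. It is only an illustration under assumptions: the two folders are throwaway directories under tempdir(), the package names are made up, and the variable names (global.lib, local.lib, duplicates, remaining) are hypothetical rather than taken from the functions in this post.

```r
# Two throwaway folders stand in for the global library and the
# per-install library of the new R (hypothetical paths under tempdir()).
global.lib <- file.path(tempdir(), "demo-global-lib")
local.lib  <- file.path(tempdir(), "demo-local-lib")

# Fake "packages" are just empty folders named after packages.
for (pkg in c("base", "stats", "ggplot2")) {
  dir.create(file.path(global.lib, pkg), recursive = TRUE, showWarnings = FALSE)
}
for (pkg in c("base", "stats")) {
  dir.create(file.path(local.lib, pkg), recursive = TRUE, showWarnings = FALSE)
}

# Packages present in both places are the duplicates to drop
# from the global folder.
duplicates <- intersect(list.files(global.lib), list.files(local.lib))
unlink(file.path(global.lib, duplicates), recursive = TRUE)

# What is left in the global folder afterwards.
remaining <- list.files(global.lib)
print(duplicates)
print(remaining)
```

Using intersect() means only folders that really exist in both libraries are targeted, so the count of deleted packages and the list of deleted folders always agree.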
Thanks to help from Dirk, David Winsemius and Uwe Ligges, I was able to write the following R code to perform all the tasks I described :-)
So first you will need to run the following code:

CODE FOR UPGRADING R

Old.R.RunMe <- function (global.library.folder = "C:/Program Files/R/library", quit.R = NULL)
{
# It will:
# 1. Create a new global library folder (if it doesn't exist)
# 2. Copy to the new "global library folder" all of your packages from the old R installation
 
 
 # checking that the global lib folder exists - and if not -> create it.
 if(!file.exists(global.library.folder))
 { # If global lib folder doesn't exist - create it.
  dir.create(global.library.folder)
  print(paste("The path:" , global.library.folder, "didn't exist - and was now created."))
 } else {
  print(paste("The path:" , global.library.folder, "already exists. (no need to create it)"))
 }
 
 
 print("-----------------------")
 print("I am now copying packages from old library folder to:")
 print(global.library.folder)
 print("-----------------------")
 flush.console()  # refresh the console so that the user will see the message
 
 # Copy packages from current lib folder to the global lib folder
 list.of.dirs.in.lib <- paste( paste(R.home(), "\\library\\", sep = ""),
       list.files(paste(R.home(), "\\library\\", sep = "")),
       sep = "")
 folders.copied <- file.copy(from = list.of.dirs.in.lib,  # copy folders
        to = global.library.folder,
        overwrite = TRUE,
        recursive =TRUE)  
 
 print("Success.")
 print(paste("We finished copying all of your packages (" , sum(folders.copied), "packages ) to the new library folder at:"))
 print(global.library.folder)
 print("-----------------------")
 
 # Quit R?
 if(is.null(quit.R))
 {
  print("Can I close R?  y(es)/n(o)  (WARNING: your environment will *NOT* be saved)")
  answer <- readLines(n=1)
 } else {
  answer <- quit.R
 }
 if(tolower(answer)[1] == "y") quit(save = "no")
}
 
 
 
 
 
 
 
New.R.RunMe <- function (global.library.folder = "C:/Program Files/R/library", 
       quit.R = FALSE,
       del.packages.that.exist.in.home.lib = TRUE,
       update.all.packages = TRUE)
{
# It will:
# 1. Create a new global library folder (if it doesn't exist)
# 2. Permanently point to the Global library folder
# 3. Make sure that in the current session - R points to the "Global library folder"
# 4. Delete from the "Global library folder" all the packages that already exist in the local library folder of the new R install
# 5. Update all packages.
 
 
 # checking that the global lib folder exists - and if not -> create it.
 if(!file.exists(global.library.folder))
 { # If global lib folder doesn't exist - create it.
  dir.create(global.library.folder)
  print(paste("The path to the Global library (" , global.library.folder, ") didn't exist - and was now created."))
 } else {
  print(paste("The path to the Global library (" , global.library.folder, ") already exists. (NO need to create it)"))
 }
 flush.console()  # refresh the console so that the user will see the message
 
 
 # Based on:
 # help(Startup)
 # checking if "Renviron.site" exists - and if not -> create it.
 Renviron.site.loc <- paste(R.home(), "\\etc\\Renviron.site", sep = "")
 if(!file.exists(Renviron.site.loc))
 { # If "Renviron.site" doesn't exist (it normally doesn't) - create it and add the global lib line to it.
  cat(paste("R_LIBS=",global.library.folder, "\n", sep = "") ,
    file = Renviron.site.loc)
  print(paste("The file:" , Renviron.site.loc, "didn't exist - we created it and added your 'Global library link' (",global.library.folder,") to it."))
 } else {
  print(paste("The file:" , Renviron.site.loc, "already exists! Make sure you add the following line yourself:"))
  print(paste("R_LIBS=",global.library.folder, sep = "") )
  print(paste("To the file:",Renviron.site.loc))
 }
 
 # Setting the global lib for this session also
 .libPaths(global.library.folder) # This makes sure you don't need to restart R so that the new Global lib settings will take effect in this session also
 # This line could have also been added to:
 # /etc/Rprofile.site
 # and it would do the same thing as adding "Renviron.site" did
 print("Your library paths are: ")
 print(.libPaths()) 
 flush.console()  # refresh the console so that the user will see the message
 
 
 if(del.packages.that.exist.in.home.lib)
 {
  print("We will now delete packages from your Global library folder that already exist in the local-install library folder")
  flush.console()  # refresh the console so that the user will see the message
  package.to.del.from.global.lib <-   paste( paste(global.library.folder, "/", sep = ""),
             list.files(paste(R.home(), "\\library\\", sep = "")),
             sep = "")   
  number.of.packages.we.will.delete <- sum(list.files(paste(global.library.folder, "/", sep = "")) %in% list.files(paste(R.home(), "\\library\\", sep = "")))
  deleted.packages <- unlink(package.to.del.from.global.lib , recursive = TRUE) # delete all the packages from the "original" library folder (no need for double folders)
  print(paste(number.of.packages.we.will.delete,"Packages were deleted."))
 }
 
 if(update.all.packages)
 {
  # Based on:
  # http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What_0027s-the-best-way-to-upgrade_003f
  print("We will now update all your packages")
  flush.console()  # refresh the console so that the user will see the message
 
  update.packages(checkBuilt=TRUE, ask=FALSE)
 }
 
 # Quit R?
 if(quit.R) quit(save = "no")
}
Then you will want to run, on your old R installation, this:
Old.R.RunMe()
And on your new R installation, this:
New.R.RunMe()
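After running both functions, it can be reassuring to confirm that a session really sees the global library. The following is a minimal sketch of such a check, not part of the original functions. It assumes a temporary folder as a stand-in for the real global library (so nothing under Program Files is touched), and the names global.lib and in.lib.paths are hypothetical.

```r
# Hypothetical stand-in for the global library folder; a temporary
# directory is used so the sketch does not touch a real library.
global.lib <- file.path(tempdir(), "global-library")
dir.create(global.lib, showWarnings = FALSE, recursive = TRUE)

# Put the global library first on the search path for this session,
# the same call New.R.RunMe() uses.
.libPaths(global.lib)

# Compare normalized paths so differences in slashes or symlinks
# do not cause a false mismatch.
in.lib.paths <- normalizePath(global.lib, winslash = "/") %in%
  normalizePath(.libPaths(), winslash = "/")
print(.libPaths())
print(in.lib.paths)

# R_LIBS is only populated in a fresh session started after
# Renviron.site was written, so it may be empty here.
print(Sys.getenv("R_LIBS"))
```

With the real setup, replace the temporary folder with the path passed as global.library.folder and run the check in a freshly started R.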

UPDATE – SIMPLE TWO LINE CODE TO RUN WHEN UPGRADING R

(Please do not try the following code before reading this post and understanding what it does)
In order to move your R upgrade to the new (simpler) system, do the following:
1) Download and install the new version of R
2) Open your old R and run –
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
Old.R.RunMe()
(wait until it finishes)
3) Open your new R and run
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()
(wait until it finishes)
Once you do this, whenever you upgrade to a new R, you will only need to do the following TWO (instead of three) steps:
1) Download and install the new version of R
2) Open your new R and run
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()
(wait until it finishes)
And that’s it.
If you have any more suggestions on how to make this code better – please do share.
(Once this code has had some measure of review, I will upload it to a file for easy running through “source(…)” )
(Follow the link for other posts by Tal Galili)

Tuesday, April 20, 2010

GGPLOT

Here is a great site that has tutorials for ggplot in R!

http://had.co.nz/ggplot2/

Monday, April 19, 2010

R/Bioconductor SISG Workshop at UW Seattle

UW Seattle has one of the top biology and statistics programs in the country.  Each year they offer a Summer Institute in Statistical Genetics.  There are scholarships offered for students!  I have attended before and took the R/Bioconductor workshop as well as the MCMC for Geneticists course.  Here is the link with the lectures from the Bioconductor workshop taught by T. Lumley and K. Rice.

http://faculty.washington.edu/kenrice/sisg/

Sunday, April 18, 2010

R Analytic Flow

I just discovered this software found @ http://www.ef-prime.com/products/ranalyticflow_en/


 R AnalyticFlow is a software which enables
state-of-the-art data analysis by drawing analysis flowcharts.
You can effectively share processes of data analysis
in collaborative works.
The software is available without charge for any purpose.

This seems like a great program for anyone who has large data sets in R and works with several different collaborators.

GWAS Manhattan Plots and QQ plots using ggplot2 in R

From: Getting Genetics Done Blog

Will posted earlier this week about how to produce manhattan plots of GWAS results using Stata, so I thought I'd share how I do this in R using ggplot2.

First, if you've never used ggplot2, you'll need to add it to your R installation by typing:


install.packages("ggplot2")

Once you've done that, copy and paste this command to download the functions I wrote to produce these plots.  If you'd like to see the source code yourself, copy the URL into your web browser.

source("http://dl.dropbox.com/u/66281/0_Permanent/qqman.r")

Next, read in the PLINK results file to a data frame. Substitute plink.qassoc with your own results filename.

mydata=read.table("plink.qassoc", header=TRUE)

Finally, run this function which will produce and save in the current directory both a QQ-plot and a manhattan plot of your results:

qqman(mydata)


A few notes:  First, if you're doing this on a linux machine from your Windows computer, you'll need to be running the previously mentioned XMing server on your Windows computer for the plot to save correctly.  Second, the qqman() function calls the manhattan() function, which is extremely slow and memory-intensive. It may take about 3 minutes to run for each dataset. The memory issue isn't a problem on 64-bit machines, but you may run out of memory on 32-bit machines if you're doing this with GWAS data. I'm going to try to improve this in the future. Finally, using that source command you also downloaded a function I wrote called qqmanall(), which does just what it sounds like - if you run it on a linux machine with no arguments it reads in ALL of the plink GWAS results stored in the current directory, and creates QQ and manhattan plots for all of them with a common upper limit for the y-axis corresponding to the most significant result. Enjoy.
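The "common upper limit for the y-axis" idea can be pictured with a few lines of base R. This is not the qqman source, only a sketch of the arithmetic: the two small data frames are made-up stand-ins for separate result files, the names results.list and y.max are hypothetical, and rounding the limit up to a whole number is an assumption.

```r
# Two made-up result sets stand in for separate PLINK output files.
results.list <- list(
  trait1 = data.frame(P = c(0.2, 0.003, 0.5)),
  trait2 = data.frame(P = c(0.04, 2e-6, 0.7))
)

# Most significant result per set on the -log10 scale.
best.per.set <- sapply(results.list, function(d) max(-log10(d$P)))

# Shared y-axis upper limit: the largest value over all sets,
# rounded up (the rounding is an assumption of this sketch).
y.max <- ceiling(max(best.per.set))
print(best.per.set)
print(y.max)
```

Passing the same limit to every plot keeps the panels visually comparable across datasets.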

...

Update Thursday, January 21, 2010: I neglected to mention yesterday the format of the plink.assoc or plink.qassoc files, in case you want to produce the same plots using results from software other than PLINK. When you load your .assoc files into a data frame, the relevant columns are named "CHR", "BP", and "P". You can use this plotting function as long as you have these three columns in your data frame, regardless of whether you use PLINK or not.
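To make the column requirement concrete, here is a small sketch that builds a data frame with those three columns and computes the coordinates of a QQ-plot in base R. It is an illustration only, not the qqman() implementation: the numbers are invented, and the names results, observed and expected are hypothetical.

```r
# Made-up results standing in for output from any association software.
results <- data.frame(
  CHR = c(1, 1, 2, 2, 3),
  BP  = c(1000, 2000, 1500, 2500, 3000),
  P   = c(0.5, 0.01, 0.2, 0.0001, 0.9)
)

# The three columns named above must be present.
stopifnot(all(c("CHR", "BP", "P") %in% names(results)))

# QQ-plot coordinates: observed -log10(p), smallest p-value first,
# against the values expected if the p-values were uniform on (0, 1).
observed <- -log10(sort(results$P))
expected <- -log10(ppoints(length(observed)))

plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1)  # points along this line match the null expectation
```

A data frame from other software usually only needs its columns renamed, for example with names(results), before it can be passed on.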


Friday, April 16, 2010

Article Attacks R.... R-users fight back instantly!

Here is the link for the article: http://www.thejuliagroup.com/blog/?p=433

If you don't feel like reading it... here is just a quote that sums it up: “However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.”


There are so many things wrong with this statement that I immediately had to comment on her blog.  However, I was not the only one.  This article flew through the R-bloggers sphere (http://www.r-bloggers.com) within hours.  An R-blogger today posted an excellent piece about this event and the R community.  It's worth a read!
http://www.r-statistics.com/2010/04/an-article-attacking-r-gets-responses-from-the-r-blogosphere-some-reflections/