Monday, April 26, 2010

How to upgrade R on Windows – another strategy (and the R code to do it)

This comes from: http://www.r-statistics.com/2010/04/changing-your-r-upgrading-strategy-and-the-r-code-to-do-it-on-windows/
Update: At the end of the post I added simple step-by-step instructions on how to move to the new system. I STRONGLY suggest using the code only after you read the entire post.

BACKGROUND

If you haven’t heard by now – R 2.11.0 is out with a bunch of new features.
After Andrew Gelman recently lamented the lack of an easy upgrade process for R, a Stackoverflow thread (by JD Long) invited R users to share their strategies for easily upgrading R.

STRATEGY

In that thread, Dirk Eddelbuettel suggested another idea for upgrading R: using a folder for R’s packages that sits outside the standard directory tree of the installation (a different strategy than the one offered in the R FAQ).
The point of this strategy is to save steps when upgrading. So when you wish to upgrade R, instead of doing the following three steps:
1) download new R and install
2) copy the “library” content from the old R to the new R
3) upgrade all of the packages (in the library folder) to the new version of R.
You would instead do only steps 1 and 3, and skip step 2.
For example, under Windows, you might have R installed at:
C:\Program Files\R\R-2.11.0\
But (in this alternative model for upgrading) you would keep your package library in a “global library folder” (global in the sense of being independent of a specific R version):
C:\Program Files\R\library
So in order to use this strategy, you will need to do the following steps:
  1. In the OLD R installation (the first time you move to this new system of managing upgrades):
    1. Create a new global library folder (if it doesn’t exist)
    2. Copy to the new “global library folder” all of your packages from the old R installation
    3. Once you have moved to this system, steps 1 and 2 will not need to be repeated (hence the advantage).
  2. In the NEW R installation:
    1. Create a new global library folder (if it doesn’t exist – in case this is your first R installation)
    2. Permanently point R to the global library folder whenever it starts
    3. Delete from the “global library folder” all the packages that already exist in the local library folder of the new R installation (no need for duplicates)
    4. Update all packages. (Make sure you pick a mirror where the packages are up-to-date; you sometimes need to choose another mirror.)
Thanks to help from Dirk, David Winsemius and Uwe Ligges, I was able to write the following R code to perform all the tasks I described :-)
So first you will need to run the following code:

CODE FOR UPGRADING R

Old.R.RunMe <- function (global.library.folder = "C:/Program Files/R/library", quit.R = NULL)
{
# It will:
# 1. Create a new global library folder (if it doesn't exist)
# 2. Copy to the new "global library folder" all of your packages from the old R installation
 
 
 # checking that the global lib folder exists - and if not -> create it.
 if(!file.exists(global.library.folder))
 { # If global lib folder doesn't exist - create it.
  dir.create(global.library.folder)
  print(paste("The path:" , global.library.folder, "Didn't exist - and was now created."))
 } else {
  print(paste("The path:" , global.library.folder, "already exist. (no need to create it)"))
 }
 
 
 print("-----------------------")
 print("I am now copying packages from old library folder to:")
 print(global.library.folder)
 print("-----------------------")
 flush.console()  # refresh the console so that the user will see the message
 
 # Copy packages from current lib folder to the global lib folder
 list.of.dirs.in.lib <- paste( paste(R.home(), "\\library\\", sep = ""),
       list.files(paste(R.home(), "\\library\\", sep = "")),
       sep = "")
 folders.copied <- file.copy(from = list.of.dirs.in.lib,  # copy folders
        to = global.library.folder,
        overwrite = TRUE,
        recursive =TRUE)  
 
 print("Success.")
 print(paste("We finished copying all of your packages (" , sum(folders.copied), "packages ) to the new library folder at:"))
 print(global.library.folder)
 print("-----------------------")
 
 # Quit R?
 if(is.null(quit.R))
 {
  print("Can I close R?  y(es)/n(o)  (WARNING: your enviornment will *NOT* be saved)")
  answer <- readLines(n=1)
 } else {
  answer <- quit.R
 }
 if(tolower(answer)[1] == "y") quit(save = "no")
}
 
 
 
 
 
 
 
New.R.RunMe <- function (global.library.folder = "C:/Program Files/R/library", 
       quit.R = F,
       del.packages.that.exist.in.home.lib = T,
       update.all.packages = T)
{
# It will:
# 1. Create a new global library folder (if it doesn't exist)
# 2. Permanently point to the global library folder
# 3. Make sure that in the current session - R points to the "Global library folder"
# 4. Delete from the "Global library folder" all the packages that already exist in the local library folder of the new R install
# 5. Update all packages.
 
 
 # checking that the global lib folder exists - and if not -> create it.
 if(!file.exists(global.library.folder))
 { # If global lib folder doesn't exist - create it.
  dir.create(global.library.folder)
  print(paste("The path to the Global library (" , global.library.folder, ") Didn't exist - and was now created."))
 } else {
  print(paste("The path to the Global library (" , global.library.folder, ") already exist. (NO need to create it)"))
 }
 flush.console()  # refresh the console so that the user will see the message
 
 
 # Based on:
 # help(Startup)
 # checking if "Renviron.site" exists - and if not -> create it.
 Renviron.site.loc <- paste(R.home(), "\\etc\\Renviron.site", sep = "")
 if(!file.exists(Renviron.site.loc))
 { # If "Renviron.site" doesn't exist (which it shouldn't be) - create it and add the global lib line to it.
  cat(paste("R_LIBS=",global.library.folder, sep = "") ,
    file = Renviron.site.loc)
  print(paste("The file:" , Renviron.site.loc, "Didn't exist - we created it and added your 'Global library link' (",global.library.folder,") to it."))
 } else {
  print(paste("The file:" , Renviron.site.loc, "existed!  make sure you add the following line by yourself:"))
  print(paste("R_LIBS=",global.library.folder, sep = "") )
  print(paste("To the file:",Renviron.site.loc))
 }
 
 # Setting the global lib for this session also
 .libPaths(global.library.folder) # This makes sure you don't need to restart R so that the new Global lib settings will take effect in this session also
 # This line could have also been added to:
 # /etc/Rprofile.site
 # and it would do the same thing as adding "Renviron.site" did
 print("Your library paths are: ")
 print(.libPaths()) 
 flush.console()  # refresh the console so that the user will see the message
 
 
 if(del.packages.that.exist.in.home.lib)
 {
  print("We will now delete package from your Global library folder that already exist in the local-install library folder")
  flush.console()  # refresh the console so that the user will see the massage
  package.to.del.from.global.lib <-   paste( paste(global.library.folder, "/", sep = ""),
             list.files(paste(R.home(), "\\library\\", sep = "")),
             sep = "")   
  number.of.packages.we.will.delete <- sum(list.files(paste(global.library.folder, "/", sep = "")) %in% list.files(paste(R.home(), "\\library\\", sep = "")))
  deleted.packages <- unlink(package.to.del.from.global.lib , recursive = TRUE) # delete the duplicated packages from the global library folder (no need for double folders)
  print(paste(number.of.packages.we.will.delete,"packages were deleted."))
 }
 
 if(update.all.packages)
 {
  # Based on:
  # http://cran.r-project.org/bin/windows/base/rw-FAQ.html#What_0027s-the-best-way-to-upgrade_003f
  print("We will now update all your packges")
  flush.console()  # refresh the console so that the user will see the massage
 
  update.packages(checkBuilt=TRUE, ask=FALSE)
 }
 
 # Quit R?
 if(quit.R) quit(save = "no")
}
Then you will want to run, on your old R installation, this:
Old.R.RunMe()
And on your new R installation, this:
New.R.RunMe()

UPDATE – SIMPLE TWO LINE CODE TO RUN WHEN UPGRADING R

(Please do not try the following code before reading this post and understanding what it does)
In order to move your R upgrade to the new (simpler) system, do the following:
1) Download and install the new version of R
2) Open your old R and run:
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
Old.R.RunMe()
(wait until it finishes)
3) Open your new R and run
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()
(wait until it finishes)
Once you have done this, then from now on, whenever you upgrade to a new version of R, you will only need to do the following TWO (instead of three) steps:
1) Download and install the new version of R
2) Open your new R and run
source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()
(wait until it finishes)
And that’s it.
If you have any more suggestions on how to make this code better – please do share.
(After this code has received some review, I will upload it to a file for easy running through “source(…)”.)
(Follow the link for other posts by Tal Galili)

Tuesday, April 20, 2010

GGPLOT

Here is a great site that has tutorials for ggplot2 in R!

http://had.co.nz/ggplot2/
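
For a quick first taste of the syntax, here is a minimal sketch of my own (not from the tutorial site) using the built-in mtcars data, assuming ggplot2 has already been installed with install.packages("ggplot2"):

# A scatterplot of fuel economy against car weight, colored by cylinder count,
# using the qplot() convenience function from ggplot2.
library(ggplot2)
qplot(wt, mpg, data = mtcars, colour = factor(cyl),
      xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")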

Monday, April 19, 2010

R/Bioconductor SISG Workshop at UW Seattle

UW Seattle has one of the top biology and statistics programs in the country.  Each year they offer a Summer Institute in Statistical Genetics.  There are scholarships available for students!  I attended before and took the R/Bioconductor workshop as well as the MCMC for Geneticists course.  Here is the link to the lectures from the Bioconductor workshop taught by T. Lumley and K. Rice.

http://faculty.washington.edu/kenrice/sisg/

Sunday, April 18, 2010

R Analytic Flow

I just discovered this software found @ http://www.ef-prime.com/products/ranalyticflow_en/


"R AnalyticFlow is a software which enables state-of-the-art data analysis by drawing analysis flowcharts. You can effectively share processes of data analysis in collaborative works. The software is available without charge for any purpose."

This seems like a great program for anyone who has large data sets in R and works with several different collaborators.

GWAS Manhattan Plots and QQ plots using ggplot2 in R

From: Getting Genetics Done Blog

Will posted earlier this week about how to produce manhattan plots of GWAS results using Stata, so I thought I'd share how I do this in R using ggplot2.

First, if you've never used ggplot2, you'll need to add it to your R installation by typing:


install.packages("ggplot2")

Once you've done that, copy and paste this command to download the functions I wrote that are needed to produce these plots.  If you'd like to see the source code yourself, copy the URL into your web browser.

source("http://dl.dropbox.com/u/66281/0_Permanent/qqman.r")

Next, read in the PLINK results file to a data frame. Substitute plink.qassoc with your own results filename.

mydata=read.table("plink.qassoc", header=TRUE)

Finally, run this function which will produce and save in the current directory both a QQ-plot and a manhattan plot of your results:

qqman(mydata)


A few notes:  First, if you're doing this on a linux machine from your Windows computer, you'll need to be running the previously mentioned XMing server on your Windows computer for the plot to save correctly.  Second, the qqman() function calls the manhattan() function, which is extremely slow and memory-intensive. It may take about 3 minutes to run for each dataset. The memory issue isn't a problem on 64-bit machines, but you may run out of memory on 32-bit machines if you're doing this with GWAS data. I'm going to try to improve this in the future. Finally, using that source command you also downloaded a function I wrote called qqmanall(), which does just what it sounds like - if you run it on a linux machine with no arguments it reads in ALL of the plink GWAS results stored in the current directory, and creates QQ and manhattan plots for all of them with a common upper limit for the y-axis corresponding to the most significant result. Enjoy.

...

Update Thursday, January 21, 2010: I neglected to mention yesterday the format of the plink.assoc or plink.qassoc files, in case you want to produce the same plots using results from software other than PLINK. When you load your .assoc files into a data frame, the relevant columns are named "CHR", "BP", and "P". You can use this plotting function as long as you have these three columns in your data frame, regardless of whether you use PLINK or not.
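
If it helps, here is a minimal sketch of a toy data frame with exactly those three columns (the values below are made up, purely to illustrate the structure qqman() expects):

# Hypothetical toy results with the three required columns:
# CHR (chromosome), BP (base-pair position) and P (p-value).
mydata <- data.frame(CHR = rep(1:2, each = 3),
                     BP  = c(100, 200, 300, 150, 250, 350),
                     P   = runif(6))
# qqman(mydata)  # would then draw the QQ and manhattan plots as above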


Friday, April 16, 2010

Article Attacks R.... R-users fight back instantly!

Here is the link for the article: http://www.thejuliagroup.com/blog/?p=433

If you don't feel like reading it... here is just a quote that sums it up: “However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail.”


There are so many things wrong with this statement that I immediately had to comment on her blog.  However, I was not the only one.  This article flew through the R-bloggers sphere (http://www.r-bloggers.com) within hours.  An R-blogger today posted an excellent blog post about this event and the R community.  It's worth a read!
http://www.r-statistics.com/2010/04/an-article-attacking-r-gets-responses-from-the-r-blogosphere-some-reflections/

Thursday, April 15, 2010

For the newbies.... how to read in data... simple
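
As a starting point, here are a couple of minimal sketches for getting a delimited text file into a data frame (the file names are just placeholders for your own files):

# Read a comma-separated file whose first row holds the column names:
mydata <- read.csv("mydata.csv", header = TRUE)

# Read a tab-delimited file:
mydata <- read.table("mydata.txt", header = TRUE, sep = "\t")

# Take a quick look at what came in:
head(mydata)
str(mydata)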

R-Graph Gallery

Here are some really great examples of graphics in R.  They even publish the code they used to create these graphs.
http://addictedtor.free.fr/graphiques/

Random Forests

Are you interested in decision trees or classification trees?  Well then you should check out this package in R http://www.stat.berkeley.edu/~breiman/RandomForests/

There are so many applications of this package, and you can also download and compile it in C!  This site has great documentation, and there is also some information on the Quick-R site.
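
As a quick taste, here is a minimal sketch using the randomForest package on the built-in iris data (assuming the package has been installed with install.packages("randomForest")):

library(randomForest)

# Fit a classification forest predicting species from the four flower measurements.
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

print(rf)        # out-of-bag error rate and confusion matrix
importance(rf)   # variable importance measures
varImpPlot(rf)   # dotchart of variable importance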

Tuesday, April 13, 2010

Quick R

Quick R is a great place to learn about R functions and graphics!  I use it all the time and recommend it for users of all levels!

Monday, April 12, 2010

BioConductor

http://www.bioconductor.org/

Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.



The broad goals of the Bioconductor project are:
  • To provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
  • To facilitate the inclusion of biological metadata in the analysis of genomic data, e.g. literature data from PubMed, annotation data from LocusLink.
  • To provide a common software platform that enables the rapid development and deployment of extensible, scalable, and interoperable software.
  • To further scientific understanding by producing high-quality documentation and reproducible research.
  • To train researchers on computational and statistical methods for the analysis of genomic data.

Purdue R-users Group

This will become the blogosphere for the Purdue University R-users Group. This group will allow for networking between R-users from all disciplines across campus.

From the SF area R-users group: "R is an open source programming language for statistical computing, data analysis, and graphical visualization. R has an estimated one million users worldwide, and its user base is growing. While most commonly used within academia, in fields such as computational biology and applied statistics, it is gaining currency in commercial areas such as quantitative finance and business intelligence.

Among R's strengths as a language are its powerful built-in tools for inferential statistics, its compact modeling syntax, its data visualization capabilities, and its ease of connectivity with persistent data stores (from databases to flatfiles).

In addition, R's open source nature and its extensibility via add-on "packages" has allowed it to keep up with the leading edge in academic research.

For all its strengths, though, R has an admittedly steep learning curve; the first steps towards learning and using R can be challenging.

To this end, the Bay Area R Users Group is dedicated to bringing together area practitioners of R to exchange knowledge, inspire new users, and spur the adoption of R for innovative research and commercial applications."