165
votes

I have an R script that is shared with several users on different computers. One of its lines contains the install.packages("xtable") command.

The problem is that every time someone runs the script, R spends a great deal of time apparently reinstalling the package (it actually does take some time, since the real case has a vector of several packages).

How can I first check whether the packages are installed and then run install.packages() only for the ones that are not?

16
Using require (e.g. @SachaEpskamp's solution) is the way to go. Approaches that use rownames(installed.packages()) are slow and are not always reliable (see Details of ?find.package and Note of ?installed.packages). require ensures that packages are not only installed but are able to be used (i.e. dependencies are met, etc.). - jbaums
packageVersion("packageName") - Kalanos
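The packageVersion hint in the comment above can be turned into a check that also covers outdated installs. This is only a minimal sketch: needs_install and its min_version parameter are hypothetical names, not something proposed in the thread, and the existence check borrows system.file from one of the answers.

```r
# Hypothetical helper: TRUE when `pkg` is missing or older than `min_version`.
needs_install <- function(pkg, min_version = NULL) {
  # system.file() returns "" when the package is not found in the library paths.
  if (!nzchar(system.file(package = pkg))) {
    return(TRUE)
  }
  # No minimum requested: being installed is enough.
  if (is.null(min_version)) {
    return(FALSE)
  }
  # packageVersion() returns a "package_version" object that supports comparison.
  utils::packageVersion(pkg) < package_version(min_version)
}
```

A call such as needs_install("xtable", "1.8.0") could then guard an install.packages("xtable") line; the version number here is purely illustrative.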

16 Answers

166
votes

try: require("xtable") or "xtable" %in% rownames(installed.packages())
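For a vector of packages, the same idea might look like the sketch below. Note the swap: it uses requireNamespace rather than require, so the check does not attach anything to the search path. The name install_missing is hypothetical.

```r
# Hypothetical helper: install whichever of `pkgs` cannot be loaded.
install_missing <- function(pkgs) {
  # requireNamespace() returns FALSE (rather than raising an error) when a
  # package is unavailable, and it does not attach the package.
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  # Only contact the repository when something is actually missing.
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Return the names that had to be installed, invisibly.
  invisible(missing_pkgs)
}
```

Calling install_missing(c("xtable", "ggplot2")) at the top of the shared script should then be close to a no-op on machines that already have both packages.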

66
votes

If you want to do it as simply as possible:

packages <- c("ggplot2", "dplyr", "Hmisc", "lme4", "arm", "lattice", "lavaan")

install.packages(setdiff(packages, rownames(installed.packages())))  

Replace the packages listed on the first line by those needed to run your code, and voilà!

Note: Edited to remove conditional wrapper thanks to Artem's comment below.

54
votes

This is a function I often use to check for a package, install it if it is missing, and then load it again:

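pkgTest <- function(x) {
  # Try to load the package; install it only if loading fails
  if (!require(x, character.only = TRUE)) {
    install.packages(x, dependencies = TRUE)
    # Try again after installing; give up if it still cannot be loaded
    if (!require(x, character.only = TRUE)) stop("Package not found")
  }
}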

Works like pkgTest("xtable"). It only works if the mirror is set though, but you could pass it via the repos argument of the install.packages call.
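To make the mirror explicit, a variant along these lines could be used. It is a sketch, not the answerer's code: the name pkg_test and the default repository URL are assumptions chosen for illustration.

```r
# Variant of pkgTest with an explicit repository.
# The `repos` default is an assumption, not part of the original answer.
pkg_test <- function(x, repos = "https://cloud.r-project.org") {
  # require() returns FALSE instead of stopping when the package is missing.
  if (!require(x, character.only = TRUE)) {
    install.packages(x, dependencies = TRUE, repos = repos)
    # Second attempt: fail loudly if the install did not make it loadable.
    if (!require(x, character.only = TRUE)) {
      stop("Package not found: ", x)
    }
  }
  invisible(TRUE)
}
```

Because repos is passed on every call, the function does not depend on a mirror having been chosen in the user's session.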

30
votes

I suggest a more lightweight solution using system.file.

is_inst <- function(pkg) {
    nzchar(system.file(package = pkg))
}

is_inst2 <- function(pkg) {
    pkg %in% rownames(installed.packages())
}

library(microbenchmark)
microbenchmark(is_inst("aaa"), is_inst2("aaa"))
## Unit: microseconds
##            expr      min        lq       mean    median       uq       max neval
##  is_inst("aaa")   22.284   24.6335   42.84806   34.6815   47.566   252.568   100
## is_inst2("aaa") 1099.334 1220.5510 1778.57019 1401.5095 1829.973 17653.148   100
microbenchmark(is_inst("ggplot2"), is_inst2("ggplot2"))
## Unit: microseconds
##                expr      min       lq     mean   median       uq      max neval
##  is_inst("ggplot2")  336.845  386.660  459.243  431.710  483.474  867.637   100
## is_inst2("ggplot2") 1144.613 1276.847 1507.355 1410.054 1656.557 2747.508   100
17
votes

There's also the CRAN package pacman which has the p_load function to install one or more packages (but only if necessary) and then load them.
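A minimal sketch of how that might be wired into a shared script, assuming pacman may not yet be installed on every machine. The wrapper name load_packages is hypothetical.

```r
# Hypothetical wrapper around pacman::p_load().
load_packages <- function(pkgs) {
  # pacman itself has to be installed once before it can manage anything else.
  if (!requireNamespace("pacman", quietly = TRUE)) {
    install.packages("pacman")
  }
  # character.only = TRUE tells p_load that `pkgs` holds package names as strings.
  pacman::p_load(pkgs, character.only = TRUE)
}
```

For interactive use, pacman::p_load(xtable, ggplot2) with unquoted names does the same thing without the wrapper.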

15
votes
requiredPackages = c('plyr','ggplot2','ggtern')
for(p in requiredPackages){
  if(!require(p,character.only = TRUE)) install.packages(p)
  library(p,character.only = TRUE)
}
12
votes
# Function to check whether package is installed
is.installed <- function(mypkg) {
  is.element(mypkg, installed.packages()[, 1])
}

# check if package "hydroGOF" is installed
if (!is.installed("hydroGOF")) {
  install.packages("hydroGOF")
}
8
votes

I found a packages script somewhere that I always put in every script to load my libraries. It will do all your library handling (downloading, installing and loading), and only when needed.

# Install function for packages    
packages<-function(x){
  x<-as.character(match.call()[[2]])
  if (!require(x,character.only=TRUE)){
    install.packages(pkgs=x,repos="http://cran.r-project.org")
    require(x,character.only=TRUE)
  }
}
packages(ggplot2)
packages(reshape2)
packages(plyr)
# etc etc
6
votes

I have implemented a function to install and load the required R packages silently. Hope it helps. Here is the code:

# Function to Install and Load R Packages
Install_And_Load <- function(Required_Packages)
{
    Remaining_Packages <- Required_Packages[!(Required_Packages %in% installed.packages()[,"Package"])];

    if(length(Remaining_Packages)) 
    {
        install.packages(Remaining_Packages);
    }
    for(package_name in Required_Packages)
    {
        library(package_name,character.only=TRUE,quietly=TRUE);
    }
}

# Specify the list of required packages to be installed and loaded
Required_Packages=c("ggplot2", "Rcpp");

# Call the Function
Install_And_Load(Required_Packages);
4
votes

The solution I used derived from Sacha Epskamp and Shuguang's input. Here's the function:

instalaPacotes <- function(pacote) {
  if (!pacote %in% rownames(installed.packages())) install.packages(pacote)
}

It works silently, echoing nothing if package "pacote" is already installed and installing it otherwise. Don't forget to write the name of the package between quotes!

4
votes

Or a massively overticked example from drknexus/repsych on github, glibrary. There are almost certainly more efficient and better ways to do this, but I programmed it a long while back and it basically works.

  • It works even if a repo hasn't been selected by taking the default cloud option if available. If you are on an older version of R it will roll back and pick a mirror based on country code.
  • It tries to load the library (this step could be made more efficient using some of the methods above)
    • If it fails, it will try to install it
    • If the install fails it will notify you which packages failed to install
  • That is right, packages, multiple packages can be loaded/installed in a single pass along with their dependencies (at least usually, there may be a bug here).

e.g.: glibrary(xtable,sos,data.table) but I don't think it will freak out if you call glibrary("xtable","sos","data.table") instead. Pushes/pulls/forks welcome.

Code for the function:

#' Try to load a library, if that fails, install it, then load it.
#'
#' glibrary short for (get)library.
#' The primary aim of this function is to make loading packages more transparent.  Given that we know we want to load a given package, actually fetching it is a formality.  glibrary skims past this formality to install the requested package.
#'
#' @export
#' @param ... comma separated package names
#' @param lib.loc See \code{\link{require}}
#' @param quietly See \code{\link{require}}
#' @param warn.conflicts See \code{\link{require}}
#' @param pickmirror If TRUE, glibrary allows the user to select the mirror, otherwise it auto-selects on the basis of the country code
#' @param countrycode This option is ignored and the first mirror with the substring "Cloud", e.g. the RStudio cloud, is selected.  If no mirrors with that substring are identified, glibrary compares this value to results from getCRANmirrors() to select a mirror in the specified country.
#' @return logical; TRUE if glibrary was a success, an error if a package failed to load
#' @note keep.source was an argument to require that was deprecated in R 2.15
#' @note This warning \code{Warning in install.packages: InternetOpenUrl failed: 'The operation timed out'} indicates that the randomly selected repository is not available.  Check your internet connection.  If your internet connection is fine, set pickmirror=TRUE and manually select an operational mirror.
#' @examples
#' #glibrary(lattice,MASS) #not run to prevent needless dependency
glibrary <- function(..., lib.loc = NULL, quietly = FALSE, warn.conflicts = TRUE, pickmirror = FALSE, countrycode = "us") {
  warningHandle <- function(w) {
    if (grepl("there is no package called",w$message,fixed=TRUE)) {
      return(FALSE) #not-loadable
    } else {
      return(TRUE) #loadable
    }
  }

  character.only <- TRUE  #this value is locked to TRUE so that the function passes the character value to require and not the variable name thislib
  librarynames <- unlist(lapply(as.list(substitute(.(...)))[-1],as.character))
  #if package already loaded, remove it from librarynames before processing further
  si.res <- sessionInfo()
  cur.loaded <- c(si.res$basePkgs,names(si.res$otherPkgs)) #removed names(si.res$loadedOnly) because those are loaded, but not attached, so glibrary does need to handle them.
  librarynames <- librarynames[!librarynames %in% cur.loaded] #the original used %!in%, an operator defined elsewhere in repsych
  success <- vector("logical", length(librarynames))
  if (length(success)==0) {return(invisible(TRUE))} #everything already loaded, end.

  alreadyInstalled <- installed.packages()[,"Package"]
  needToInstall <- !librarynames %in% alreadyInstalled

  if (any(needToInstall)) {
    if (pickmirror) {chooseCRANmirror()}
    if (getOption("repos")[["CRAN"]] == "@CRAN@") {
      #Select the first "Cloud" if available
      m <- getCRANmirrors(all = FALSE, local.only = FALSE)
      URL <- m[grepl("Cloud",m$Name),"URL"][1] #get the first repos with "cloud" in the name
      if (is.na(URL)) { #if we did not find the cloud,
        #Fall back and use the previous method
        message("\nIn repsych:glibrary:  Now randomly selecting a CRAN mirror. You may reselect your CRAN mirror with chooseCRANmirror().\n")
        #if there is no repository set pick a random one by country code
        getCRANmirrors.res <- getCRANmirrors()
        foundone <- FALSE  #have we found a CRAN mirror yet?
        #is it a valid country code?
        if (!countrycode %in% getCRANmirrors.res$CountryCode) {
          stop("In repsych::glibrary:  Invalid countrycode argument")
        }
        ticker <- 0
        while (!foundone) {
          ticker <- ticker + 1
          URL <- getCRANmirrors.res$URL[sample(grep(countrycode, getCRANmirrors.res$CountryCode), 1)]
          host.list <- strsplit(URL, "/")
          host.clean <- unlist(lapply(host.list, FUN = function(x) {return(x[3])}))
          #make sure we can actually access the package list
          if (nrow(available.packages(contrib.url(URL)))!=0) {foundone <- TRUE}        
          if (ticker > 5) {stop("In repsych::glibrary:  Unable to access valid repository.  Is the internet connection working?")}
        } #end while
      } #end else
      repos <- getOption("repos")
      repos["CRAN"] <- gsub("/$", "", URL[1L])
      options(repos = repos)
    } #done setting CRAN mirror
    #installing packages
    installResults <- sapply(librarynames[needToInstall],install.packages)
    #checking for successful install
    needToInstall <- !librarynames %in% installed.packages()[,"Package"]
    if (any(needToInstall)) {
      stop(paste("In repsych::glibrary: Could not download and/or install: ",paste(librarynames[needToInstall],collapse=", "),"... glibrary stopped.",sep=""))
    } # done reporting any failure to install
  } #done if any needed to install

  #message("In repsych::glibrary:  Attempting to load requested packages...\n")
  #success <- tryCatch(
  success <- sapply(librarynames,require, lib.loc = lib.loc, quietly = FALSE, warn.conflicts = warn.conflicts, character.only = TRUE)
  #, warning=warningHandle) #end tryCatch
  if(length(success) != length(librarynames)) {stop("A package failed to return a success in glibrary.")}


  if (all(success)) {
    #message("In repsych::glibrary:  Success!")
    return(invisible(TRUE))
  } else {
    stop(paste("\nIn repsych::glibrary, unable to load: ", paste(librarynames[!success]), 
               collapse = " "))
  }
  stop("A problem occurred in glibrary") #shouldn't get this far down, all returns should be made.
}
NULL
2
votes

How about trying this?

# will install the pROC library if you don't have it
if (!is.element('pROC', installed.packages()[,1])) {
  install.packages('pROC')
} else {
  print("pROC library already installed")
}
0
votes

Why not just delete the line from the script? If the end-user doesn't have the smarts to install xtable as needed, you've got bigger problems :-( . That said, check out installed.packages()

Edit: dang, Ninja'd by ONE minute!

Edit: a general suggestion: load the package sos , and you'll find it very easy to get answers to a lot of "is there a function that does XXXXX" questions.

0
votes

This should do it. You can make required.packages a vector if you need to check for more than one.

required.packages <- "data.table"
new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
0
votes

Reading everyone's responses, I took some hints here and there and created mine. Actually very similar to most though.

## These codes are used for installing packages
# function for installing needed packages
installpkg <- function(x){
    if(x %in% rownames(installed.packages())==FALSE) {
        if(x %in% rownames(available.packages())==FALSE) {
            paste(x,"is not a valid package - please check again...")
        } else {
            install.packages(x)           
        }

    } else {
        paste(x,"package already installed...")
    }
}

# install necessary packages
required_packages  <- c("sqldf","car")
lapply(required_packages,installpkg)
0
votes

I looked at my old function, updated it using the tips above, and this is what I got.

# VERSION 1.0
assign("installP", function(pckgs){
    ins <- function(pckg, mc){
        add <- paste(c(" ", rep("-", mc+1-nchar(pckg)), " "), collapse = "");
        if( !require(pckg,character.only=TRUE) ){
            reps <- c("http://lib.stat.cmu.edu/R/CRAN","http://cran.uk.R-project.org");
            for (r in reps) try(utils::install.packages(pckg, repos=r), silent=TRUE);
            if(!require(pckg,character.only = TRUE)){   cat("Package: ",pckg,add,"not found.\n",sep="");
            }else{                                      cat("Package: ",pckg,add,"installed.\n",sep="");}
        }else{                                          cat("Package: ",pckg,add,"is loaded.\n",sep=""); } }
    invisible(suppressMessages(suppressWarnings(lapply(pckgs,ins, mc=max(nchar(pckgs)))))); cat("\n"); 
}, envir=as.environment("dg_base"))

installP(c("base","a","TFX"))
Package: base ------------------- is loaded.
Package: a ---------------------- not found.
Package: TFX -------------------- installed.