1
votes

I am attempting to create a correlation matrix of 16 different vectors and almost everything is as I want it, the only difference is that I would rather have best fit lines in the scatterplots rather than smoothed curves. I have seen some other posts that mention changing the pch using the pairs() portion of the function call to chart.Correlation, is there something similar to ask for a best-fit line rather than the smoothed curve?

I ask because I feel that the smoothed curves might be giving a false sense of high correlation within certain parts of the scatterplots, and I know the correlation is right there on the upper half but I would still like to have the option to change the line in the scatterplot from smoothed to best fit.

My code is pretty simple:

chart.Correlation(all.cell.types.rna.seq.table[,2:16], histogram=FALSE)

all.cell.types.rna.seq.table is a dataframe with 16 columns, the first of which is an id number.

Correlation matrix, smoothed lines rather than best fit lines :

Correlation matrix, smoothed lines rather than best fit lines

All I would like are best fit lines rather than smoothed curves in the scatterplots on the lower triangle of the correlation matrix image.

1

1 Answers

0
votes

I was looking for exactly the same ... And it's possible to use just the base function pairs() as shown here. Here's an example using the dataset mtcars:

reg <- function(x, y, ...) {
  points(x,y, ...)
  abline(lm(y~x), col = "red") 
}

panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r <- cor(x, y) # was abs(cor(x, y))
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste0(prefix, txt)
  if(missing(cex.cor)) cex.cor <- 2 # or 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor) # was cex.cor * abs(r))
}

panel.hist <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5) )
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}

pairs(mtcars[, c(1,3,4,5,6,7)], lower.panel = reg, upper.panel = panel.cor, diag.panel = panel.hist)

Result using <code>pairs()</code> on mtcars

But it's not as nice as chart.Correlation() and since I was not able to figure out, how to pass the reg function into the chart.Correlation(), I looked into the code of it and figured it out by simply changing it directly inside of the function: lower.panel = panel.smooth ==> lower.panel = reg. So here's the final example with mtcars:

chart.Correlation.linear <-
  function (R, histogram = TRUE, method=c("pearson", "kendall", "spearman"), ...)
  { # @author R Development Core Team
    # @author modified by Peter Carl & Marek Lahoda
    # Visualization of a Correlation Matrix. On top the (absolute) value of the correlation plus the result 
    # of the cor.test as stars. On botttom, the bivariate scatterplots, with a linear regression fit. 
    # On diagonal, the histograms with probability, density and normal density (gaussian) distribution.

    x = checkData(R, method="matrix")

    if(missing(method)) method=method[1] #only use one
    cormeth <- method

    # Published at http://addictedtor.free.fr/graphiques/sources/source_137.R
    panel.cor <- function(x, y, digits=2, prefix="", use="pairwise.complete.obs", method=cormeth, cex.cor, ...)
    {
      usr <- par("usr"); on.exit(par(usr))
      par(usr = c(0, 1, 0, 1))
      r <- cor(x, y, use=use, method=method) # MG: remove abs here
      txt <- format(c(r, 0.123456789), digits=digits)[1]
      txt <- paste(prefix, txt, sep="")
      if(missing(cex.cor)) cex <- 0.8/strwidth(txt)

      test <- cor.test(as.numeric(x),as.numeric(y), method=method)
      # borrowed from printCoefmat
      Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
                       cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                       symbols = c("***", "**", "*", ".", " "))
      # MG: add abs here and also include a 30% buffer for small numbers
      text(0.5, 0.5, txt, cex = cex * (abs(r) + .3) / 1.3)
      text(.8, .8, Signif, cex=cex, col=2)
    }

    #remove method from dotargs
    dotargs <- list(...)
    dotargs$method <- NULL
    rm(method)

    hist.panel = function (x, ...=NULL ) {
      par(new = TRUE)
      hist(x,
           col = "light gray",
           probability = TRUE,
           axes = FALSE,
           main = "",
           breaks = "FD")
      lines(density(x, na.rm=TRUE),
            col = "red",
            lwd = 1)
      # adding line representing density of normal distribution with parameters correponding to estimates of mean and standard deviation from the data 
      ax.x = seq(min(x), max(x), 0.1)                                                  # ax.x containts points corresponding to data range on x axis
      density.est = dnorm(ax.x, mean = mean(x), sd = sd(x))   # density corresponding to points stored in vector ax.x 
      lines(ax.x, density.est, col = "blue", lwd = 1, lty = 1)                                # adding line representing density into histogram
      rug(x)
    }

    # Linear regression line fit over points
    reg <- function(x, y, ...) {
      points(x,y, ...)
      abline(lm(y~x), col = "red") 
    }

    # Draw the chart
    if(histogram)
      pairs(x, gap=0, lower.panel=reg, upper.panel=panel.cor, diag.panel=hist.panel)
    else
      pairs(x, gap=0, lower.panel=reg, upper.panel=panel.cor) 
  }

chart.Correlation.linear(mtcars[, c(1,3,4,5,6,7)], histogram = TRUE)

Result using modified chart.Correlation on mtcars