In R, I have a dataset with one independent variable and 9 dependent variables, and I want to see the scatter plots, histograms and correlation values, like in chart.Correlation(),
but I don't want to see the correlations between the dependent variables, as they are unnecessary.
I.e. in the following mock example, I only care about the top row and the leftmost column, with all the histograms, red lines, significance stars etc., BUT I don't want any of the other scatter plots and correlation values. Is there a neat way of seeing all this in one visualisation, i.e. independent variable vs. all dependent variables?
Mock example:
library(PerformanceAnalytics)  # also attaches xts, which provides xts()
d <- xts(matrix(rnorm(10000),ncol=10), Sys.Date()-1000:1)
chart.Correlation(d)
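To make it concrete, the only numbers I actually need are the correlations of the first column with each of the others. A minimal base-R sketch of that, where the helper name indep_cor and the plain matrix m (standing in for coredata(d)) are just placeholders of mine, and the first column is assumed to be the independent variable:

```r
# Sketch: correlate column 1 (assumed independent) with every other column.
set.seed(1)
m <- matrix(rnorm(10000), ncol = 10)   # plain-matrix stand-in for coredata(d)

indep_cor <- function(m) {
  x <- m[, 1]
  # One cor.test() per dependent column; keep the estimate and p-value
  res <- lapply(seq_len(ncol(m))[-1], function(j) {
    ct <- cor.test(x, m[, j])
    data.frame(column = j, r = unname(ct$estimate), p = ct$p.value)
  })
  res <- do.call(rbind, res)
  # Significance stars from the p-values
  res$stars <- as.character(symnum(res$p, corr = FALSE, na = FALSE,
                                   cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                                   symbols = c("***", "**", "*", ".", " ")))
  res
}

res <- indep_cor(m)
print(res)
```

This gives one row per dependent variable, so the plot I am after would only need these 9 values rather than the full 10 x 10 matrix.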
On a side note, I am getting a bit annoyed by the font size of some of the correlation values produced by chart.Correlation.
Is there any way to set a minimum and maximum font size so that the text doesn't become unreadable?
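What I have in mind is something like clamping the text size between two bounds. A rough sketch using a custom panel function for base pairs(); clamp_cex, panel_cor and the scale factor of 3 are my own hypothetical names and values, not anything taken from chart.Correlation itself:

```r
# Sketch: scale text size with |r|, but never below min_cex or above max_cex.
clamp_cex <- function(r, min_cex = 0.8, max_cex = 2, scale = 3) {
  pmin(max_cex, pmax(min_cex, abs(r) * scale))
}

# Panel function for pairs() that prints the correlation with a clamped size
panel_cor <- function(x, y, ...) {
  old <- par(usr = c(0, 1, 0, 1))   # unit coordinates so the text is centred
  on.exit(par(old))                 # restore the panel's coordinates afterwards
  r <- cor(x, y, use = "pairwise.complete.obs")
  text(0.5, 0.5, format(r, digits = 2), cex = clamp_cex(r))
}

set.seed(1)
m <- matrix(rnorm(4000), ncol = 4)  # small stand-in dataset
pairs(m, upper.panel = panel_cor)
```

In ggplot2, where the text size is mapped to a value inside aes(), I believe the equivalent would be to bound the size scale, e.g. scale_size(range = c(3, 8)), though the exact numbers would need tuning.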
Also please feel free to use any other package (e.g. ggplot2 etc) that you think might be useful to help find a solution to the problem.
thanks in advance
EDIT:
So this is what I have come up with so far using ggplot and plyr. I'm still missing the histogram of the independent variable. Oh, and multiplot comes from here: http://wiki.stdout.org/rcookbook/Graphs/Multiple%20graphs%20on%20one%20page%20(ggplot2)/
I have now included it as an answer, but any other suggestions/improvements would be well received.
require(xts)       # coredata()
require(plyr)      # ddply(), summarise()
require(reshape2)  # melt()
require(ggplot2)
# multiplot() must already be defined (copied from the cookbook page linked above)
indep.dep.cor <- function(xts.obj, title=""){
  # First column always assumed to be independent
  df <- data.frame(coredata(xts.obj))
  # Copies are placed in the global environment so the ddply()/aes() calls below can find them
  assign('df',df,envir=.GlobalEnv)
  # Long format: one row per (independent value, dependent variable, dependent value)
  df.l <- melt(df, id.vars=colnames(df)[1], measure.vars=colnames(df)[2:ncol(df)])
  assign('df.l',df.l, envir=.GlobalEnv)
  # Correlation of each dependent variable with the independent one, plus significance stars
  cor.vals <- ddply(df.l, c("variable"), summarise, round(cor(df[,1],value),3))
  stars <- ddply(df.l, c("variable"), summarise, symnum(cor.test(df[,1],value)$p.value, corr = FALSE, na = FALSE, cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1), symbols = c("***", "**", "*", ".", " ")))
  cor.vals$stars <- stars[,2]
  assign('cor.vals',cor.vals,envir=.GlobalEnv)
  # Common bin width: smallest (range / 30) over the dependent variables
  bin.w <- min((ddply(df.l,c("variable"),summarise,diff(range(value))/30))[,2])
  # Top row: histogram and density curve of each dependent variable
  m1 <- ggplot(df.l,aes_string(x="value"))+
    facet_grid(.~variable)+
    stat_density(aes(y=..density..),fill=NA, colour="red", size=1.2)+
    geom_histogram(aes(y=..density..),fill="white", colour="black", binwidth=bin.w)+
    ggtitle(title)
  # Bottom row: scatter plot vs. the independent variable, with correlation label and loess fit
  m2 <- ggplot(df.l,aes_string(x=colnames(df.l)[1], y="value"))+
    facet_grid(.~variable)+
    geom_point(alpha=0.2)+
    theme(legend.position="none")+
    geom_text(data=cor.vals,aes(label=paste(cor.vals[,2],cor.vals[,3]),size=abs(cor.vals[,2])*2,colour=cor.vals[,2]),x=Inf,y=Inf,vjust=1,hjust=1,show.legend=FALSE)+
    scale_colour_gradient(low = "red", high="blue")+
    geom_smooth(method="loess")
  # Stack the two rows in a single column
  multiplot(m1,m2,cols=1)
}
indep.dep.cor(d)
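For the histogram of the independent variable that is still missing, one direction I am considering is plain base graphics with a two-row layout. This is only a sketch: plot_indep_row, the column names and the matrix m (standing in for coredata(d)) are placeholders of mine, and the first column is again assumed to be the independent variable.

```r
# Sketch: histograms of all columns on top, scatter plots vs. column 1 below.
set.seed(1)
m <- matrix(rnorm(10000), ncol = 10)   # stand-in for coredata(d)
colnames(m) <- c("indep", paste0("dep", 1:9))

plot_indep_row <- function(m) {
  k <- ncol(m)
  old <- par(mfrow = c(2, k), mar = c(2, 2, 2, 0.5))
  on.exit(par(old))                    # restore graphics settings afterwards
  # Top row: one histogram per column, the independent variable first
  for (j in seq_len(k)) {
    hist(m[, j], freq = FALSE, main = colnames(m)[j], xlab = "", ylab = "")
    lines(density(m[, j]), col = "red")
  }
  # Bottom row: empty cell under the independent variable, then the scatter plots
  plot.new()
  r <- numeric(k - 1)
  for (j in 2:k) {
    r[j - 1] <- cor(m[, 1], m[, j])
    plot(m[, 1], m[, j], pch = 20, col = rgb(0, 0, 0, 0.2),
         main = sprintf("r = %.3f", r[j - 1]), xlab = "", ylab = "")
    lines(lowess(m[, 1], m[, j]), col = "red")
  }
  invisible(r)                         # correlations, in column order
}

r <- plot_indep_row(m)
```

It loses the ggplot styling, but it does put the independent variable's histogram in the same figure as everything else.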
What is multiplots supposed to do? - IRTFM