I'm trying to produce a correlation plot for my data, but I get an 'x must be numeric' error, and other fixes have not worked for my case. Do I have to change the month to numeric as well, or is there a way of selecting only the numeric columns for my plot?
I tried converting all the columns to numeric, but they just change back to factor automatically.
getwd()
myDF <- read.csv("qbase.csv")
head(myDF)
str(myDF)
cp <-cor(myDF)
head(round(cp,2))
'data.frame': 12 obs. of 8 variables:
$ Month : Factor w/ 12 levels "18-Apr","18-Aug",..: 5 4 8 1 9 7 6 2 12 11 ...
$ Monthly.Recurring.Revenue: Factor w/ 2 levels "$25,000 ","$40,000 ": 1 1 1 1 1 2 2 2 2 2 ...
$ Price.per.Seat : Factor w/ 2 levels "$40 ","$50 ": 2 2 2 2 2 1 1 1 1 1 ...
$ Paid.Seats : int 500 500 500 500 500 1000 1000 1000 1000 1000 ...
$ Active.Users : int 10 50 50 100 450 550 800 900 950 800 ...
$ Support.Cases : int 0 0 1 5 35 155 100 75 50 45 ...
$ Users.Trained : int 1 5 0 50 100 300 50 30 0 100 ...
$ Features.Used : int 5 5 5 5 8 9 9 10 15 15 ...
The result of dput(myDF) is as follows:
dput( myDF)
structure(list(Month = structure(c(5L, 4L, 8L, 1L, 9L, 7L, 6L,
2L, 12L, 11L, 10L, 3L), .Label = c("18-Apr", "18-Aug", "18-Dec",
"18-Feb", "18-Jan", "18-Jul", "18-Jun", "18-Mar", "18-May", "18-Nov",
"18-Oct", "18-Sep"), class = "factor"), Monthly.Recurring.Revenue = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("$25,000 ",
"$40,000 "), class = "factor"), Price.per.Seat = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("$40 ",
"$50 "), class = "factor"), Paid.Seats = c(500L, 500L, 500L,
500L, 500L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L, 1000L),
Active.Users = c(10L, 50L, 50L, 100L, 450L, 550L, 800L, 900L,
950L, 800L, 700L, 600L), Support.Cases = c(0L, 0L, 1L, 5L,
35L, 155L, 100L, 75L, 50L, 45L, 10L, 5L), Users.Trained = c(1L,
5L, 0L, 50L, 100L, 300L, 50L, 30L, 0L, 100L, 50L, 0L), Features.Used = c(5L,
5L, 5L, 5L, 8L, 9L, 9L, 10L, 15L, 15L, 15L, 15L)), class = "data.frame", row.names = c(NA,
-12L))
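One possible way to get past the 'x must be numeric' error, sketched under some assumptions: the data frame is rebuilt here by hand from the dput() output above so the snippet runs on its own, and `to_number` is a hypothetical helper name, not part of any package. The idea is to turn the two currency factors into numbers by converting their labels (not their internal integer codes), then pass only the numeric columns to cor(), which leaves Month out.

```r
# Rebuild the data from the dput() output above (factors as in the question).
myDF <- data.frame(
  Month = factor(c("18-Jan", "18-Feb", "18-Mar", "18-Apr", "18-May", "18-Jun",
                   "18-Jul", "18-Aug", "18-Sep", "18-Oct", "18-Nov", "18-Dec")),
  Monthly.Recurring.Revenue = factor(rep(c("$25,000 ", "$40,000 "), c(5, 7))),
  Price.per.Seat = factor(rep(c("$50 ", "$40 "), c(5, 7))),
  Paid.Seats = rep(c(500L, 1000L), c(5, 7)),
  Active.Users = c(10L, 50L, 50L, 100L, 450L, 550L, 800L, 900L, 950L, 800L, 700L, 600L),
  Support.Cases = c(0L, 0L, 1L, 5L, 35L, 155L, 100L, 75L, 50L, 45L, 10L, 5L),
  Users.Trained = c(1L, 5L, 0L, 50L, 100L, 300L, 50L, 30L, 0L, 100L, 50L, 0L),
  Features.Used = c(5L, 5L, 5L, 5L, 8L, 9L, 9L, 10L, 15L, 15L, 15L, 15L)
)

# Hypothetical helper: strip "$", "," and whitespace, then convert.
# as.character() comes first so the labels are converted, not the factor codes.
to_number <- function(x) as.numeric(gsub("[$,[:space:]]", "", as.character(x)))

myDF$Monthly.Recurring.Revenue <- to_number(myDF$Monthly.Recurring.Revenue)
myDF$Price.per.Seat <- to_number(myDF$Price.per.Seat)

# Keep only the numeric columns (this drops Month) and correlate them.
numeric_cols <- vapply(myDF, is.numeric, logical(1))
cp <- cor(myDF[, numeric_cols])
round(cp, 2)
```

Calling as.numeric() directly on a factor returns the level codes (1, 2, ...) rather than the dollar amounts, which is why the conversion goes through as.character() first.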
dput(myDF). You should do this with the original version, not the one after the code below has been applied. – IRTFM
dput(myDF). In the end I intend to produce a correlation plot using corrplot(myDF, method="circle") and also run a multiple linear regression to see which variables affect active users the most. Any help with that will also be very welcome. – NeverQuit101
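A rough sketch of the plot and regression mentioned in the last comment, under stated assumptions: `num_df` is a hypothetical name holding only the integer columns copied from the dput() output, and the corrplot package is assumed to be installed (the call is skipped if it is not). Note that corrplot() expects a correlation matrix such as the result of cor(), not the raw data frame. Revenue and price are left out of the model because in this data they change value on exactly the same rows as Paid.Seats, so they would be perfectly collinear with it.

```r
# Integer columns copied from the dput() output in the question.
num_df <- data.frame(
  Paid.Seats = rep(c(500, 1000), c(5, 7)),
  Active.Users = c(10, 50, 50, 100, 450, 550, 800, 900, 950, 800, 700, 600),
  Support.Cases = c(0, 0, 1, 5, 35, 155, 100, 75, 50, 45, 10, 5),
  Users.Trained = c(1, 5, 0, 50, 100, 300, 50, 30, 0, 100, 50, 0),
  Features.Used = c(5, 5, 5, 5, 8, 9, 9, 10, 15, 15, 15, 15)
)

# Correlation matrix first, then plot the matrix (not the data frame).
cp <- cor(num_df)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cp, method = "circle")
}

# Multiple linear regression with Active.Users as the response.
fit <- lm(Active.Users ~ Paid.Seats + Support.Cases + Users.Trained + Features.Used,
          data = num_df)
summary(fit)
```

With only 12 monthly observations and four predictors, the coefficient estimates are likely to be unstable, so the summary is better read as a rough indication than as firm evidence of which variable matters most.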