1
votes

I'm trying to build covariance matrix from a scratch (cov() function). My task is not to use any package. Hence I created my functions:

meanf <- function(x){
sum(x) / length(x)
}

sampleCov <- function(x,y){
  stopifnot(identical(length(x), length(y)))
  sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}

> sampleCov(winequality_red$quality, winequality_red$alcohol)
[1] 0.409789

Unfortunately, I'm stuck here. All loops I tried to apply are missing any point. Of course it's possible to just copy the sampleCov function and make it for every possible combination but that's not my point.

3
It's not clear to me what your question/problem actually is.Dason
Firstly I'd like to apply a function on one certain column and calculate it with the remaining ones: (x, y1), (x, y2), (x, y3), (x, yn+1).Tom
function, length and sum are all in the package "Base" so what are you actually allowed to use?rg255

3 Answers

1
votes

If I understand you correctly then I believe you want to recreate a covariate output like the one returned by cov function.

OPs given function:

meanf <- function(x){
    sum(x) / length(x)
}

sampleCov <- function(x,y){
    stopifnot(identical(length(x), length(y)))
    sum((x - meanf(x)) * (y - meanf(y))) / (length(x) - 1)
}

You can try this way, I have taken mtcars data here:

Covariate Function:

vars <- names(mtcars)
egrid <- expand.grid(vars, vars)
egrid <- data.frame(sapply(egrid, as.character),stringsAsFactors = F)
egrid <- egrid[order(egrid$Var1, egrid$Var2),]
mat <- vector("list", nrow(egrid))

for(i in 1:nrow(egrid)){
    mat[[i]] <- sampleCov(mtcars[,egrid[i,"Var1"]], mtcars[,egrid[i,"Var2"]])
}

finaldat <- cbind(egrid, cov = do.call('rbind', mat))
finaldat_list <- split(finaldat,  finaldat$Var1)
mat_form <- do.call('cbind', finaldat_list)

cov_values <- mat_form[,grepl("\\.cov",names(mat_form))]
col_values <- mat_form[,paste0(egrid$Var1[1],".Var2")]

final_matrix_cov <- cbind(col_values, cov_values)

Sample Output:

> final_matrix_cov
    col_values       am.cov    carb.cov     cyl.cov    disp.cov
9          mpg   1.80393145 -5.36310484  -9.1723790  -633.09721
20         cyl  -0.46572581  1.52016129   3.1895161   199.66028
31        disp -36.56401210 79.06875000 199.6602823 15360.79983
42          hp  -8.32056452 83.03629032 101.9314516  6721.15867
1
votes

You need the matrix multiplication %*%.

sampleCov <- function(x,y){
  stopifnot(identical(length(x), length(y)))
  sum((x - mean(x)) %*% (y - mean(y))) / (length(x) - 1)
}

> sampleCov(rnorm(10000),rnorm(10000))
[1] 0.01808466
1
votes

This is probably a little more than you need, but it should answer your question, and I think it is a nice illustration of the practical application of covariances, correlations, etc.

# load the data
link <- "https://raw.githubusercontent.com/DavZim/Efficient_Frontier/master/data/mult_assets.csv"
df <- data.table(read.csv(link))

# calculate the necessary values:
# I) expected returns for the two assets
er_x <- mean(df$x)
er_y <- mean(df$y)

# II) risk (standard deviation) as a risk measure
sd_x <- sd(df$x)
sd_y <- sd(df$y)

# III) covariance
cov_xy <- cov(df$x, df$y)

# create 1000 portfolio weights (omegas)
x_weights <- seq(from = 0, to = 1, length.out = 1000)

# create a data.table that contains the weights for the two assets
two_assets <- data.table(wx = x_weights,
                         wy = 1 - x_weights)

# calculate the expected returns and standard deviations for the 1000 possible portfolios
two_assets[, ':=' (er_p = wx * er_x + wy * er_y,
                   sd_p = sqrt(wx^2 * sd_x^2 +
                               wy^2 * sd_y^2 +
                               2 * wx * (1 - wx) * cov_xy))]
two_assets

# lastly plot the values
ggplot() +
  geom_point(data = two_assets, aes(x = sd_p, y = er_p, color = wx)) +
  geom_point(data = data.table(sd = c(sd_x, sd_y), mean = c(er_x, er_y)),
  aes(x = sd, y = mean), color = "red", size = 3, shape = 18) +
  # Miscellaneous Formatting
  theme_bw() + ggtitle("Possible Portfolios with Two Risky Assets") +
  xlab("Volatility") + ylab("Expected Returns") +
  scale_y_continuous(label = percent, limits = c(0, max(two_assets$er_p) * 1.2)) +
  scale_x_continuous(label = percent, limits = c(0, max(two_assets$sd_p) * 1.2)) +
  scale_color_continuous(name = expression(omega[x]), labels = percent)

See the link below for all details.

https://datashenanigan.wordpress.com/2016/05/24/a-gentle-introduction-to-finance-using-r-efficient-frontier-and-capm-part-1/