1
votes

I have two data frames that have common columns.

  # Generate DF1
    set.seed(219)
    x0 <- rnorm(5, 22, 17)
    x2 <- rnorm(5, 44, 15)
    x3 <- rnorm(5, 56, 13)
    x7 <- rnorm(5, 0, 3)
    x9 <- rnorm(5, 28, 31)
    x10 <- rnorm(5, 4, 75)
    x11 <- rnorm(5, 7, 1)
    dat1 <- data.frame(x0,x2,x3,x7,x9, x10, x11)
    dat1$ID1 <- rownames(dat1)

  # Generate DF2
    x1 <- rnorm(10, 2, 19)
    x2 <- rnorm(10, 4, 18)
    x3 <- rnorm(10, 5, 17)
    x4 <- rnorm(10, 7, 16)
    x5 <- rnorm(10, 8, 51)
    x6 <- rnorm(10, 9, 5)
    x7 <- rnorm(10, 0, 3)
    x8 <- rnorm(10, 34, 2)
    x9 <- rnorm(10, 28, 1)
    dat2 <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9)
    dat2$ID2 <- rownames(dat2)

Note that DF1 has 5 rows while DF2 has 10 rows. Also, similar columns names in each data frame does not mean that both columns are the same in value.

This is what I would like to do:

  1. Since DF1 has 5 rows, I need to create 5 columns in DF2 and let's call them y1, y2, y3, y4, y5.

  2. Here is how to compute y1: I need to take the first row in DF1 and multiply it with similar columns in DF2 for all the rows. The size of y1will be (10 rows and 1 column). And I need to calculate the following for each row in DF2.

    y1 = x0 + x2(DF1)*x2(DF2) + x3(DF1)*x3(DF2) + x7(DF1)*x7(DF2) + x9(DF1)*x9(DF2)

Similarly, For y2, we need to start in the second row of DF1 ... Etc.

In terms of vectors and matrices, here is how to compute y1.

Let the first row in DF1 as (x01, x21, x31, x71, x91, x101, x111, ID11). Then first first value of y1 (remember y1 is 10*1):

y11 = x01 + x21(DF1)*x21(DF2) + x31(DF1)*x31(DF2) + x71(DF1)*x71(DF2) + x91(DF1)*x91(DF2).

Second value of y1:

y12 = x01 + x21(DF1)*x22(DF2) + x31(DF1)*x32(DF2) + x71(DF1)*x72(DF2) + x91(DF1)*x92(DF2).

...

finaly, the 10th value of y1 is:

y110 = x01 + x21(DF1)*x210(DF2) + x31(DF1)*x310(DF2) + x71(DF1)*x710(DF2) + x91(DF1)*x910(DF2).

How can I implement my algorithm?

2
So, will the x0 be constant 'y2', 'y3', etc.. i.e. whether the formula differs for 'y2', 'y3', etc - akrun
correct. since DF2 has no x0.... I can add x0 in DF2 as 1's and then do x0(DF1)*x0(DF2). - user9292
also, one more doubt, so the 1st row for 'x2' in DF1 will be multiplied by all rows of 'x2' in 'DF2'? - akrun
x2 in DF1 is a number, and I need to multiply it for each x2 in DF2. - user9292
I have the same doubt/confusion. I think there's something not quite right in the logic here. If we multiplied 1 row of 1 common column from dat1 against all rows of that column in dat2, then the length would be 10x1. However you're talking about adding the unique columns of dat1 to the 10x1 products of all 4 common columns and I don't know how that could work out to a 10x1 vector. Perhaps you could give a step by step example of the calculations for y1. - Hack-R

2 Answers

2
votes

This is based on my understanding of what you want to do, which I alluded to in a comment.

Basically, for the non-common columns from dat1 I multiplied them by a vector of ones to get them into a conformable dimension, then added the 10x1 vectors for each y (y1, etc) row-wise, so that each is a 10x1 vector:

common_cols <- intersect(colnames(dat1),colnames(dat2))
uniq_cols   <- setdiff(colnames(dat1),colnames(dat2))
uniq_cols   <- uniq_cols[!uniq_cols=="ID1"]

tmp  <- data.frame(y1=rep(NA,10), y2=rep(NA,10),y3=rep(NA,10),y4=rep(NA,10),y5=rep(NA,10))
tmp1 <- data.frame(matrix(nrow=10, ncol = 7))

for(i in 1:nrow(dat1)){
  for(j in 1:length(common_cols)){
    tmp1[,j] <-  dat1[i,common_cols[j]] * dat2[,common_cols[j]] 
  }
  for(k in 1:length(uniq_cols)){
    tmp1[,k+4] <- dat1[i,uniq_cols[k]]*rep(1,10)
  }
  tmp[,i] <- rowSums(tmp1)
}

The result is:

tmp
         y1          y2        y3        y4         y5
1  2796.812   226.31244 1924.2130 4392.7841  1459.8979
2  1786.241    17.11732  716.6079 2044.0003   141.6572
3  1371.890  -334.09190  324.3946 1578.0200  -262.0858
4  1235.717  -446.01583  176.2845 1422.1088  -411.2424
5  1995.976  -377.33202 1152.6527 3297.5986   635.7040
6  2233.255   197.51252 1155.2367 2847.4433   599.1098
7  3437.539  1675.03212 2328.7100 3876.5423  1914.7753
8   291.687 -1331.27575 -737.9568  299.7451 -1413.6779
9  1659.648  -244.14992  678.3120 2266.3193   144.3870
10 1675.775  -532.41657  668.3817 2491.0892    60.3962
0
votes

With a dplyr and tidyr approach:

library(magrittr); library(dplyr)

Generate DF1

set.seed(219)
x0 <- rnorm(5, 22, 17)
x2 <- rnorm(5, 44, 15)
x3 <- rnorm(5, 56, 13)
x7 <- rnorm(5, 0, 3)
x9 <- rnorm(5, 28, 31)
x10 <- rnorm(5, 4, 75): i am commenting this out, based on your 
x11 <- rnorm(5, 7, 1): 
dat1 <- data.frame(x0,x2,x3,x7,x9, x10, x11)
# dat1$ID1 <- rownames(dat1) : not yet

Generate DF2

x1 <- rnorm(10, 2, 19)
x2 <- rnorm(10, 4, 18)
x3 <- rnorm(10, 5, 17)
x4 <- rnorm(10, 7, 16)
x5 <- rnorm(10, 8, 51)
x6 <- rnorm(10, 9, 5)
x7 <- rnorm(10, 0, 3)
x8 <- rnorm(10, 34, 2)
x9 <- rnorm(10, 28, 1)
dat2 <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,x9)
# dat2$ID2 <- rownames(dat2) : not yet

Create the missing vars in dat2

dat2$x0 <- 1
newCol <- names(dat1)[!(names(dat1) %in% names(dat2))]
dat2[, names(dat1)[!(names(dat1) %in% names(dat2))]] <- 0

Rownames columns

dat1$ID1 <- rownames(dat1)
dat2$ID2 <- rownames(dat2)

form wide to long table

df1 <- tidyr::gather(dat1, X, var, -c(ID1))
df2 <- tidyr::gather(dat2, X, var, -c(ID2))

join the two tables

df1 <- left_join(df1, df2, by="X")
rm(df2)

do your multiplication

df1$var <- df1$var.x * df1$var.y

create y columns

df1 %<>% group_by(ID1, ID2) %>% summarise(var=sum(var)) %>% ungroup %>% 
  mutate(ID1=paste0("y", ID1)) %>% 
  {left_join(dat2, tidyr::spread(., ID1, var), by="ID2")}

keep relevant columns

df1 <- df1[, names(df1)[!(names(df1) %in% newCol)]]

View(df1)