0
votes

I would like to multiply several columns from a certain data.frame * df1* with the Percentages matching the headers of the columns in * df1*. These are given in data.frame df2, column 1, called ID. I would like the output to be as in df3.

Note that my dataset is large. There are 13.000 rows and 33 columns in df1. There are 136 rows and 3 columns in df2.

What is the best way to handle this?

Examples of df1, df2, df3 are given below.

df1:

Date                  V1               V2            V3            V4
1/1/2000               0               0.4           0             0
2/1/2000               0               0.1           0             0.1
3/1/2000               0.5             0             0             1
4/2000                 0.8             1.5           1             1

df2:

    ID                Subbasin       Percentage
V1                 001               0.4
V4                 001               0.6
V1                 002               0.2
V2                 002               0.8
V1                 003               0.1
V2                 003               0.3
V3                 003               0.2
V4                 003               0.4

df3:

Date                   001             002             003
1/1/2000               0               0.32            0.12
2/1/2000               0.06            0.08            0.07
3/1/2000               0.8             0.1             0.45
4/2000                 0.92            1.36            1.13

I guess I have to start with omitting the Date, with

df1 <- NULL
3
Hi, if any answer solves your problem can you click on "accept it" so that other people can see it? thanksagenis

3 Answers

1
votes

Consider using the reshape2 package where you transform two times: 1) melt (wide to long); 2) merge (df1 and df2) with product field; 3) dcast (long to wide):

library(reshape2)

df1 <- read.table(text="Date V1 V2  V3 V4
1/1/2000 0 0.4 0 0
2/1/2000 0 0.1 0 0.1
3/1/2000 0.5 0 0 1
4/2000 0.8 1.5 1 1", 
 header=TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text="ID Subbasin Percentage
V1 001 0.4
V4 001 0.6
V1 002 0.2
V2 002 0.8
V1 003 0.1
V2 003 0.3
V3 003 0.2
V4 003 0.4", 
  header=TRUE, colClasses=c("character", "character", "numeric"))

df1 <- melt(df1, id.vars=c("Date"), variable.name="ID")

df3 <- merge(df1, df2, by=c("ID"))
df3$product <- df3$value * df3$Percentage

df3 <- dcast(df3, Date~Subbasin, fun.aggregate=sum, value.var="product")
df3
#         Date    001    002    003
# 1   1/1/2000   0.00   0.32   0.12
# 2   2/1/2000   0.06   0.08   0.07
# 3   3/1/2000   0.80   0.10   0.45
# 4     4/2000   0.92   1.36   1.13
0
votes

You can use Sparse Matrix to do this matrix multiplication, after a little reshaping of the second data.frame:

library(dplyr); library(Matrix); library(reshape2)
m1 <- df1 %>% select(-Date) %>% as.matrix
m2 <- dcast(df2, ID~subbasin, fill=0) %>% select(-ID) %>% as.matrix %>%  Matrix(sparse=T)
m1 %*% m2
#### 4 x 3 Matrix of class "dgeMatrix"
####         1    2    3
#### [1,] 0.00 0.32 0.12
#### [2,] 0.06 0.08 0.07
#### [3,] 0.80 0.10 0.45
#### [4,] 0.92 1.36 1.13

This works if df2 doesn't have any zeros. If it does you have to add some trick to get the sparsity right.

I used this re-created data:

df1 = data.frame(Date=c("1/1/2000", "1/2/2000", "1/3/2000", "1/4/2000"), 
                 V1=c(0, 0, .5, .8),
                 V2=c(.4,.1,0, 1.5),
                 V3=c(0,0,0,1),
                 V4=c(0, .1, 1, 1))
df2=data.frame(ID=c("V1", "V4", "V1", "V2", "V1", "V2", "V3", "V4"), 
               subbasin=as.character(c(1,1,2,2,3,3,3,3)),
               percentage=c(4, 6, 2, 8, 1, 3, 2, 4)/10)
0
votes

Here is another option using base R

df3 <- df1[-4]
df3[ -1] <- as.matrix(df1[-1]) %*% xtabs(Percentage~ ID + Subbasin, df2)
df3
#      Date   V1   V2   V4
#1 1/1/2000 0.00 0.32 0.12
#2 2/1/2000 0.06 0.08 0.07
#3 3/1/2000 0.80 0.10 0.45
#4   4/2000 0.92 1.36 1.13