0
votes

I have two large data frames df1 & df2

df1

ID ...other columns...MON TUE WED THU FRI SAT SUN
1                     8.5 8.5 8.5 8.5 8.5 6.0 0.0
2                     9.0 9.0 9.0 9.0 9.0 6.0 6.0
4                     6.0 7.0 7.0 7.0 7.0 5.0 0.0

df2

Day Hours
FRI   0
SAT   0
SUN   0
MON   0
TUE   0
WED   0
THU   0
FRI   0
SAT   0
SUN   0
MON   0
TUE   0
WED   0
...Will keep going until end about 28-31 days listed

I am trying to replace the Hours column in df2 with the corresponding times in df1 (Matching df1 column day names with the names in the day column in df2). df2 would fill using the first row of data in df1 until all rows in df2 have been filled then move onto the next row (loop through)...the idea being once 1st row has been filled, before moving onto the next, I would do a Sum of the 'Hours' column in df2 and then put that number back in df1 in a seperate column next to first row and so on...

df2 1st output

Day   Hours
FRI   8.5
SAT   6.0
SUN   0
MON   8.5
TUE   8.5
WED   8.5
THU   8.5
FRI   8.5
SAT   6.0
SUN   0
MON   8.5
TUE   8.5
WED   8.5
...will continue all the way until last row until all data is filled from 1st row in df1 (repeating itself, just matching the right values with rows)

df1 1st output

ID ...other columns...MON TUE WED THU FRI SAT SUN Total
1                     8.5 8.5 8.5 8.5 8.5 6.0 0.0 88.5
2                     9.0 9.0 9.0 9.0 9.0 6.0 6.0
4                     6.0 7.0 7.0 7.0 7.0 5.0 0.0

...notice the Total is only from the Hours column in df2 which is only populated from data in the the corresponding row

df2 2nd output

Day   Hours
FRI   9.0
SAT   6.0
SUN   6.0
MON   9.0
TUE   9.0
WED   9.0
THU   9.0
FRI   9.0
SAT   6.0
SUN   6.0
MON   9.0
TUE   9.0
WED   9.0

...once first row in df1 has filled all df2 and a sum has been made then it would clean the Hours column and refill but now with row 2 of df1...doing the same thing until all rows in df2 has been filled

df1 2nd output

ID ...other columns...MON TUE WED THU FRI SAT SUN Total
1                     8.5 8.5 8.5 8.5 8.5 6.0 0.0 88.5
2                     9.0 9.0 9.0 9.0 9.0 6.0 6.0 105
4                     6.0 7.0 7.0 7.0 7.0 5.0 0.0

...notice now the Total is only from the Hours column in df2 which is only populated from data in the the corresponding row

This continues for every line in df1 - which is about 1400 lines

So I am trying to sort out the matching/looping part of this but can't seem to get to a solution that works. Is there a way of doing this? I have been suggested using Matches, Joins, Left, Right Outer etc...but I am unsure how this would be used without a unique number ID and how would loop through the data frame

Many thanks

2
@ThomasIsCoding So df2 cotinues to repeat itself a set number of times (max 31 times)...the idea is that I need it to loop that amount times until end of row in df2 with data from the first row in df1 being plotted...once the first row from df1 has filled all of df2 then it moves onto the 2nd row in df1 and does the exact same thing...I hope that makes senseDaisy
I added a solution but not sure if fits your goal, please check it outThomasIsCoding

2 Answers

2
votes

Here is a data.table option, hope this can help

setDT(df1)
setDT(df2)
df1[
  melt(
    df1,
    id = "ID"
  )[order(ID)][
    ,
    .SD[df2, .(SumHours = sum(value)),
      on = .(variable = Day)
    ], ID
  ],
  on = .(ID)
]
  • Output
   ID MON TUE WED THU FRI SAT SUN SumHours
1:  1 8.5 8.5 8.5 8.5 8.5   6   0       57
2:  2 9.0 9.0 9.0 9.0 9.0   6   6       66

If you need to show check how two data tables are merged into one, you can try

melt(
  df1,
  id = "ID",
  value.name = "Hours"
)[order(ID)][
  ,
  .SD[df2,
    on = .(variable = Day)
  ], ID
][, i.Hours := NULL][]

which gives

    ID variable Hours
 1:  1      FRI   8.5
 2:  1      SAT   6.0
 3:  1      SUN   0.0
 4:  1      MON   8.5
 5:  1      TUE   8.5
 6:  1      WED   8.5
 7:  1      THU   8.5
 8:  1      FRI   8.5
 9:  2      FRI   9.0
10:  2      SAT   6.0
11:  2      SUN   6.0
12:  2      MON   9.0
13:  2      TUE   9.0
14:  2      WED   9.0
15:  2      THU   9.0
16:  2      FRI   9.0

UPDATE I am not sure if the code below is exactly what you are after

df1out <- df1[, Total := rowSums(.SD)]

df2out <- lapply(split(melt(
  df1,
  id = "ID",
  value.name = "Hours"
)[order(ID)][
  ,
  .SD[df2,
    on = .(variable = Day)
  ], ID
][, i.Hours := NULL], by = "ID"), function(x) x[, ID := NULL])

such that

> df1out
   ID MON TUE WED THU FRI SAT SUN Total
1:  1 8.5 8.5 8.5 8.5 8.5   6   0  49.5
2:  2 9.0 9.0 9.0 9.0 9.0   6   6  59.0
3:  4 6.0 7.0 7.0 7.0 7.0   5   0  43.0

> df2out
$`1`
    variable Hours
 1:      FRI   8.5
 2:      SAT   6.0
 3:      SUN   0.0
 4:      MON   8.5
 5:      TUE   8.5
 6:      WED   8.5
 7:      THU   8.5
 8:      FRI   8.5
 9:      SAT   6.0
10:      SUN   0.0
11:      MON   8.5
12:      TUE   8.5
13:      WED   8.5

$`2`
    variable Hours
 1:      FRI     9
 2:      SAT     6
 3:      SUN     6
 4:      MON     9
 5:      TUE     9
 6:      WED     9
 7:      THU     9
 8:      FRI     9
 9:      SAT     6
10:      SUN     6
11:      MON     9
12:      TUE     9
13:      WED     9

$`4`
    variable Hours
 1:      FRI     7
 2:      SAT     5
 3:      SUN     0
 4:      MON     6
 5:      TUE     7
 6:      WED     7
 7:      THU     7
 8:      FRI     7
 9:      SAT     5
10:      SUN     0
11:      MON     6
12:      TUE     7
13:      WED     7

Data

> dput(df1)
structure(list(ID = c(1L, 2L, 4L), MON = c(8.5, 9, 6), TUE = c(8.5,
9, 7), WED = c(8.5, 9, 7), THU = c(8.5, 9, 7), FRI = c(8.5, 9,
7), SAT = c(6, 6, 5), SUN = c(0, 6, 0)), class = "data.frame", row.names = c(NA,
-3L))

> dput(df2)
structure(list(Day = c("FRI", "SAT", "SUN", "MON", "TUE", "WED",
"THU", "FRI", "SAT", "SUN", "MON", "TUE", "WED"), Hours = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,       
-13L))
0
votes

Joining can help:

library(dplyr)
library(tidyr)
#Code
new <- df1 %>% pivot_longer(-1) %>%
  filter(ID==1) %>%
  left_join(df2,by = c('name'='Day')) %>%
  mutate(Hours=value) %>%
  select(-value)

Output:

# A tibble: 8 x 3
     ID name  Hours
  <int> <chr> <dbl>
1     1 MON     8.5
2     1 TUE     8.5
3     1 WED     8.5
4     1 THU     8.5
5     1 FRI     8.5
6     1 FRI     8.5
7     1 SAT     6  
8     1 SUN     0  

Some data used:

#Data
df1 <- structure(list(ID = 1:2, MON = c(8.5, 9), TUE = c(8.5, 9), WED = c(8.5, 
9), THU = c(8.5, 9), FRI = c(8.5, 9), SAT = c(6, 6), SUN = c(0, 
6)), class = "data.frame", row.names = c(NA, -2L))

df2 <- structure(list(Day = c("FRI", "SAT", "SUN", "MON", "TUE", "WED", 
"THU", "FRI"), Hours = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
-8L))