1
votes

I have a data frame that looks like this (these are the first 24 rows):

row     Name         Age
1       John         22
2       Alice        29
3       Michael      33
4       Briefing     NA
5       Class        A
6       Year         2016
7       Observation  0
8       End          NA
9       Steward      35
10      Louis        20
11      Josh         22
12      Marie        39
13      Briefing     NA
14      Observation  2
15      Year         2017
16      End          NA
17      Adam         27
18      Joseph       26
19      Andrew       26
20      Briefing     NA
21      Observation  2
22      Year         2017
23      Class        B
24      End          NA

I want it to look like this

  Name    Age  Class  Year  Observation
  John    22   A      2016  0
  Alice   29   A      2016  0
  Michael 33   A      2016  0
  Steward 35   NA     2017  2
  Louis   20   NA     2017  2
  Josh    22   NA     2017  2
  Marie   39   NA     2017  2
  Adam    27   B      2017  2
  Joseph  26   B      2017  2
  Andrew  26   B      2017  2

I understand that this process requires me to:

  • Detect every range where the data must be transferred: from the word "Briefing" to the word "End".
  • I also need to sort the information into their respective columns since the components change: the first group would transfer information to the columns "Class", "Year" and "Observation" while the second group only to the columns "Year" and "Observation" leaving the column "Class" with an NA value. The third group once more provides information for the columns "Class", "Year" and "Observation".

I would be grateful for your help to tackle this task with R.

Edit: Here is the dput

    structure(list(row = 1:24, Name = structure(c(7L, 2L, 12L, 4L, 
5L, 15L, 13L, 6L, 14L, 10L, 9L, 11L, 4L, 13L, 15L, 6L, 1L, 8L, 
3L, 4L, 13L, 15L, 5L, 6L), .Label = c("Adam", "Alice", "Andrew", 
"Briefing", "Class", "End", "John", "Joseph", "Josh", "Louis", 
"Marie", "Michael", "Observation", "Steward", "Year"), class = "factor"), 
    Age = structure(c(6L, 9L, 10L, NA, 13L, 4L, 1L, NA, 11L, 
    3L, 6L, 12L, NA, 2L, 5L, NA, 8L, 7L, 7L, NA, 2L, 5L, 14L, 
    NA), .Label = c("0", "2", "20", "2016", "2017", "22", "26", 
    "27", "29", "33", "35", "39", "A", "B"), class = "factor")), .Names = c("row", 
"Name", "Age"), class = "data.frame", row.names = c(NA, -24L))
1
Can you use dput to post your data so we can work with it?gfgm
Hi Gabriel. Absolutely. I have edited the post and placed the dput there. Thanks.dena76
Are you reading the data from csv or excel sheet, could you please confirmSilence Dogood
Hi Ossan, csv. Thanks.dena76
Hello, I have updated the example and the dput. I believe it is better explained now. Thanks for your help!dena76

1 Answers

1
votes

Essentially you have two data frames smushed together row-wise. We separate them and then join them back together column-wise.

library(dplyr)
library(tidyr)

# flag rows as being part of the "lookup" data frame
# and add a grouping so we know which lookup values belong with which data values
df = mutate(df, group = cumsum(df$Name == "End"),
       is_lookup = cumsum(Name == "Briefing") > cumsum(Name == "End") | Name == "End") %>%
    select(-row)

# break off the lookup data and make it wide
lookup = filter(df, is_lookup,
                ! Name %in% c("Briefing", "End")) %>%
    spread(key = Name, value = Age)

# break off the non-lookup data and join it to the wide lookup
df %>% filter(!is_lookup) %>%
    select(Name, group) %>% 
    left_join(lookup) %>%
    select(-group, -is_lookup)
# Joining, by = "group"
#       Name Class Observation Year
# 1     John     A           0 2016
# 2    Alice     A           0 2016
# 3  Michael     A           0 2016
# 4  Steward  <NA>           2 2017
# 5    Louis  <NA>           2 2017
# 6     Josh  <NA>           2 2017
# 7    Marie  <NA>           2 2017
# 8     Adam     B           2 2017
# 9   Joseph     B           2 2017
# 10  Andrew     B           2 2017