1
votes

I have a data frame with a vector of years and several columns that contain the GDP-per-head values of different countries at specific points in time. I want to mutate this data frame to get a new variable that contains, for each row, only the value from the column matching the year given in the vector of years.

My data frame looks like this:

library(tibble)

set.seed(123)
dataset <- tibble('country' = c('Austria','Austria','Austria','Germany','Germany','Sweden','Sweden','Sweden'),
                  'year_vector' = floor(sample(c(1940,1950,1960),8,replace=TRUE)),
                  '1940' = runif(8,15000,18000),
                  '1950' = runif(8,15000,18000),
                  '1960' = runif(8,15000,18000)
)

How can I mutate this data frame as explained above, for example to create the variable gdp_head?

EDIT: The output should look like this:

library(dplyr)

set.seed(123)
dataset <- tibble('country' = c('Austria','Austria','Austria','Germany','Germany','Sweden','Sweden','Sweden'),
                  'year_vector' = floor(sample(c(1940,1950,1960),8,replace=TRUE)),
                  '1940' = runif(8,15000,18000),
                  '1950' = runif(8,15000,18000),
                  '1960' = runif(8,15000,18000)) %>%
    mutate(gdp_head = c(.$'1940'[1],.$'1940'[2],.$'1960'[3],
                        .$'1950'[4],.$'1940'[5],.$'1960'[6],
                        .$'1960'[7],.$'1950'[8]))


       
2
For your example, what should your final output look like? If you are able to show a desired result based on your example data, please use set.seed() to make it reproducible. - Ben
Edited with an example output. I hope this makes things clear. - mugdi

2 Answers

1
votes

Here is one approach:

First, since you are going to compare the year_vector column with column names (which will be character), you can convert year_vector to character as well:

dataset$year_vector <- as.character(dataset$year_vector)

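You currently have a tibble, but if you convert it to a plain data.frame you can subset it with a two-column [row, column] index matrix and add the matched results as gdp_head. Because the data frame mixes character and numeric columns, this kind of indexing returns character values, hence the as.numeric():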

dataset <- as.data.frame(dataset)
dataset$gdp_head <- as.numeric(dataset[cbind(1:nrow(dataset), match(dataset$year_vector, names(dataset)))])
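
To make the [row, column] matrix lookup easier to check by eye, here is a minimal sketch on a tiny data frame with made-up values (the data frame df and its numbers are hypothetical, not taken from the question). It is a small variation on the approach above: it indexes only the numeric year columns, so no conversion back from character is needed.

```r
# Hypothetical example data with fixed values so the result is easy to verify
df <- data.frame(
  country     = c("Austria", "Germany", "Sweden"),
  year_vector = c(1950, 1940, 1960),
  `1940`      = c(10, 20, 30),
  `1950`      = c(11, 21, 31),
  `1960`      = c(12, 22, 32),
  check.names = FALSE  # keep "1940", "1950", "1960" as column names
)

year_cols <- c("1940", "1950", "1960")

# Numeric matrix holding only the year columns
gdp <- as.matrix(df[year_cols])

# For each row, the position of the year column named by year_vector
col_idx <- match(as.character(df$year_vector), year_cols)

# A two-column matrix of (row, column) pairs picks one cell per row
df$gdp_head <- gdp[cbind(seq_len(nrow(df)), col_idx)]

df$gdp_head
```

Each row of the index matrix built by cbind() names exactly one cell, so the result has one value per row of df.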
0
votes

I came up with the following solution, which works as well:

dataset %>%
  do(., mutate(., gdp_head = pmap(list(1:nrow(.), year_vector),
                                  function(x, y) .[x, (y - 1901 + 16)]) %>%
                   unlist()))

In this solution the column index is computed from the year: I subtract the first year from year_vector and add the column position of the first year variable. In this case the year variables start in 1901, which is at column index 16.
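
The offset arithmetic can be sketched without hard-coding the column position. The example below uses hypothetical data (df, first_year and the values are assumptions for illustration) and, like the formula above, assumes there is exactly one column per consecutive year; it would not work unchanged for columns spaced by decade.

```r
# Hypothetical data: year columns are consecutive and start at 1901
df <- data.frame(
  country     = c("Austria", "Germany", "Sweden"),
  year_vector = c(1902, 1901, 1903),
  `1901`      = c(10, 20, 30),
  `1902`      = c(11, 21, 31),
  `1903`      = c(12, 22, 32),
  check.names = FALSE  # keep the years as column names
)

first_year <- 1901

# Look up the position of the first year column instead of hard-coding it
first_col <- match(as.character(first_year), names(df))

# Same arithmetic as y - 1901 + 16, with the looked-up position
col_idx <- df$year_vector - first_year + first_col

# Pick one cell per row: row i, column col_idx[i]
gdp_head <- mapply(function(i, j) df[i, j], seq_len(nrow(df)), col_idx)
df$gdp_head <- gdp_head

df$gdp_head
```

Looking the position up with match() keeps the code working if columns are added or removed before the first year column.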