1
votes

I am working with a dataset of countries, with different values for different points in time. There is one observation each month, so I have used the as.date fucntion such that dates are 01-07-2018, 01-08-2018 etc. For each country and each date exist a corresponding value. I want to transform this into an dataframe, where all the countries values are aggregated at a date. I have tried the reshape function such that

reshape(Origin_wide, idvar = "Origin", timevar = "V5", direction = "wide")

V5 being the date variable and Origin being the country. This is as far as I can tell dropping the as.date set, such that the dates and the corresponding values is no longer treated as such. Can it be done in a more smooth way ? The time series aspect of the data series disseapears ones the reshape function is lost, such that i cannot difference the time series or plot it by date etc. Furthermore the dates are getting a "value" prefix, which is the same problem as far as I can tell.

Here is the data structure as it is:

|       V5            |        Origin    |  Value   |
|---------------------|------------------|----------|
|      01-09-2017     |        USA       |     45   |
|---------------------|------------------|----------|
|      01-10-2017     |        USA       |     47   |
|---------------------|------------------|----------|
|      01-11-2017     |        USA       |     49   |
|---------------------|------------------|----------|
|      01-09-2017     |        Canada    |     7    |
|---------------------|------------------|----------|
|      01-10-2017     |        Canada    |     13   |
|---------------------|------------------|----------|
|      01-11-2017     |        Canada    |     17   |
|---------------------|------------------|----------|

And here is how I would like it to look:

    |       V5            |        Canada    |  USA     |
    |---------------------|------------------|----------|
    |      01-09-2017     |        7         |     45   |
    |---------------------|------------------|----------|
    |      01-10-2017     |        13        |     47   |
    |---------------------|------------------|----------|
    |      01-11-2017     |        17        |     49   |
    |---------------------|------------------|----------|

Hope this makes sense. To reproduce a random small version of the data :

    set.seed(1)
Data <- data.frame(Value = sample(1:10), Origin = sample(c("Mexico", "USA","Canada"), 10, replace = TRUE))
dates <- sample(seq(as.Date('2018/01/01'), as.Date('2018/05/01'), by="month"), 10, replace = TRUE)
Data <- cbind(dates,Data)

A look on the data as produced by the code enter image description here

As it is clear here, the values are not defined for all the dates. When this is the case, the value for that date is = 0. So in my first try with reshape it produces NA's for all the dates where there where no observations, which was perfect, because i was able to just put in 0's.

1
Could you maybe provide a minimal reproducible example, link including your data and your expected outcome.Hjalmar
Yes of course. It is here now !MNielsen
I appreciate you adding some data, but you are much more likely to get help here if your data is reproducible. The comment above included a link that explains how to do that - please read it. Here is specific r examplesConor Neilson
I am sorry for my lack of particularity regarding this question. I have added som code that should recreate a version of the data I am working with. Please let me know if you need more.MNielsen

1 Answers

0
votes

Recreate your data

library(tidyverse)

tbl <- tibble(
  V5 = rep(c("01-09-2017", "01-10-2017", "01-11-2017"), 2),
  Origin = rep(c("USA", "Canada"), each = 3),
  Value = c(45, 47, 49, 7, 13, 17)
)

Code

tbl %>%
  spread(Origin, Value)

Result

# A tibble: 3 x 3
  V5         Canada   USA
  <chr>       <dbl> <dbl>
1 01-09-2017      7    45
2 01-10-2017     13    47
3 01-11-2017     17    49

I am not sure it is a good idea to replace the NA by 0 because those truly are missing values, not values of 0. But if you really want to do this, then you can do:

result <- tbl %>%
  spread(Origin, Value)

result[is.na(result)] <- 0

Note: your toy example is random and does not represent the pattern of your actual data. Running the code on it outputs a pretty ugly wide format. When you create a toy example, try to make sure that it reflects the characteristics of your real data.