0
votes

I have a dataset of train carloads. It currently has a number (weekly carload) listed for each company (the row) for each week (the columns) over the course of a couple years (100+ columns). I want to gather this into just two columns: a date and loads.

It currently looks like this:

3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
20.5       21.8      22.0       22.3
24.1       24.1      23.6       23.4
24.9       24.7      24.8       26.5

I'm looking for:

Date        Load
3/29/2017   32.7
3/29/2017   20.5
3/29/2017   24.1
3/29/2017   24.9
4/5/2017    31.6

I've been doing various versions of the following:

rail3 <- rail2 %>% 
  gather(`3/29/2017`:`1/24/2018`, key = "date", value = "loads")

When I do this it makes a dataset called rail3, but it didn't make the new columns I wanted. It only made the dataset 44 times longer than it was. And it gave me the following message:

Warning message:
attributes are not identical across measure variables;
they will be dropped 

I'm assuming this is because the date columns are currently coded as factors. But I'm also not sure how to convert 100+ columns from factors to numeric. I've tried the following and various other methods:

rail2["3/29/2017":"1/24/2018"] <- lapply(rail2["3/29/2017":"1/24/2018"], as.numeric)

None of this has worked. Let me know if you have any advice. Thanks!

2
As @Dave2e suggests (but doesn't explicitly state), the easiest way to solve the factor issue is to read the data in such that the columns are the desired class. You don't specify, though, exactly where the data set is coming from; that may not be an option if the data set is provided as output from some other function, for example. If you can be clearer in what you're starting with, we can better answer you. - Aaron left Stack Overflow
Also, the : notation is only within the tidyverse; try just this: rail2[] <- lapply(rail2[], as.numeric). Within the tidyverse, try mutate_at or mutate_if. - Aaron left Stack Overflow

2 Answers

0
votes

If you want to avoid warnings when gathering and want date and numeric output in final df you can do:

library(tidyr)
library(hablar)

# Data from above but with factors
rail2<-read.table(header=TRUE, text="3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
                  20.5       21.8      22.0       22.3
                  24.1       24.1      23.6       23.4
                  24.9       24.7      24.8       26.5", check.names=FALSE) %>% 
  as_tibble() %>% 
  convert(fct(everything()))

# Code
rail2 %>% 
  convert(num(everything())) %>% 
  gather("date", "load") %>% 
  convert(dte(date, .args = list(format = "%m/%d/%Y")))

Gives:

# A tibble: 16 x 2
   date        load
   <date>     <dbl>
 1 2017-03-29  32.7
 2 2017-03-29  20.5
 3 2017-03-29  24.1
 4 2017-03-29  24.9
 5 2017-04-05  31.6
0
votes

Here is a possible solution:

rail2<-read.table(header=TRUE, text="3/29/2017  4/5/2017  4/12/2017  4/19/2017
32.7       31.6      32.3       32.5
20.5       21.8      22.0       22.3
24.1       24.1      23.6       23.4
24.9       24.7      24.8       26.5", check.names=FALSE)

library(tidyr)
# gather the data from columns and convert to long format.
rail3 <- rail2 %>% gather(key="date", value="load")

rail3
#        date load
#1  3/29/2017 32.7
#2  3/29/2017 20.5
#3  3/29/2017 24.1
#4  3/29/2017 24.9
#5   4/5/2017 31.6
#6   4/5/2017 21.8
#7 ...