1
votes

I have a data frame covering one month (April 1st to April 30th), with data collected hourly. I want to create a time series plot using ggplot_na_distribution (from the imputeTS package). The problem is: how do I set my column names (header) to clock times (00.00 - 23.00)?

              0    1    2    3    4    5    6    7
01/04/2017 24,4 26,4 28,1 29,6 30,5   31   NA 30,7
02/04/2017 25,8 27,3 29,2 30,1   31 32,2   32 31,4
03/04/2017 26,2 27,5   29 30,2 31,1 31,7 31,6 30,2
04/04/2017 24,8 25,8 27,8 29,3 30,8 31,6   NA 29,4
05/04/2017 25,6 27,2 29,3 30,3 30,2 31,5 31,7 31,7
06/04/2017 25,7 25,9 26,6   28 28,4   27 28,7   30

Sorry if my question wasn't clear. Yes, names(df) works. But my df can't be plotted by ggplot_na_distribution: it says my data should be univariate. I just want my data to look like the tsAirgap data. In tsAirgap, the row names are years and the column names are months. In my case, I want the row names to be the days April 1st - April 30th and the column names to be the hours 00.00-23.00.

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129  NA 135 148 148  NA 119 104 118
1950 115 126 141 135 125 149 170 170  NA 133  NA 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234  NA 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277  NA  NA  NA 374 413 405 355 306 271 306
1957 315 301 356 348 355  NA 465 467 404 347  NA 336
1958 340 318  NA 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362  NA
1960 417 391 419 461  NA 535 622 606 508 461 390 432

I appreciate any answer to my post. Thank you very much, and sorry for my English.

2 Answers

2
votes

The ggplot_na_distribution function appears to require a single vector or a ts class object, which is what tsAirgap is.

There is info on how to create a ts object here: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ts.html
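As a minimal sketch of that approach, the wide table (one row per day, one column per hour) can be flattened row by row into a single vector and wrapped in a ts object. The data frame `df`, its `day` column, and the sample values below are assumptions for illustration, and only base R is used.

```r
# Hypothetical sample: 2 days x 4 hours, values stored as text with decimal commas
df <- data.frame(
  day = c("01/04/2017", "02/04/2017"),
  `0` = c("24,4", "25,8"),
  `1` = c("26,4", "27,3"),
  `2` = c("28,1", "29,2"),
  `3` = c(NA, "30,1"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Convert decimal commas to dots; result is a numeric matrix (rows = days, columns = hours)
hours <- setdiff(names(df), "day")
m <- sapply(df[hours], function(col) as.numeric(sub(",", ".", col)))

# Transposing before flattening keeps time order: all hours of day 1, then day 2, ...
x <- as.vector(t(m))

# frequency = observations per day (24 for a full set of hourly columns; 4 in this sample)
hourly_ts <- ts(x, frequency = length(hours))
```

The resulting `hourly_ts` is univariate, so it could then be passed to ggplot_na_distribution(hourly_ts).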

You also have the choice of reshaping your data.frame from its current 'wide' format to a 'long' format and then plotting the values:

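library(tidyr)
library(dplyr)     # provides %>% and arrange()
library(imputeTS)
# Assumes the dates are in a column named "day" and every other column is an hour
YourDataNew <- gather(YourData, key = "hour", value = "data", -day) %>%
  arrange(day, as.numeric(hour))
# The values use a decimal comma ("24,4"), so swap it for a dot before converting
YourDataNew$data <- as.numeric(sub(",", ".", YourDataNew$data))
ggplot_na_distribution(YourDataNew$data)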

This will run without error, but I'm not sure how to set appropriate tick labels in the resulting plot.
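One possible way to get time-based tick labels, sketched here under the assumption that the installed imputeTS version supports the `x_axis_labels` argument of ggplot_na_distribution (a vector of Date or POSIXct values with the same length as the series). The `values` and `timestamps` names and the generated data are hypothetical.

```r
# Hypothetical hourly values for two days (48 points) with two gaps
values <- round(25 + 5 * sin(seq(0, 4 * pi, length.out = 48)), 1)
values[c(7, 31)] <- NA

# One timestamp per observation, starting at midnight on 1 April 2017
timestamps <- seq(as.POSIXct("2017-04-01 00:00", tz = "UTC"),
                  by = "hour", length.out = length(values))

# Plot only if imputeTS is installed; x_axis_labels must match the series length
if (requireNamespace("imputeTS", quietly = TRUE)) {
  print(imputeTS::ggplot_na_distribution(values, x_axis_labels = timestamps))
}
```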

In any case, if you're working a lot with time-series data, it's probably best to learn how to create and use ts objects.

1
votes

Your question isn't entirely clear. If I understand correctly, you would like to name the columns of your data frame in an hourly format.

You can use names() to set the column names of your data frame df like this (for example, if you have 5 columns):

names(df) <- c("13.30", "14.30", "16.00", "17.00", "18.00")
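For a full day of hourly columns, the labels can be generated rather than typed out. This is a small sketch using a hypothetical 24-column data frame; `hour_labels` is an illustrative name.

```r
# Hypothetical data frame with 24 hourly columns and 2 rows
df <- as.data.frame(matrix(seq_len(48), nrow = 2, byrow = TRUE))

# Build "00.00", "01.00", ..., "23.00" with zero-padded hours
hour_labels <- sprintf("%02d.00", 0:23)
names(df) <- hour_labels
```

Note that renaming the columns only changes the labels; ggplot_na_distribution still expects a single vector or ts object, which is why the plot fails on the renamed data frame in the question.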