5
votes

I have a dataframe with temperature measurements every 10 minutes. The measurements were taken at different locations (named as 'LCZ'), with for each location the values in a different column.

This is part of my dataframe: (it also contains missing values NA)

 Time `LCZ 3-2` `LCZ 3-10` `LCZ 6-1` `LCZ 6-9` `LCZ 9-4`


               <dttm>     <dbl>      <dbl>     <dbl>     <dbl>     <dbl>
 1 2017-08-26 17:00:00      27.5       27.5      27.5      27.0      27.0
 2 2017-08-26 17:10:00      27.5       27.0      27.5      27.0      27.0
 3 2017-08-26 17:20:00      27.5       27.0      27.0      27.0      27.0
 4 2017-08-26 17:30:00      27.0       26.5      27.0      26.5      26.5
 5 2017-08-26 17:40:00      26.5       26.5      26.5      26.5      26.5
 6 2017-08-26 17:50:00      26.5       26.0      26.5      26.0      26.5
 7 2017-08-26 18:00:00      26.5       26.0      26.5      26.5      26.5
 8 2017-08-26 18:10:00      27.0       26.0      26.5      26.5      26.0
 9 2017-08-26 18:20:00      26.5       26.5      26.5      26.5      26.0
10 2017-08-26 18:30:00      26.5       26.5      26.5      26.5      26.0

I want for each location or column to calculate the hourly min/max/median temperature and in addition for the hourly min/max also the timestamp from the original data at which the min/max respectively occured.

Is this possible with R?

I tried already various functions.

group_by allowed me to calculate min/max for each column but without timestamps. period.apply also allowed me to calculate min/max/median but only for one column. Also aggregate() didn't lead to any succes.

I'm learning in R, but didn't came close to a solution on this problem.

This website has helped me with various problems, but I'm really stuck on this one. Can someone help? Thanks in advance

4
How do you define an hour?www
An hour as the summary of the values for 17:10, 17:20,...,17:50, the same for 18:10, 18:20,...,18:50 and so onM95
@M95 It would help if you can include format for your desired output in your question. ThanksMKR
So for each hour, you want 15 values = 5 cols x 3 stats?IRTFM
Yes, I wasn't sure what output to expect, but this answers my needs. Thank you for your answers! And also for the additional explanation in your answer which learned me to understand again a bit more in R.M95

4 Answers

6
votes

We can use floor_date from the package to create a new column Time2 to show the hourly information. If this is not the way you want to define the hourly grouping, you can also try round_date or ceiling_date. After that, we can use gather from the package to convert the data frame from wide format to long format.

library(dplyr)
library(tidyr)
library(lubridate)

dat2 <- dat %>%
  mutate(Time = ymd_hms(Time),
         Time2 = floor_date(Time, unit = "hour")) %>%
  gather(LCZ, Value, starts_with("LCZ")) %>%
  group_by(Time2, LCZ)

After that, we can summarize the data by LCZ and Time2.

dat3 <- dat2 %>%
  summarise(Min = min(Value, na.rm = TRUE),
            Max = max(Value, na.rm = TRUE),
            Median = median(Value, na.rm = TRUE)) %>%
  ungroup()
dat3
# # A tibble: 10 x 5
#    Time2               LCZ        Min   Max Median
#    <dttm>              <chr>    <dbl> <dbl>  <dbl>
#  1 2017-08-26 17:00:00 LCZ.3.10  26.0  27.5   26.8
#  2 2017-08-26 17:00:00 LCZ.3.2   26.5  27.5   27.2
#  3 2017-08-26 17:00:00 LCZ.6.1   26.5  27.5   27.0
#  4 2017-08-26 17:00:00 LCZ.6.9   26.0  27.0   26.8
#  5 2017-08-26 17:00:00 LCZ.9.4   26.5  27.0   26.8
#  6 2017-08-26 18:00:00 LCZ.3.10  26.0  26.5   26.2
#  7 2017-08-26 18:00:00 LCZ.3.2   26.5  27.0   26.5
#  8 2017-08-26 18:00:00 LCZ.6.1   26.5  26.5   26.5
#  9 2017-08-26 18:00:00 LCZ.6.9   26.5  26.5   26.5
# 10 2017-08-26 18:00:00 LCZ.9.4   26.0  26.5   26.0

If you want, we can create binary value to indicate if the value is minimum, maximum, or median as follows. This format is useful when you further want to filter the data frame.

dat4 <- dat2 %>%
  mutate(Min = (Value == min(Value, na.rm = TRUE)) + 0L,
         Max = (Value == max(Value, na.rm = TRUE)) + 0L,
         Median = (Value == median(Value, na.rm = TRUE)) + 0L) %>%
  ungroup()
dat4
# # A tibble: 50 x 7
#    Time                Time2               LCZ     Value   Min   Max Median
#    <dttm>              <dttm>              <chr>   <dbl> <int> <int>  <int>
#  1 2017-08-26 17:00:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0
#  2 2017-08-26 17:10:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0
#  3 2017-08-26 17:20:00 2017-08-26 17:00:00 LCZ.3.2  27.5     0     1      0
#  4 2017-08-26 17:30:00 2017-08-26 17:00:00 LCZ.3.2  27.0     0     0      0
#  5 2017-08-26 17:40:00 2017-08-26 17:00:00 LCZ.3.2  26.5     1     0      0
#  6 2017-08-26 17:50:00 2017-08-26 17:00:00 LCZ.3.2  26.5     1     0      0
#  7 2017-08-26 18:00:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1
#  8 2017-08-26 18:10:00 2017-08-26 18:00:00 LCZ.3.2  27.0     0     1      0
#  9 2017-08-26 18:20:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1
# 10 2017-08-26 18:30:00 2017-08-26 18:00:00 LCZ.3.2  26.5     1     0      1
# # ... with 40 more rows

DATA

dat <- read.table(text = "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
                  '2017-08-26 17:00:00'      27.5       27.5      27.5      27.0      27.0
                  '2017-08-26 17:10:00'      27.5       27.0      27.5      27.0      27.0
                  '2017-08-26 17:20:00'      27.5       27.0      27.0      27.0      27.0
                  '2017-08-26 17:30:00'      27.0       26.5      27.0      26.5      26.5
                  '2017-08-26 17:40:00'      26.5       26.5      26.5      26.5      26.5
                  '2017-08-26 17:50:00'      26.5       26.0      26.5      26.0      26.5
                  '2017-08-26 18:00:00'      26.5       26.0      26.5      26.5      26.5
                  '2017-08-26 18:10:00'      27.0       26.0      26.5      26.5      26.0
                  '2017-08-26 18:20:00'      26.5       26.5      26.5      26.5      26.0
                  '2017-08-26 18:30:00'      26.5       26.5      26.5      26.5      26.0",
                  header = TRUE, stringsAsFactors = FALSE)
3
votes

Here is a way of doing it with dplyr verbs:

library(lubridate)

df %>%
  gather(Location, Temp, -Time) %>%
  group_by(Date = date(Time), HoD = hour(Time), Location) %>%
  mutate_at(.vars = "Temp", .funs = list(Min = min, Max = max, Median = median)) %>%
  filter(Temp == Min | Temp == Max) %>%
  arrange(Location, Time) %>%
  distinct(Temp, .keep_all = T) %>%
  mutate(MinMax = ifelse(Temp == Min, "MinTime", "MaxTime")) %>%
  dplyr::select(-Temp) %>%
  spread("MinMax", "Time")

Output:

Notice the NA which mean that minimum and maximum temperatures were identical on that day, in that hour and at that location.

# A tibble: 10 x 8
# Groups:   Date, HoD, Location [10]
   Location Date         HoD   Min   Max Median MaxTime             MinTime            
   <chr>    <date>     <int> <dbl> <dbl>  <dbl> <chr>               <chr>              
 1 LCZ.3.10 2017-08-26    17  26.0  27.5   26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
 2 LCZ.3.10 2017-08-26    18  26.0  26.5   26.2 2017-08-26 18:20:00 2017-08-26 18:00:00
 3 LCZ.3.2  2017-08-26    17  26.5  27.5   27.2 2017-08-26 17:00:00 2017-08-26 17:40:00
 4 LCZ.3.2  2017-08-26    18  26.5  27.0   26.5 2017-08-26 18:10:00 2017-08-26 18:00:00
 5 LCZ.6.1  2017-08-26    17  26.5  27.5   27.0 2017-08-26 17:00:00 2017-08-26 17:40:00
 6 LCZ.6.1  2017-08-26    18  26.5  26.5   26.5 NA                  2017-08-26 18:00:00
 7 LCZ.6.9  2017-08-26    17  26.0  27.0   26.8 2017-08-26 17:00:00 2017-08-26 17:50:00
 8 LCZ.6.9  2017-08-26    18  26.5  26.5   26.5 NA                  2017-08-26 18:00:00
 9 LCZ.9.4  2017-08-26    17  26.5  27.0   26.8 2017-08-26 17:00:00 2017-08-26 17:30:00
10 LCZ.9.4  2017-08-26    18  26.0  26.5   26.0 2017-08-26 18:00:00 2017-08-26 18:10:00
2
votes

Here is a tidyverse solution.

Explanation: We create a new hour-floored time column Time.hour, by which we can group; we then calculate the necessary summary statistics.

res <- df %>%
    mutate(Time = as.POSIXct(Time, format = "%Y-%m-%d %H:%M:%S")) %>%  # Time as POSIXct
    gather(location, value, -Time) %>%
    mutate(Time.hour = format(Time, "%y-%m-%d %H")) %>%
    group_by(Time.hour, location) %>%
    summarise(min = min(value), max = max(value), median = median(value));
res;
## A tibble: 10 x 5
## Groups:   Time.hour [?]
#   Time.hour   location   min   max median
#   <chr>       <chr>    <dbl> <dbl>  <dbl>
# 1 17-08-26 17 LCZ.3.10  26.0  27.5   26.8
# 2 17-08-26 17 LCZ.3.2   26.5  27.5   27.2
# 3 17-08-26 17 LCZ.6.1   26.5  27.5   27.0
# 4 17-08-26 17 LCZ.6.9   26.0  27.0   26.8
# 5 17-08-26 17 LCZ.9.4   26.5  27.0   26.8
# 6 17-08-26 18 LCZ.3.10  26.0  26.5   26.2
# 7 17-08-26 18 LCZ.3.2   26.5  27.0   26.5
# 8 17-08-26 18 LCZ.6.1   26.5  26.5   26.5
# 9 17-08-26 18 LCZ.6.9   26.5  26.5   26.5
#10 17-08-26 18 LCZ.9.4   26.0  26.5   26.0

If need be, convert to wide:

res %>%
    ungroup() %>%
    gather(what, val, min:median) %>%
    unite(key, what, location) %>%
    spread(key, val)
## A tibble: 2 x 16
#  Time.hour   max_LCZ.3.10 max_LCZ.3.2 max_LCZ.6.1 max_LCZ.6.9 max_LCZ.9.4
#  <chr>              <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
#1 17-08-26 17         27.5        27.5        27.5        27.0        27.0
#2 17-08-26 18         26.5        27.0        26.5        26.5        26.5
## ... with 10 more variables: median_LCZ.3.10 <dbl>, median_LCZ.3.2 <dbl>,
##   median_LCZ.6.1 <dbl>, median_LCZ.6.9 <dbl>, median_LCZ.9.4 <dbl>,
##   min_LCZ.3.10 <dbl>, min_LCZ.3.2 <dbl>, min_LCZ.6.1 <dbl>,
##   min_LCZ.6.9 <dbl>, min_LCZ.9.4 <dbl>

Sample data

df <- read.table(text =
    "Time 'LCZ 3-2' 'LCZ 3-10' 'LCZ 6-1' 'LCZ 6-9' 'LCZ 9-4'
 1 '2017-08-26 17:00:00'      27.5       27.5      27.5      27.0      27.0
 2 '2017-08-26 17:10:00'      27.5       27.0      27.5      27.0      27.0
 3 '2017-08-26 17:20:00'      27.5       27.0      27.0      27.0      27.0
 4 '2017-08-26 17:30:00'      27.0       26.5      27.0      26.5      26.5
 5 '2017-08-26 17:40:00'      26.5       26.5      26.5      26.5      26.5
 6 '2017-08-26 17:50:00'      26.5       26.0      26.5      26.0      26.5
 7 '2017-08-26 18:00:00'      26.5       26.0      26.5      26.5      26.5
 8 '2017-08-26 18:10:00'      27.0       26.0      26.5      26.5      26.0
 9 '2017-08-26 18:20:00'      26.5       26.5      26.5      26.5      26.0
10 '2017-08-26 18:30:00'      26.5       26.5      26.5      26.5      26.0", header = T, row.names = 1)
2
votes

Not very sure in which format OP wants results to be presented. One solution could be found using mutate_at as:

library(lubridate)
library(dplyr)

result <- df %>% mutate(Time = ymd_hms(Time)) %>%
  group_by(Hourly = format(Time, "%Y%m%d%H")) %>%
  mutate_at(vars(starts_with("LCZ")), funs(min = min, max = max, med = median )) %>%
  select(Time, Hourly, sort(names(select(.,-Time-Hourly))))

Result

result[,1:9]
# # A tibble: 10 x 9
# # Groups: Hourly [2]
#   Time                Hourly     LCZ3_02 LCZ3_02_max LCZ3_02_med LCZ3_10 LCZ3_10_max LCZ3_10_med LCZ3_10_min
#   <dttm>              <chr>        <dbl>       <dbl>       <dbl>   <dbl>       <dbl>       <dbl>       <dbl>
# 1 2017-08-26 17:00:00 2017082617    27.5        27.5        27.2    27.5        27.5        26.8        26.0
# 2 2017-08-26 17:10:00 2017082617    27.5        27.5        27.2    27.0        27.5        26.8        26.0
# 3 2017-08-26 17:20:00 2017082617    27.5        27.5        27.2    27.0        27.5        26.8        26.0
# 4 2017-08-26 17:30:00 2017082617    27.0        27.5        27.2    26.5        27.5        26.8        26.0
# 5 2017-08-26 17:40:00 2017082617    26.5        27.5        27.2    26.5        27.5        26.8        26.0
# 6 2017-08-26 17:50:00 2017082617    26.5        27.5        27.2    26.0        27.5        26.8        26.0
# 7 2017-08-26 18:00:00 2017082618    26.5        27.0        26.5    26.0        26.5        26.2        26.0
# 8 2017-08-26 18:10:00 2017082618    27.0        27.0        26.5    26.0        26.5        26.2        26.0
# 9 2017-08-26 18:20:00 2017082618    26.5        27.0        26.5    26.5        26.5        26.2        26.0
# 10 2017-08-26 18:30:00 2017082618    26.5        27.0        26.5    26.5        26.5        26.2        26.0

Data

df <- read.table(text =
"Time    LCZ3_02    LCZ3_10   LCZ6_01   LCZ6_09    LCZ9_04
1 '2017-08-26 17:00:00'      27.5       27.5      27.5      27.0      27.0
2 '2017-08-26 17:10:00'      27.5       27.0      27.5      27.0      27.0
3 '2017-08-26 17:20:00'      27.5       27.0      27.0      27.0      27.0
4 '2017-08-26 17:30:00'      27.0       26.5      27.0      26.5      26.5
5 '2017-08-26 17:40:00'      26.5       26.5      26.5      26.5      26.5
6 '2017-08-26 17:50:00'      26.5       26.0      26.5      26.0      26.5
7 '2017-08-26 18:00:00'      26.5       26.0      26.5      26.5      26.5
8 '2017-08-26 18:10:00'      27.0       26.0      26.5      26.5      26.0
9 '2017-08-26 18:20:00'      26.5       26.5      26.5      26.5      26.0
10 '2017-08-26 18:30:00'      26.5       26.5      26.5      26.5      26.0",
header = TRUE, stringsAsFactors = FALSE)