I am rather stuck. I have 2 data frames - df1 has unique station IDs, % values by month, & num of occurrences (years) of that station in the data; df2 has repeated station IDs by year and values per month by year.
df1: represents percentage on non-missing temperature data for each station by month; n represents number of years in that station's record
station_ID Jan Feb Mar ... Dec n
10160355 37 39 38 39 141
10160360 94 91 98 89 56
10160390 83 87 85 82 163
df2: temperature data by station for each month and year; n from df1 is the length of the repeated station_ID in df2
station_ID year Jan Feb Mar ... Dec
10160355 1878 NA 10 12 12
10160355 1879 12 12 13 10
...
10160355 2018 14 11 15 14
10160360 1963 12 10 12 14
10160360 1964 10 12 15 11
...
(repeats for all stations & total rows = 277604)
What I need: For each monthly column, if df1$station < 50%, replace the data in df2 with NAs for all rows for that station/month - else, leave df2 as is. So, since df1$station_ID[1] shows only 37% for Jan, all January's for that station (df2$station[1:141]) becomes NA.
Example output I need:
station_ID year Jan Feb ... Dec
10160355 1878 NA NA NA
10160355 1879 NA NA NA
...
10160360 1963 12 10 14
10160360 1964 10 12 11
...
I've tried about 20 different methods, but I think I need some form of dplyr with rep to repeat the NAs for rows for each station when the condition is true.
Latest attempt with just one month at a time since I couldn't figure out how to do all cols:
df3 = df2 %>%
group_by(station_ID) %>%
select(Jan) %>%
mutate(if_else(df1$Jan < 50, rep(NA_character_, df1$n), Jan))
This gives an error for invalid 'times' for rep. I think I might be close, but I appreciate any suggestions! Thanks!