I have a data set of real estate data. I'm trying to create a new column of days on market groups (labeled DOM_Groups) and group them into 15-day intervals (i.e. 0-14, 15-29, etc.). Then I'm trying to summarize()
these groupings by the count of observations and the average sale price for each 15-day group.
I'm using the cut()
function attempting to break my DOM_Groups into these 15-day intervals. In the base spreadsheet that I imported, the column containing the days on market has a unique observation in each cell, and the data in that column are numeric whole numbers...no decimals, no negative numbers.
When I run the following code, the tibble output is not grouping correctly, and it is including a negative number with a decimal, which does not exist in my data set. I'm not sure what to do to correct this.
gibbsMkt %>%
mutate(DOM_Groups = cut(DOM, breaks = 15, dig.lab = 2)) %>%
filter(Status == "SOLD") %>%
group_by(DOM_Groups) %>%
summarize(numDOM = n(),
avgSP = mean(`Sold Price`, na.rm = TRUE))
The tibble output I get is this:
DOM_Groups numDOM avgSP
<fct> <int> <dbl>
1 (-0.23,16] 74 561675.
2 (16,31] 18 632241.
3 (31,47] 11 561727.
4 (47,63] 8 545862.
5 (63,78] 7 729286.
6 (78,94] 6 624167.
7 (1.4e+02,1.6e+02] 2 541000
8 (1.6e+02,1.7e+02] 1 535395
Also, for rows 7 & 8 in the tibble, the largest number is 164, so I also don't understand why these rows are being converted to scientific notation.
When I use an Excel pivot table, I get the output that I want to reproduce in R, which is depicted below:
How can I reproduce this in R with the correct code?
levels(cut(rpois(1000, 1), breaks = 15))
andbreaks = 15
means you will get 15 intervals, not cut the data into 15-unit intervals – rawr