I have been researching this for a while and can't find the issue. I use dplyr regularly, but all of a sudden I am getting odd output from the group_by/summarise combination.
I have a large dataset and I am trying to summarize it using the following:
    dataAgg <- dataRed %>%
      group_by(ClmNbr, SnapshotDay, Pre2016) %>%
      filter(SnapshotDay == '30' | SnapshotDay == '90') %>%
      summarise(
        NumFeat  = sum(FeatureNbr),
        TotInc   = sum(IncSnapshotDay),
        TotDelta = sum(InctoFinal),
        TotPaid  = sum(FinalPaid)
      )
The setup of the data frame is below:
'data.frame': 123819 obs. of 8 variables:
$ ClmNbr : Factor w/ 33617 levels "14-00765132",..: 2162 2163 2163 2164 1842 2287 27 27 27 28 ...
$ SnapshotDay : Factor w/ 3 levels "7","30","90": 1 1 1 1 1 1 1 1 1 1 ...
$ Pre2016 : Factor w/ 2 levels "Post2016","Pre2016": 2 2 2 2 2 2 2 2 2 2 ...
$ FeatureNbr : int 6 2 3 3 6 2 4 5 6 5 ...
$ IncSnapshotDay: num 5000 77 5000 4500 77 2200 1800 1100 1800 25000 ...
$ FinalPaid : num 442 0 15000 5000 0 ...
$ InctoFinal : num -4558 -77 10000 500 -77 ...
$ TimeDelta : num 25.833 2.833 2.833 0.833 1.833 ...
When I execute the code, I get 1 obs. of 4 variables; there is no grouping applied.
'data.frame': 1 obs. of 4 variables:
$ NumFeat : int 287071
$ TotInc : num NA
$ TotDelta: num NA
$ TotPaid : num 924636433
I used to do this all the time without problems.
I could use aggregate, but I sometimes mix and match functions depending on the column, so it does not always work.
What am I doing wrong?
You loaded plyr "after" dplyr, right? – Tung
What does sessionInfo() show? When asking for help, you should include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. A str() isn't as helpful as a dput() for testing. – MrFlick
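The load-order guess in the comments can be checked with a small sketch. The data frame `toy` below is made up for illustration (only the column names come from the question), and `na.rm = TRUE` is an assumption about why some totals came back as NA. Prefixing each verb with `dplyr::` makes the result independent of whether plyr was attached later, since a bare `summarise()` would then resolve to `plyr::summarise()`, which ignores grouping and returns a single row.

```r
# Minimal sketch with made-up data; column names mirror the question.
library(dplyr)

toy <- data.frame(
  ClmNbr      = c("A", "A", "B", "B"),
  SnapshotDay = factor(c("30", "90", "30", "7")),
  FinalPaid   = c(100, NA, 25, 10)
)

# Explicit namespaces: these calls always use dplyr's verbs,
# even if plyr has been attached after dplyr and masks them.
agg <- toy %>%
  dplyr::filter(SnapshotDay %in% c("30", "90")) %>%
  dplyr::group_by(ClmNbr, SnapshotDay) %>%
  # na.rm = TRUE is an assumption: sum() returns NA if any input is NA.
  dplyr::summarise(TotPaid = sum(FinalPaid, na.rm = TRUE)) %>%
  dplyr::ungroup()

print(agg)

# Reports which package a bare summarise() currently resolves to.
environmentName(environment(summarise))
```

If the last line reports plyr rather than dplyr, the masking described in the comments is the likely cause; `sessionInfo()` shows the attached packages.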