I have a data frame with the following dimensions:
18549282 obs. of 3 variables:
$ road: chr "MULTILINESTRING((30.5592664 -30.5971316,30.5597665 -30.5964615))" ...
$ n1 : int 0 0 0 0 0 0 0 0 0 0 ...
$ n2 : int 0 0 0 0 0 0 0 0 0 0 ...
There are no blank records in the road column, meaning that every record contains a non-empty string.
When I use dplyr's group_by() together with summarize() to get the sums of n1 and n2 by road, the sums are computed, but I see a blank in the road column of the result. For example:
tt %>%
  group_by(road) %>%
  summarize(sn1 = sum(n1),
            sn2 = sum(n2))
I get:
Again, I'm 100% sure that there are no blanks in the road column.
But when I create a data frame with, let's say, 1000 records as follows:
small_dataset <- head(tt, 1000)
I don't see any blank records in the results:
It seems that dplyr struggles with the large amount of data.
Any ideas on how I can handle this issue?
"I'm 100% sure that there are no blanks": how did you test this? What is the output of sum(tt$road == "")? – zx8754
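The check suggested in the comment can be extended a little, since a value can look blank in printed output without being equal to "". The following is only a minimal sketch in base R: the `road` vector here is a small made-up stand-in for `tt$road`, and `blank_counts` is a hypothetical name. It counts empty strings, whitespace-only strings, and NA values separately.

```r
# Hypothetical stand-in for tt$road; the real column has about 18.5 million rows.
road <- c("MULTILINESTRING((30.5592664 -30.5971316,30.5597665 -30.5964615))",
          "",      # zero-length string
          "   ",   # whitespace only
          NA)      # missing value

# Count each kind of "blank" separately.
blank_counts <- c(
  empty      = sum(road == "", na.rm = TRUE),  # exactly ""
  whitespace = sum(grepl("^\\s+$", road)),     # only spaces, tabs, etc.
  missing    = sum(is.na(road))                # NA values
)
print(blank_counts)
```

On the real data, replacing `road` with `tt$road` would show which of the three cases, if any, is present. If all three counts are zero, the blank in the summary is probably coming from somewhere other than the input values.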