
I have a data frame with several numeric columns that carry the class "comma". The class is needed so that, when the data frame is saved to an Excel file with the openxlsx package, those columns are shown in Excel's comma format.

When I use the dplyr package to group and summarise the data, the comma class is lost from the numeric columns.

Is there a way to use dplyr and still preserve the original comma classes?

Here is the data frame with the comma classes:

library(tidyverse)
library(stringr)

set.seed(10)
df_central_database <- data.frame(Category = as.character(sample(words[1:10], size = 50, replace = TRUE)) ,
           Summ_Income =sample(1000:10000, size = 50, replace = TRUE),
           Summ_Securities =sample(1000:10000, size = 50, replace = TRUE),
           Summ_Bonds =sample(1000:10000, size = 50, replace = TRUE),
           Summ_Options =sample(1000:10000, size = 50, replace = TRUE)
           )


class(df_central_database$Summ_Income) <- "comma"
class(df_central_database$Summ_Securities) <- "comma"
class(df_central_database$Summ_Bonds) <- "comma"
class(df_central_database$Summ_Options) <- "comma"


str(df_central_database)

'data.frame':   50 obs. of  5 variables:
 $ Category       : Factor w/ 10 levels "a","able","about",..: 6 4 5 7 1 3 3 3 7 5 ...
 $ Summ_Income    :Class 'comma'  int [1:50] 4189 9428 3213 5258 2724 6249 5135 5207 4598 5548 ...
 $ Summ_Securities:Class 'comma'  int [1:50] 4099 1551 4321 4668 9229 8999 9854 5295 7242 4832 ...
 $ Summ_Bonds     :Class 'comma'  int [1:50] 8916 2774 1625 2416 4001 2620 2318 3615 9425 1922 ...
 $ Summ_Options   :Class 'comma'  int [1:50] 3008 5823 6963 8633 2342 7031 7855 9988 3369 8967 ...

Grouping and summarising with dplyr resets the columns of the new data frame back to int:

df_rep1 <- df_central_database %>%
  group_by(Category) %>%
  summarise_all(.funs = sum)

str(df_rep1)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   10 obs. of  5 variables:
 $ Category       : Factor w/ 10 levels "a","able","about",..: 1 2 3 4 5 6 7 8 9 10
 $ Summ_Income    : int  23632 24434 48506 28288 26662 22076 19452 22832 25071 3469
 $ Summ_Securities: int  20390 20588 48728 31054 31550 33387 25930 28458 35604 8760
 $ Summ_Bonds     : int  21531 23576 33218 29206 26030 25966 34724 30306 36029 7113
 $ Summ_Options   : int  24345 31356 54054 28524 44705 28161 35068 25267 28022 5713

Is it possible to somehow prevent dplyr from resetting the class?

Thanks, Rafael

So do the summarize and then convert the class. I provided a function here for doing that via dplyr. So if you do df_central_database %>% group_by(Category) %>% summarise_all(.funs = sum) %>% mutate_at(vars(contains('Summ')), funs(f1)), then the class will be comma. – Sotos

1 Answer


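The problem here is that the sum of a comma-class vector comes back as a plain integer: there is no sum method for the class, so the default method is used and the class attribute is dropped. You can fix this by writing a sum method for comma-class objects.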

Make a test vector:

> z = 1:10
> class(z)="comma"

The sum is not of that class:

> sum(z)
[1] 55

So write a method:

> sum.comma = function(...,na.rm=FALSE){val = NextMethod();class(val)="comma";val}

And now it is:

> sum(z)
[1] 55
attr(,"class")
[1] "comma"

So now with your dplyr example:

> df_rep1 <- df_central_database %>%
+   group_by(Category) %>%
+   summarise_all(.funs = sum)
> 
> str(df_rep1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   10 obs. of  5 variables:
 $ Category       : Factor w/ 10 levels "a","able","about",..: 1 2 3 4 5 6 7 8 9 10
 $ Summ_Income    :Class 'comma'  int [1:10] 23632 24434 48506 28288 26662 22076 19452 22832 25071 3469
 $ Summ_Securities:Class 'comma'  int [1:10] 20390 20588 48728 31054 31550 33387 25930 28458 35604 8760
 $ Summ_Bonds     :Class 'comma'  int [1:10] 21531 23576 33218 29206 26030 25966 34724 30306 36029 7113
 $ Summ_Options   :Class 'comma'  int [1:10] 24345 31356 54054 28524 44705 28161 35068 25267 28022 5713
> 

With the method defined, the summarised columns keep the class. You will, however, have to write methods for every function you might want to apply to your class: S3 classes are implemented as attributes, and R has a habit of dropping them at the earliest opportunity.
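One way to cut down on the number of methods is to write a single method for the S3 Summary group generic, which covers sum, prod, min, max, range, all and any. The following is only a sketch and not part of the approach above; it assumes you want every numeric summary to keep the class, and it leaves the logical results of all and any untouched.

```r
# Sketch: one method for the whole Summary group instead of one per function.
Summary.comma <- function(..., na.rm = FALSE) {
  # Compute the result with the default (internal) method
  val <- NextMethod(.Generic)
  # all() and any() return logicals, so only re-tag numeric results
  if (is.numeric(val)) class(val) <- "comma"
  val
}

z <- structure(1:10, class = "comma")
class(sum(z))  # "comma"
class(max(z))  # "comma"
```

Functions outside this group, such as mean or median, would still need methods of their own.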

It might just be easier to write a fixup function and call it after summarising:

result = fixup(result, source, "comma")

which returns result, but with every column whose same-named column in source has class "comma" set to class "comma".
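fixup is not an existing function, so it has to be written by hand. A minimal base R sketch of what it could look like is given here, assuming columns are matched purely by name; the small src and res data frames are made up for illustration.

```r
# Hypothetical helper: copy a class from the columns of `source` onto the
# same-named columns of `result`.
fixup <- function(result, source, cls) {
  # Names of the source columns that carry the class
  tagged <- names(source)[vapply(source, inherits, logical(1), what = cls)]
  # Re-apply the class to every column of result with a matching name
  for (nm in intersect(tagged, names(result))) {
    class(result[[nm]]) <- cls
  }
  result
}

# Illustrative data: only column x carries the class in the source
src <- data.frame(g = c("a", "a", "b"), x = 1:3, y = 4:6)
class(src$x) <- "comma"

# Stand-in for a summarised result whose classes were dropped
res <- data.frame(g = c("a", "b"), x = c(3L, 3L), y = c(9L, 6L))
res <- fixup(res, src, "comma")
str(res)
```

With the data from the question, the call would be fixup(df_rep1, df_central_database, "comma"). Because the class is restored after the fact, this works whichever summary function was used.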