1
votes

My data concerns a company and includes Total Sales and the amount of sales in three counties CA , TX and WI.

Data :

> dput(head(WalData))
structure(list(CA = c(11047, 9925, 11322, 12251, 16610, 14696
), TX = c(7381, 5912, 9006, 6226, 9440, 9376), WI = c(6984, 3309, 
8883, 9533, 11882, 8664), Total = c(25412, 19146, 29211, 28010, 
37932, 32736), date = structure(c(1296518400, 1296604800, 1296691200, 
1296777600, 1296864000, 1296950400), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), event_type = c("NA", "NA", "NA", "NA", "NA", "Sporting"
), snap_CA = c(1, 1, 1, 1, 1, 1), snap_TX = c(1, 0, 1, 0, 1, 
1), snap_WI = c(0, 1, 1, 0, 1, 1)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

I am struggling to calculate the average sales share of the three states on the company's total sales.

Furthermore, i must calculate the same average percentages for each year, month of the year and day of the week.

Any advice would be really helpful !

3

3 Answers

1
votes

It would be easier to perform all the calculation if you get your data in long format.

library(dplyr)
library(tidyr)

WalData %>% pivot_longer(cols = CA:WI) %>% mutate(perc = value/Total)
1
votes

Using dplyryou can also try next options. For the average sales you can use next code:

library(dplyr)
#Code 1
AvgSales <- WalData %>% select(c(CA,TX,WI)) %>%
  summarise_all(mean,na.rm=T)

Output:

# A tibble: 1 x 3
      CA    TX    WI
   <dbl> <dbl> <dbl>
1 12642. 7890. 8209.

For the percentages you need to compute the ratio against Total:

#Code 2
AvgSalesPerc <- WalData %>% select(c(CA,TX,WI,Total)) %>%
  rowwise() %>% mutate(across(CA:WI,~./Total)) %>% 
  select(-Total) %>% ungroup() %>%
  summarise_all(mean,na.rm=T)

Output:

# A tibble: 1 x 3
     CA    TX    WI
  <dbl> <dbl> <dbl>
1 0.444 0.278 0.278

In the case of year, month and day, you can extract the value from your date variable, then use group_by() and obtain the summary. I will only do for year as it is easy to extend for month and day:

#Code 3 only year avg sales
AvgSalesYear <- WalData %>% mutate(Year=format(date,'%Y')) %>%
  select(c(CA,TX,WI,Year)) %>%
  group_by(Year) %>%
  summarise_all(mean,na.rm=T)

Output:

# A tibble: 1 x 4
  Year      CA    TX    WI
  <chr>  <dbl> <dbl> <dbl>
1 2011  12642. 7890. 8209.

Same logic for percentages at year level:

#Code 4 only year avg sales percentage
AvgSalesPercYear <- WalData %>% mutate(Year=format(date,'%Y')) %>%
  select(c(CA,TX,WI,Year,Total)) %>%
  rowwise() %>% mutate(across(CA:WI,~./Total)) %>%
  select(-Total) %>%
  group_by(Year) %>%
  summarise_all(mean,na.rm=T)

Output:

# A tibble: 1 x 4
  Year     CA    TX    WI
  <chr> <dbl> <dbl> <dbl>
1 2011  0.444 0.278 0.278
0
votes

We can use data.table

library(data.table)
melt(setDT(WalData), measure = c("CA", "TX", "WI"))[, perc := value/Total][]