How to properly implement summarise function via dplyr package using an excel (xlsx) file (R language)?

Question

I am working on a project for work and while we currently use excel to perform data analysis, I'm trying to transition into R as I believe from my past experiences, this will prove far more useful and efficient in the long run. I attached two pictures, one showing the raw data that is being used and one showing the end result I want via Excel's pivot table. Note I changed names due to confidentiality. Edit: I realize the result in Excel does not have equal dimensions as with the summarise function's output, should I be using a different function? Otherwise, I included a third picture that may be more feasible to achieve. Raw Data Preferred result in Excel Second result

Here is the code I have written so far (importing and creating data frame. mutating etc. has been successful so I didn't include it and rd_data1 is the original data imported with no changes):

rd_analysis2 <- rd_data1 %>%

  select('Employee Name', 'Month', 'Customer Name', 'Service Type', 'Number of Points', 'Actual Time Spent(min)', 'Work Order') %>%
    group_by('Employee Name', 'Month', 'Customer Name', 'Service Type') %>%
      summarise(Total_Points = sum('Number of Points', na.rm=TRUE), Total_Time = sum('Actual Time Spent(min)'), distinct_work_orders = n_distinct('Work Order')) %>%
        arrange('Employee Name', 'Month')

I either get error messages with the summarise function saying "Error: Problem with summarise() input Total_Points. x invalid 'type' (character) of argument" or get a weird result that combines all the points and time, but not grouped as I would like. Any pointers or tips would be greatly appreciated as I'm a little rusty so please forgive me if I'm off in my syntax.

I can't recreate as raw data in in image format. However, typically if you get this type of error is due to calling a function that requires a number as input. So if your 'Total_points' was of character class this would happen. You may need to change class of the column. — Dasr
Is it possible for me to upload the raw data on here? That's what I thought too, but when I did summary(rd_data1), it show's Number of Points column as a numeric imput and the same applies for time. Note "Total_Points" is simple a new variable I created not an input from the original source, I don't think it matters what the name of the variable is. — DJDelman

alex_jwb90 alex_jwb90 · Accepted Answer · 2020-07-08T22:24:15

I think you're merely using the wrong quotation for your column names. Try using a backtick (`) instead.

rd_analysis2 <- rd_data1 %>%
  select(`Employee Name`, `Month`, `Customer Name`, `Service Type`, `Number of Points`, `Actual Time Spent(min)`, `Work Order`) %>%
  group_by(`Employee Name`, `Month`, `Customer Name`, `Service Type`) %>%
  summarise(
    Total_Points = sum(`Number of Points`, na.rm=TRUE),
    Total_Time = sum(`Actual Time Spent(min)`),
    distinct_work_orders = n_distinct(`Work Order`),

    .groups = 'drop'
  ) %>%
  arrange(`Employee Name`, `Month`)

How to properly implement summarise function via dplyr package using an excel (xlsx) file (R language)?

2 Answers