0
votes

I am working on a project for work and while we currently use excel to perform data analysis, I'm trying to transition into R as I believe from my past experiences, this will prove far more useful and efficient in the long run. I attached two pictures, one showing the raw data that is being used and one showing the end result I want via Excel's pivot table. Note I changed names due to confidentiality. Edit: I realize the result in Excel does not have equal dimensions as with the summarise function's output, should I be using a different function? Otherwise, I included a third picture that may be more feasible to achieve. Raw Data Preferred result in Excel Second result

Here is the code I have written so far (importing and creating data frame. mutating etc. has been successful so I didn't include it and rd_data1 is the original data imported with no changes):

rd_analysis2 <- rd_data1 %>%

  select('Employee Name', 'Month', 'Customer Name', 'Service Type', 'Number of Points', 'Actual Time Spent(min)', 'Work Order') %>%
    group_by('Employee Name', 'Month', 'Customer Name', 'Service Type') %>%
      summarise(Total_Points = sum('Number of Points', na.rm=TRUE), Total_Time = sum('Actual Time Spent(min)'), distinct_work_orders = n_distinct('Work Order')) %>%
        arrange('Employee Name', 'Month')

I either get error messages with the summarise function saying "Error: Problem with summarise() input Total_Points. x invalid 'type' (character) of argument" or get a weird result that combines all the points and time, but not grouped as I would like. Any pointers or tips would be greatly appreciated as I'm a little rusty so please forgive me if I'm off in my syntax.

2
I can't recreate as raw data in in image format. However, typically if you get this type of error is due to calling a function that requires a number as input. So if your 'Total_points' was of character class this would happen. You may need to change class of the column.Dasr
Is it possible for me to upload the raw data on here? That's what I thought too, but when I did summary(rd_data1), it show's Number of Points column as a numeric imput and the same applies for time. Note "Total_Points" is simple a new variable I created not an input from the original source, I don't think it matters what the name of the variable is.DJDelman

2 Answers

1
votes

I think you're merely using the wrong quotation for your column names. Try using a backtick (`) instead.

rd_analysis2 <- rd_data1 %>%
  select(`Employee Name`, `Month`, `Customer Name`, `Service Type`, `Number of Points`, `Actual Time Spent(min)`, `Work Order`) %>%
  group_by(`Employee Name`, `Month`, `Customer Name`, `Service Type`) %>%
  summarise(
    Total_Points = sum(`Number of Points`, na.rm=TRUE),
    Total_Time = sum(`Actual Time Spent(min)`),
    distinct_work_orders = n_distinct(`Work Order`),

    .groups = 'drop'
  ) %>%
  arrange(`Employee Name`, `Month`)
0
votes

It sounds like your data types may not be numeric. Hence the invalid 'type' (character) part of the error message.

One way to check your data types (in R there are many ways, this R-bloggers post covers several other options) is via summary and glimpse.

Applying these to the in-built starwars dataset:

library(dplyr)
data(starwars)
summary(starwars)

     name               height           mass          hair_color         skin_color
Length:87          Min.   : 66.0   Min.   :  15.00   Length:87          Length:87
Class :character   1st Qu.:167.0   1st Qu.:  55.60   Class :character   Class :character
Mode  :character   Median :180.0   Median :  79.00   Mode  :character   Mode  :character
                   Mean   :174.4   Mean   :  97.31
                   3rd Qu.:191.0   3rd Qu.:  84.50
                   Max.   :264.0   Max.   :1358.00
                   NA's   :6       NA's   :28

glimpse(starwars)

Observations: 87
Variables: 13
$ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa",...
$ height     <int> 172, 167, 96, 202, 150,...
$ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0,...
$ hair_color <chr> "blond", NA, NA, "none", "brown",...
$ skin_color <chr> "fair", "gold", "white, blue", "white",...

This shows a clear difference between the character and numeric data types in the dataset. If a column of your dataset should be numeric but has been stored as text, then you can use as.numeric to convert it.

E.g. Contrast the following results

# make a column incorrectly character
starwars2 = starwars %>% mutate(mass = as.character(mass))
# view incorrect data
glimpse(starwars2)
# view corrected data
glimpse(starwars2 %>% mutate(mass = as.numeric(mass)))