8
votes

So I've been trying to use left_join to get the columns of a new dataset onto my main dataset (called employee)

I've double checked the vector names and the cleaning that I've don't and nothing seems to work. Here is my code. Would appreciate any help.

job_codes <- read_csv("Quest_UMMS_JobCodes.csv")
job_codes <- job_codes %>%
clean_names() %>%
select(job_code, pos_desc = pos_des_desc)

job_codes$is_nurse <- str_detect(tolower(job_codes$pos_desc), "nurse") 

employee <- employee %>%
left_join(job_codes, by = "job_code")

The error I keep getting:Error in eval(substitute(expr), envir, enclos) : 'job_code' column not found in rhs, cannot join

here are the results of

names(job_code)
> names(job_codes)
[1] "job_code" "pos_desc" "is_nurse"

names(employee)
> names(employee)
 [1] "REC_NUM"             "ZIP"                 "STATE"              
 [4] "SEX"                 "EEO_CLASS"           "BIRTH_YEAR"         
 [7] "EMP_STATUS"          "PROCESS_LEVEL"       "DEPARTMENT"         
 [10] "JOB_CODE"            "UNION_CODE"          "SUPERVISOR"         
 [13] "DATE_HIRED"          "R_SHIFT"             "SALARY_CLASS"       
 [16] "EXEMPT_EMP"          "PAY_RATE"            "ADJ_HIRE_DATE"      
 [19] "ANNIVERS_DATE"       "TERM_DATE"           "NBR_FTE"            
 [22] "PENSION_PLAN"        "PAY_GRADE"           "SCHEDULE"           
 [25] "OT_PLAN_CODE"        "DECEASED"            "POSITION"           
 [28] "WORK_SCHED"          "SUPERVISOR_IND"      "FTE_TOTAL"          
 [31] "PRO_RATE_TOTAL"      "PRO_RATE_A_SAL"      "NEW_HIRE_DATE"      
 [34] "COUNTY"              "FST_DAY_WORKED"      "date_hired"         
 [37] "date_hired_adj"      "term_date"           "employment_duration"
 [40] "current"             "age"                 "emp_duration_years" 
 [43] "DESCRIPTION.x"       "PAY_STATUS.x"        "DESCRIPTION.y"      
 [46] "PAY_STATUS.y"      
1
Try left_join(., job_codes, by = "job_code") for your last line? - ulfelder
The 2nd line of your code snippet reads ob_codes <- job_codes %>%. Isn't there a j missing in front? - Uwe
Please, edit your Q and add the result of names(job_codes) and names(employee). - Uwe
Oh wait. Thanks @UweBlock I figured it out - Randa Melhem
@RandaMelhem If UweBlock's answer answered your question, the proper way to thank them is to accept their answer. - Calimo

1 Answers

14
votes

Now, after the OP has added the column names of both tables in the Q, it is evident that the column to join on is written in different ways (upper vs lower case).

If the column names are different, help("left_join") suggests:

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.

So, in this case it should read

employee <- employee %>% left_join(job_codes, by = c("JOB_CODE" = "job_code"))