1
votes

I am using mutate to create a new column in dataframe A by fetching values from a dataframe B.

I already tried using the below code but it started throwing an error. Not sure whether I am making any mistakes here. Please find the code below.

Apologies, that I can't share data as it is confidential. However the objective is simple and am sure I am making a blunder somewhere. Can you correct me?

Here, dfm is the dataframe which was already created, from which I will use the 'Code' column values for new dataframe(DF) which was also created but without the 'Visit_concept_id' column.

dfm - Old dataframe with 'Code' column of length 256612
DF %>%
   mutate(visit_concept_id = as.integer(as.character(dfm$Code)))

I expect the 'Code' column values from dfm dataframe to be populated in visit_concept_id column of new dataframe(DF) but I am receiving error as shown below. Please help. Not sure how it was working earlier and not now.

Error: Column visit_concept_id must be length 16 (the group size) or one, not 256612

str(df) is given below

Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 256612 obs. of 17 variables: $ visit_occurrence_id : int 1 2 3 4 5 6 7 8 9 10 ... $ person_id : int 127 12097 348097 4324235 214214 4213 41412 634643 241245 3255 ... $ visit_concept_id : int 32036 32036 32036 32036 32036 32036 32036 32036 32036 32036 ... $ visit_start_date : Date, format: "2169-06-08" "2169-06-08" "2169-06-08" ... $ visit_start_datetime : POSIXct, format: "2169-06-08 09:40:00" "2169-06-08 09:41:00" "2169-06-08 09:42:00" ... $ visit_end_date : Date, format: "2169-06-08" "2169-06-08" "2169-06-08" ... $ visit_end_datetime : POSIXct, format: "2169-06-08 09:40:00" "2169-06-08 09:41:00" "2169-06-08 09:42:00" ... $ visit_type_concept_id : int 44818518 44818518 44818518 44818518 44818518 44818518 44818518 44818518 44818518 44818518 ... $ provider_id : int 0 0 0 0 0 0 0 0 0 0 ... $ care_site_id : int 0 0 0 0 0 0 0 0 0 0 ... $ visit_source_value : chr "Outpatient" "Outpatient" "Outpatient" "Outpatient" ... $ visit_source_concept_id : int 0 0 0 0 0 0 0 0 0 0 ... $ admitting_source_concept_id : int 0 0 0 0 0 0 0 0 0 0 ... $ admitting_source_value : chr NA NA NA NA ... $ discharge_to_concept_id : int 0 0 0 0 0 0 0 0 0 0 ... $ discharge_to_source_value : chr NA NA NA NA ... $ preceding_visit_occurrence_id: int 0 1 2 3 4 0 6 7 8 9 ...

1
What type of field is the Code field? Is it a number that is coded as an integer? Does this work? DF <- dtm %>% mutate(visit_concept_id = as.integer(as.character(Code)))william3031
Code field has 4 distinct values, for example, 1234, 4567, 2345, 6543. Hence it was a factor datatype. In my new column, I wanted it to be integer and so I converted itThe Great
Sorry, for the code above, I meant (DF <- dtm %>% mutate(visit_concept_id = as.integer((Code))) - without the .as.character. I don't know if it will make any difference though.william3031
Another thing is, is there a way to assign value to a new column as shown above but retain the order. I mean Code column when assigned to visit_concept_id, the values are shuffled. Is there a way to prevent that? Other than using merge/join, is there anyway that I can do this elegantly through mutate?The Great
You can arrange but it is hard to tell without seeing your dataset. Perhaps it is better to keept it as a factor. Can you provide the first few lines of the output of str(dtm) (deleting anything confidential)? This will give me an idea of the first problem, not necessarily the second problem.william3031

1 Answers

1
votes

According to the error message your data is grouped. Try this

DF %>% ungroup() %>%
   mutate(visit_concept_id = as.integer(as.character(dfm$Code)))