
I have the following data:

ID        AGE  SEX  RACE   COUNTRY  VISITNUM  VSDTC       VSTESTCD  VSORRES
32320058  58   M    WHITE  UKRAINE  2         2016-04-28  DIABP     74
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  HEIGHT    183
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  SYSBP     116
32320058  58   M    WHITE  UKRAINE  2         2016-04-28  SYSBP     116
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  WEIGHT    109
22080090  75   M    WHITE  MEXICO   1         2016-05-17  DIABP     81
22080090  75   M    WHITE  MEXICO   1         2016-05-17  HEIGHT    176
22080090  75   M    WHITE  MEXICO   1         2016-05-17  SYSBP     151

I would like to reshape the data using tidyr::spread to get the following output:

ID        AGE  SEX  RACE   COUNTRY  VISITNUM  VSDTC       DIABP  SYSBP  WEIGHT  HEIGHT
32320058  58   M    WHITE  UKRAINE  2         2016-04-28  74     116    NA      NA
32320058  58   M    WHITE  UKRAINE  1         2016-04-21  NA     116    109     183
22080090  75   M    WHITE  MEXICO   1         2016-05-17  81     151    NA      176

I receive duplicate errors, although I don't have duplicates in my data!

df1 <- spread(df, VSTESTCD, VSORRES)

Error: Duplicate identifiers for rows (36282, 36283), (59176, 59177), (59179, 59180)

Give us your dput output and your spread code. - Ananta
@user9594 Can you please share the actual errors you receive and the code you run to produce the errors mentioned? - Technophobe01
I updated the question to include the command and the error. TIA - user9594
@TBSRounder: As I explained in the output I am looking for, just handle them as different visits. - user9594
Your example dataset spreads fine for me using tidyr_0.4.1. - aosmith

1 Answer


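If I understand your question correctly, some rows in your full data share the same values in every column except VSORRES, so spread() cannot tell them apart and reports duplicate identifiers.

# Since several rows are identical on the identifying columns, create a unique identifier column first

# Let's take the iris dataset as an example; it ships with R, so no extra package is needed

# load the tidyverse (provides dplyr and tidyr)
library(tidyverse)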

# check the dataset (iris)
head(iris)

# Suppose we gather every column of iris except Species

# Create a unique identifier column, then reshape the data from wide to long as follows

iris_gather <- iris %>%
  dplyr::mutate(ID = dplyr::row_number()) %>%
  tidyr::gather(key = Type, value = my_value, 1:4)

# check first six rows

head(iris_gather)
# Use spread() to return to wide format; the ID column keeps otherwise identical rows distinct

iris_spread <- iris_gather %>%
  tidyr::spread(key = Type, value = my_value) %>%
  dplyr::select(-ID)

# Check first six rows of iris_spread

head(iris_spread)
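
The same idea can be applied to data shaped like the question's. The following is only a sketch under assumptions: the small `df` below is a toy frame modeled on the question's columns, the second SYSBP row is a made-up duplicate added to trigger the error, and the `REP` column name is hypothetical. It first lists the key combinations that occur more than once, then numbers the repeats within each combination so that spread() can treat them as separate rows.

```r
library(dplyr)
library(tidyr)

# Toy data modeled on the question; the second SYSBP row for visit 1 is a
# made-up duplicate, not taken from the original data.
df <- data.frame(
  ID       = c(32320058, 32320058, 32320058, 32320058),
  VISITNUM = c(1, 1, 1, 2),
  VSDTC    = c("2016-04-21", "2016-04-21", "2016-04-21", "2016-04-28"),
  VSTESTCD = c("HEIGHT", "SYSBP", "SYSBP", "DIABP"),
  VSORRES  = c(183, 116, 120, 74),
  stringsAsFactors = FALSE
)

# Step 1: find combinations of identifying columns that appear more than once.
# These are the rows spread() reports as duplicate identifiers.
dupes <- df %>%
  count(ID, VISITNUM, VSDTC, VSTESTCD) %>%
  filter(n > 1)

# Step 2: number the repeats within each combination (REP is a hypothetical
# helper column), so every row has a unique identifier before spreading.
wide <- df %>%
  group_by(ID, VISITNUM, VSDTC, VSTESTCD) %>%
  mutate(REP = row_number()) %>%
  ungroup() %>%
  spread(VSTESTCD, VSORRES)
```

Inspecting `dupes` on the real data should show which measurements were recorded more than once for the same subject, visit, and date. Whether to keep the repeats as separate rows, as done here, or to summarise them first (for example with a mean) depends on what the repeated measurements mean in your study.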