I have a general question about coding efficiently as a beginner. I have a very wide dataset (374 obs) on which I have to do several manipulations. I'll mainly be using 'mutate' and 'unite'. My question is:
The way I write the code now, every time I do something new (e.g., combine 6 columns into one), I write a separate chunk of code for it and create a new dataframe.
Underneath that there'll be another chunk for 'mutate', e.g., if I have to create a new variable by summing two columns.
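A minimal sketch of that kind of 'mutate' step, assuming two hypothetical numeric columns `count_a` and `count_b` (not from the real dataset). It also shows one common gotcha: plain `+` returns NA when either value is NA.

```r
library(dplyr)

# Toy data standing in for the real dataset; column names are hypothetical
df <- tibble(id = 1:3, count_a = c(1, 2, NA), count_b = c(10, 20, 30))

df <- df %>%
  mutate(
    # plain sum: NA if either input is NA
    total       = count_a + count_b,
    # coalesce() swaps NA for 0 before adding, so missing values count as 0
    total_no_na = coalesce(count_a, 0) + coalesce(count_b, 0)
  )

print(df)
```

Whether NA should count as 0 depends on the data; the first form is the safer default.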
Here's an example:
library(dplyr)
library(tidyr)
#1B. Combine location columns.
combinedlocations <- rawdata1 %>% unite(location, locations1, locations2, locations3, na.rm = TRUE,
  remove = TRUE)
combinedlocations <- combinedlocations[-c(6:7)] #drop the unwanted columns
#2. Combine Sector together into one new column: Sector
#B. Combine columns, but override Sector with "Independent Artist" wherever Type.of.org == "Independent Artist"
Combinedsectors <- combinedlocations %>%
  unite(Sector, Sectors, na.rm = TRUE, remove = TRUE) %>%
  mutate(Sector = if_else(Type.of.org == "Independent Artist", "Independent Artist", Sector))
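The unite-then-override step above can be sketched on a tiny toy dataset. Only `Type.of.org` and the "Independent Artist" label come from the question; `sector1`, `sector2`, and the values are hypothetical stand-ins for the real sector columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; real sector column names will differ
raw <- tibble(
  id          = 1:3,
  Type.of.org = c("Company", "Independent Artist", "Charity"),
  sector1     = c("Music", NA, "Theatre"),
  sector2     = c(NA, "Film", "Dance")
)

combined <- raw %>%
  # unite() pastes the listed columns into one (default sep is "_");
  # na.rm = TRUE skips NAs, remove = TRUE drops the source columns
  unite(Sector, sector1, sector2, sep = ", ", na.rm = TRUE, remove = TRUE) %>%
  # if_else() overrides Sector for independent artists, keeps it otherwise
  mutate(Sector = if_else(Type.of.org == "Independent Artist",
                          "Independent Artist", Sector))

print(combined)
```

Because `remove = TRUE` already drops the united columns, a later positional drop like `[-c(6:7)]` removes whatever columns have shifted into those positions, so dropping by name with `select(-...)` tends to be less fragile.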
I basically create a new dataframe for each manipulation, using the one I just created.
Is this correct? This is how I learned to do it in SAS. Or is it better to do it all in one dataframe (maybe rawdata2), and is there a way to combine all of this code using %>%? (I'm still trying to learn how piping works.)
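On the piping part: `x %>% f(y)` is equivalent to `f(x, y)`, so each step automatically receives the previous step's output without you having to name it. A minimal sketch, assuming a toy `rawdata1` with hypothetical columns (`notes` stands in for the unknown columns dropped by position, and `count_a`/`count_b` for the columns being summed), builds `rawdata2` in one chain and checks that it matches the step-by-step, SAS-style version:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy version of rawdata1; real column names will differ
rawdata1 <- tibble(
  id          = 1:2,
  locations1  = c("Leeds", NA),
  locations2  = c(NA, "York"),
  locations3  = c("Hull", NA),
  Type.of.org = c("Company", "Independent Artist"),
  sector1     = c("Music", "Film"),
  sector2     = c(NA, "Dance"),
  notes       = c("x", "y"),  # stand-in for the unwanted columns
  count_a     = c(1, 2),
  count_b     = c(3, 4)
)

# One chain: each %>% passes the result of the previous line to the next
rawdata2 <- rawdata1 %>%
  unite(location, locations1, locations2, locations3,
        sep = ", ", na.rm = TRUE) %>%
  select(-notes) %>%                       # drop unwanted columns by name
  unite(Sector, sector1, sector2, sep = ", ", na.rm = TRUE) %>%
  mutate(
    Sector = if_else(Type.of.org == "Independent Artist",
                     "Independent Artist", Sector),
    total  = count_a + count_b
  )

# The same steps with an intermediate object after each one
step1 <- unite(rawdata1, location, locations1, locations2, locations3,
               sep = ", ", na.rm = TRUE)
step2 <- select(step1, -notes)
step3 <- unite(step2, Sector, sector1, sector2, sep = ", ", na.rm = TRUE)
stepwise <- mutate(step3,
                   Sector = if_else(Type.of.org == "Independent Artist",
                                    "Independent Artist", Sector),
                   total  = count_a + count_b)

print(identical(rawdata2, stepwise))
```

Both styles produce the same data; the chain just avoids a pile of intermediate dataframes. Intermediate objects can still be handy while debugging, and the native pipe `|>` (R 4.1 and later) works the same way for calls like these.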