I'm currently in a bit of a rut when it comes to R coding. I have been trying to use the mutate, seq, and rep functions to generate a new column that iterates over multiple column values and different conditionals, but the result has not come out correct. A small snippet of my data is below:
library(tidyverse)
library(data.table)
library(stringr)
lipidData <- data.frame("Type" = c(rep("LDL", 5), rep("HDL", 5)),
                        "featureID" = c(12, 12, 12, 12, 13, 13, 14, 15, 16, 17),
                        "featureID2" = c(21, 22, 23, 26, 31, 31, 31, 31, 38, 40))
lipidWrong <- lipidData %>%
  group_by(Type, featureID) %>%
  group_by(Type, featureID2) %>%
  mutate(lipidName = paste0(rep("lipid", n()), "_", seq(1, n())))
lipidWrong
Type featureID featureID2 lipidName
<fct> <dbl> <dbl> <chr>
1 LDL 12 21 lipid_1
2 LDL 12 22 lipid_1
3 LDL 12 23 lipid_1
4 LDL 12 26 lipid_1
5 LDL 13 31 lipid_1
6 HDL 13 31 lipid_1
7 HDL 14 31 lipid_2
8 HDL 15 31 lipid_3
9 HDL 16 38 lipid_1
10 HDL 17 40 lipid_1
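A minimal sketch of why the output above looks the way it does, assuming dplyr 1.0.0 or later: a second group_by() replaces the earlier grouping unless .add = TRUE is passed, so only Type and featureID2 are in effect when mutate() runs, and seq(1, n()) then numbers the rows inside each of those groups. The names `replaced` and `added` are illustrative only.

```r
library(dplyr)

lipidData <- data.frame(
  Type       = c(rep("LDL", 5), rep("HDL", 5)),
  featureID  = c(12, 12, 12, 12, 13, 13, 14, 15, 16, 17),
  featureID2 = c(21, 22, 23, 26, 31, 31, 31, 31, 38, 40)
)

# By default, a second group_by() replaces the first grouping
replaced <- lipidData %>%
  group_by(Type, featureID) %>%
  group_by(Type, featureID2)
group_vars(replaced)   # only Type and featureID2 remain

# .add = TRUE keeps the earlier grouping and appends to it
added <- lipidData %>%
  group_by(Type, featureID) %>%
  group_by(featureID2, .add = TRUE)
group_vars(added)      # Type, featureID and featureID2
```

Note that .add = TRUE groups on all three columns at once (an AND condition), so on its own it would not give the "same featureID or same featureID2" behaviour either.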
Instead of that incorrect table, I would like lipidName to be assigned by grouping on Type and featureID, and then on Type and featureID2. If rows have the same Type and featureID, count them as the same lipid for lipidName. If rows have the same Type and featureID2, count them as the same lipid for lipidName. Since my real dataset includes >100,000 rows, it would be great to know how to sequence the numbers over the entire dataset and not just over the n() rows of each group_by group.
I would like to see my results as:
lipidCorrect
Type featureID featureID2 lipidName
1 LDL 12 21 lipid_1 # same type and featureID
2 LDL 12 22 lipid_1 # same type and featureID
3 LDL 12 23 lipid_1 # same type and featureID
4 LDL 12 26 lipid_1 # same type and featureID
5 LDL 13 31 lipid_2 # although featureID is the same as in row 6, it has a different Type
6 HDL 13 31 lipid_3 # same type and featureID2
7 HDL 14 31 lipid_3 # same type and featureID2
8 HDL 15 31 lipid_3 # same type and featureID2
9 HDL 16 38 lipid_4
10 HDL 17 40 lipid_5
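One possible reading of the rule that reproduces the table above for the sample data, offered as a sketch rather than a definitive answer: rows that share Type and featureID with at least one other row are keyed on that pair, and all remaining rows are keyed on Type and featureID2. Numbering the keys in order of first appearance with match(key, unique(key)) makes the sequence run over the whole dataset instead of restarting in every group. The column names `nShared` and `key` and the object `lipidSketch` are hypothetical helpers, and the assumption that featureID takes precedence over featureID2 is not stated in the post.

```r
library(dplyr)

lipidData <- data.frame(
  Type       = c(rep("LDL", 5), rep("HDL", 5)),
  featureID  = c(12, 12, 12, 12, 13, 13, 14, 15, 16, 17),
  featureID2 = c(21, 22, 23, 26, 31, 31, 31, 31, 38, 40)
)

lipidSketch <- lipidData %>%
  group_by(Type, featureID) %>%
  mutate(nShared = n()) %>%   # number of rows sharing this Type + featureID
  ungroup() %>%
  mutate(
    # Assumption: a shared featureID wins; otherwise fall back to featureID2
    key = if_else(nShared > 1,
                  paste("fID", Type, featureID),
                  paste("fID2", Type, featureID2)),
    # Number the keys by first appearance across the whole data frame
    lipidName = paste0("lipid_", match(key, unique(key)))
  ) %>%
  select(-nShared, -key)

lipidSketch
```

This does not chain groups together: a row that shares featureID with one row and featureID2 with another would only be matched on featureID.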
Please let me know if I'm doing anything wrong with my group_by() and mutate(), and also please let me know of a better way to produce the desired results.
Thanks!
... lipidName if they (a) have the same type AND (b) either have the same featureID or the same featureID2. Is that correct? - Gregor Thomas
... group_by() will override your first grouping. - TTS
... featureID as "fID"), X has fID = 30, fID2 = 50, Y has fID = 31, fID2 = 50, Z has fID = 31, fID2 = 51, do they all have the same lipid name even though X is only connected to Z via Y? - Gregor Thomas
... lipidNames to all groups with more than 1 row, with all rows within each group getting the same name, and the names iterate the number after the _ between groups. THEN group by type and fID2 and repeat the process only for those rows that don't already have lipidNames. Does this sound right? - Gregor Thomas
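If the answer to the X, Y, Z question in the comments is "yes" (rows linked only through an intermediate row should share a name), the problem becomes one of connected components. The following base R sketch treats every Type + featureID value and every Type + featureID2 value as a node, lets each row link its two nodes, and names rows by the component they fall in. The function `chain_lipid_id` and the data frame `chained` are hypothetical, and the Type value "LDL" for X, Y and Z is an assumption, since the comment does not give one.

```r
# Hypothetical helper: component number per row, in order of first appearance
chain_lipid_id <- function(type, fID, fID2) {
  nodeA <- paste("fID", type, fID)
  nodeB <- paste("fID2", type, fID2)
  nodes <- unique(c(nodeA, nodeB))
  a <- match(nodeA, nodes)
  b <- match(nodeB, nodes)

  parent <- seq_along(nodes)   # simple union-find, no path compression
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (k in seq_along(a)) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    if (ra != rb) parent[rb] <- ra   # merge the two components
  }

  root <- vapply(a, find_root, integer(1))
  match(root, unique(root))
}

# The X, Y, Z example from the comment above
chained <- data.frame(
  row  = c("X", "Y", "Z"),
  Type = "LDL",                # assumed: the comment does not state a Type
  fID  = c(30, 31, 31),
  fID2 = c(50, 50, 51)
)
chained$lipidName <- paste0(
  "lipid_", chain_lipid_id(chained$Type, chained$fID, chained$fID2)
)
chained
```

Here X, Y and Z all end up with the same name because X is connected to Z via Y. The explicit loop is written for clarity rather than speed; for >100,000 rows a graph package such as igraph would likely be the more practical choice.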