I am learning some R and the use of it on the data analysis so sorry for my probably very dumb questions...
I am analyzing a data frame with a lot of data related to protein presence on a group of cell lines from AMD Anderson database. So I have in a datatable as rows the cell lines and on the cols the protein with thir data ("AMDDatabase"). I need to cross these data using a correlation but when I am onto it it gives me an error "Evaluation error: not enough finite observations"
actividad_protein_long <- gather(data = AMD_database, key = protein, value = level, -(1:5))
correlation_table <- na.omit(actividad_protein_long) %>%
group_by(protein) %>%
summarise(r = cor.test(rel_IC50_uM, level, method = "Kendall")$estimate,
p_value = cor.test(rel_IC50_uM, level, method = "Kendall")$p.value)
I understand that the problem is related to the number of data on the cols, this number vary a lot and I can see some of them are under the threshold of three data point per protein so the analysis cannot be completed.
How can I filter the data previously so I can remove all the data where the observations are under the three required to perform the analysis?
I tried
filteredData <- AMD_database[which(,colSums(!is.na(AMD_database)))>3]
filteredData <- AMD_database[which(AMD_database[,colSums(AMD_database)]>3)]
But it doesn't end well. How can I make all of the columns in AMD_database contain enough non-NA values before use correlation? is there any workaround better than doing this? Is correct to bypass this warning like that or I am incurring in a big mistake to avoid it?
And already check: cor.test ,"not enough finite observations" How to ignore cor.test:“not enough finite observations” and continue, when using tidyverse and ggplot2 (ggpmisc) R cor.test : "not enough finite observations"
But I cannot use "Purrl" package because this is intended to be shared on a server that cannot have that package available and I am not sure I wnat to bypass the error like the other post tells...
Many thanks in advance :)