Data handling: 2 independent factors, which decide the position of a numeric value in a new data frame

Question

I am new to Stack Overflow and to R, so I hope you can be a bit patient and excuse any formatting mistakes.

I am trying to write an R script that automatically analyzes the raw data from a qPCR machine.

I was quite successful in cleaning up the data, but at some point I ran into trouble. My goal is to consolidate the data into a comprehensive table.

The initial data frame (DF) looks something like this:

Sample Detector Value
1      A        1
1      B        2
2      A        3
3      A        2
3      B        3
3      C        1
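To make the example reproducible, the table above can be built in R as follows. This is a sketch: the column types are an assumption, since the question does not show them. Sample and Detector are created as factors because the approach below treats them that way.

```r
# Reproducible version of the example table.
# Assumption: Sample and Detector are factors, Value is numeric.
DF <- data.frame(
  Sample   = factor(c(1, 1, 2, 3, 3, 3)),
  Detector = factor(c("A", "B", "A", "A", "B", "C")),
  Value    = c(1, 2, 3, 2, 3, 1)
)
str(DF)
```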

My goal is a data frame with the sample names as row names and the detectors as column names:

  A  B  C
1 1  2  NA
2 3  NA NA
3 2  3  1

My approach

First, I extracted the names of the samples and detectors (both columns are factors) into vectors:

# levels() returns the distinct names of a factor directly
detectors = levels(DF$Detector)
samples = levels(DF$Sample)

Next, I split the data into one data frame per detector, each named after its detector:

for (i in seq_along(detectors)){
  # creates one data frame per detector, stored in a variable named after it (A, B, C, ...)
  assign(detectors[i], DF[which(DF$Detector == detectors[i]),])
}
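As an aside on the assign() step: base R's split() produces the same per-detector subsets, but as one named list instead of separate global variables, which avoids the later get() calls. A minimal sketch, assuming the factor columns of the example table; the name by_detector is illustrative only.

```r
# Example data from the question (factor columns assumed).
DF <- data.frame(
  Sample   = factor(c(1, 1, 2, 3, 3, 3)),
  Detector = factor(c("A", "B", "A", "A", "B", "C")),
  Value    = c(1, 2, 3, 2, 3, 1)
)

# One data frame per detector level, collected in a named list.
by_detector <- split(DF, DF$Detector)

names(by_detector)   # one element per detector: A, B, C
by_detector[["B"]]   # only the rows for detector B (samples 1 and 3)
```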

Then I initialized an empty data frame with the correct row and column names:

result = data.frame(matrix(NA, nrow = length(samples), ncol = length(detectors)))
colnames(result) = detectors
rownames(result) = samples

Now the problem: I need to get the values from the detector subsets into the result data frame, and each value has to end up in the right position. The difficulty is that the subsets have different numbers of rows, because some samples lack some detectors.

I tried the following: iterate through each detector subset and compare its sample name with the row name (= sample name) of result. If they match, write the value into the new data frame; otherwise, write NA.

for (i in 1:length(detectors)){
  for (j in 1:length(get(detectors[i])$Sample)){
    result[j,i] = ifelse(get(detectors[i])$Sample[j] == rownames(result[j,]), get(detectors[i])$Value[j], NA)
  }
}

The trouble is that as soon as a sample is missing for a detector, the loop stops filling in the correct values for that detector before moving on to the next one. My understanding is that the compared samples get out of sync, so all following ifelse() calls yield NA.

I tried to work around this by putting j = j + 1 into the no branch of ifelse(test, yes, no) to get back in sync, but unfortunately that didn't work.
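Incrementing j inside the body cannot help, because for() assigns j the next element of its sequence at the start of every iteration, discarding any change made in the body. Since result already carries the sample and detector names, a simpler fix is to write each value into result[sample, detector] by name, so there is nothing to keep in sync. A minimal sketch, assuming the factor columns of the example table (this is an editor's illustration, not code from the question or answer):

```r
# Example data from the question (factor columns assumed).
DF <- data.frame(
  Sample   = factor(c(1, 1, 2, 3, 3, 3)),
  Detector = factor(c("A", "B", "A", "A", "B", "C")),
  Value    = c(1, 2, 3, 2, 3, 1)
)

samples   <- levels(DF$Sample)
detectors <- levels(DF$Detector)

# Empty result filled with NA, with sample/detector names on the axes.
result <- data.frame(matrix(NA_real_, nrow = length(samples), ncol = length(detectors)))
colnames(result) <- detectors
rownames(result) <- samples

# Address each cell by row name and column name instead of by position;
# combinations that never occur in DF simply stay NA.
for (k in seq_len(nrow(DF))) {
  result[as.character(DF$Sample[k]), as.character(DF$Detector[k])] <- DF$Value[k]
}
result
```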

I hope I have made my problem clear!

I look forward to any suggestions or comments (including how to improve my code in general ;)

akrun · Accepted Answer · 2015-10-01T15:50:38

We can use acast from library(reshape2) to convert from 'long' to 'wide' format.

library(reshape2)
acast(DF, Sample~Detector, value.var='Value') #returns a matrix output
#  A  B  C
#1 1  2 NA
#2 3 NA NA
#3 2  3  1

If we need a data.frame output, use dcast.
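A minimal sketch of the dcast() variant, assuming the example table with factor columns (the variable names wide and wide_rn are illustrative only). dcast() takes the same formula as acast() but keeps Sample as the first column; the last two lines move it into the row names, as the question asked.

```r
library(reshape2)

# Example data from the question (factor columns assumed).
DF <- data.frame(
  Sample   = factor(c(1, 1, 2, 3, 3, 3)),
  Detector = factor(c("A", "B", "A", "A", "B", "C")),
  Value    = c(1, 2, 3, 2, 3, 1)
)

# Long -> wide as a data.frame; missing combinations become NA.
wide <- dcast(DF, Sample ~ Detector, value.var = "Value")
wide

# Optional: use the sample names as row names instead of a column.
wide_rn <- wide[, -1]
rownames(wide_rn) <- as.character(wide$Sample)
wide_rn
```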

Or use spread() from library(tidyr); its output also keeps 'Sample' as a separate column.

library(tidyr)
spread(DF, Detector, Value)
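In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the equivalent call, assuming the example table with factor columns (the name wide is illustrative only); it returns a tibble with Sample as the first column and NA for missing combinations.

```r
library(tidyr)

# Example data from the question (factor columns assumed).
DF <- data.frame(
  Sample   = factor(c(1, 1, 2, 3, 3, 3)),
  Detector = factor(c("A", "B", "A", "A", "B", "C")),
  Value    = c(1, 2, 3, 2, 3, 1)
)

# One column per detector, one row per sample.
wide <- pivot_wider(DF, names_from = Detector, values_from = Value)
wide
```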
