create unique values in each cells for a given range in excel or R

Question

Apologies if the example is not formatted properly.

I have a data set with one sample per row, the data contain two columns with reference numbers of the start value and end value.

cell A1 = Sample #1
cell B1 = 101-263 (start value)
cell C1 = 101-266 (end value)
cell A2 = Sample #2
cell B2 = 162-186 (start value)
cell C2 = 162-187 (end value)

The range of values is a different length of each row of data, with a maximum range of 8 values. I need to fill in the values in the range, with each value in a cell along the row.

So for sample #1 above I need to create the cell values: D1 = 101-264, and E1 = 101-265

While for sample #2 there will be no extra cells needed.

Is there a formula (using Vlookup and If perhaps?) that I can create and drag across all rows and over the 8 needed columns to fill in this data? (I don't mind if there are N/A in the shorter-range rows)

If there is an easier way using R also fine with me.

Thanks for any advice

Michael Dorman Michael Dorman · Accepted Answer · 2017-05-11T14:51:36

The following code:

library(magrittr)
library(plyr)
library(reshape2)

# Create input example
dat = data.frame(
  sample = c("Sample #1", "Sample #2"), 
  start = c("101-263", "162-186"), 
  end = c("101-266", "162-187"),
  stringsAsFactors = FALSE
  )

# Extract 'start' and 'end' values
dat$num1 = dat$start %>% strsplit("-") %>% sapply("[", 1)
dat$start2 = dat$start %>% strsplit("-") %>% sapply("[", 2) %>% as.numeric
dat$end2 = dat$end %>% strsplit("-") %>% sapply("[", 2) %>% as.numeric
dat$start = NULL
dat$end = NULL

# For each row
for(i in 1:nrow(dat)) {

  # Check if there is any need to add entries
  if((dat$end2[i] - dat$start2[i]) > 1) {

    # For each entry
    for(j in seq(dat$start2[i], dat$end2[i] -1)) {

      # Create entry
      new_entry = data.frame(
        sample = dat$sample[i],
        num1 = dat$num1[i],
        start2 = dat$start2[i],
        end2 = j,
        stringsAsFactors = FALSE
      )

      # Add to table
      dat = rbind(dat, new_entry)

    }

  }

}

# Calculate all values
dat$value = paste0(dat$num1, "-", dat$end2)
dat = dat[, c("sample", "value")]

# Create column labels
dat = ddply(
  dat,
  "sample",
  transform,
  var = paste0("col", rank(value))
)

# Reshape to required format
dat = dcast(dat, sample ~ var, value.var = "value")

Does what you asked on the provided example.

It transforms this table -

     sample   start     end
1 Sample #1 101-263 101-266
2 Sample #2 162-186 162-187

Into this one -

     sample    col1    col2    col3    col4
1 Sample #1 101-263 101-264 101-265 101-266
2 Sample #2 162-187    <NA>    <NA>    <NA>

If there is a larger example for testing will be happy to do so :)

create unique values in each cells for a given range in excel or R

2 Answers