5
votes

I have two datasets both of the same size [132,450000]. One with values and another with p-values corresponding to those values. Now I want to combine those two datasets so that I have 1 large dataframe [264,450000] with the column with values followed by the column with the corresponding p-values. The rownames are exactly the same and the column names are like: sample1 in df1 and sample1_pval in df2

For example I have two dataframes likes this

> df1
    x y
cg1 1 a
cg2 2 b
cg3 3 c
cg4 4 d
cg5 5 e

> df2
     x_pval y_pval 
cg1   6      f
cg2   7      g
cg3   8      h
cg4   9      i
cg5  10      j

And I want to merge them with this order: 1st column of df1 followed by 1st column of df2 followed by 2nd column of df1 followed by 2nd column of df2 etc etc.....

So then it will look like this:

> df
           x       x_pval    y        y_pval
cg1        1        6        a        f
cg2        2        7        b        g
cg3        3        8        c        h
cg4        4        9        d        i
cg5        5       10        e        j

I want to keep the columnnames but the rownames I can add later since they are the same in both dataframes. Since I'm working with a large dataset I dont want to type in all the columns and use "cbind". And I couldn't find a code for "merge" that does one column from each dataset at a time......

Is there a formula or package that does this?

Anyone who can help me out?

4
Why the column order matter? - zx8754
I have to submit my data somewhere in this format and order and I want to export an csv file in the correct order ready for submitting. - Fleur Peters

4 Answers

3
votes

Another option would be to concatenate the sequence of columns of both datasets, order and then cbind

cbind(df1, df2)[order(c(seq_along(df1), seq_along(df2)))]
#    x x_pval y y_pval
#cg1 1      6 a      f
#cg2 2      7 b      g
#cg3 3      8 c      h
#cg4 4      9 d      i
#cg5 5     10 e      j
1
votes

One idea is to cbind the data frames and order on column names prefixes, i.e.

dd <- cbind(df1, df2)
dd[order(sub('_.*', '', names(dd)))]

which gives,

    x x_pval y y_pval
cg1 1      6 a      f
cg2 2      7 b      g
cg3 3      8 c      h
cg4 4      9 d      i
cg5 5     10 e      j

If your columns are always structured as your example then this will also work,

data.frame(dd[c(TRUE, FALSE)], dd[c(FALSE, TRUE)]) #dd taken from above
1
votes

Or if you want to avoid sorting:

cbind(df1, df2)[rep(seq_along(df1),each=2)+rep(c(0,ncol(df1)),ncol(df1))]
1
votes

You can reorder th columns using %%, to give you the odd/even column numbers.

EDIT.
This idea turned out to be bad, the original code was buggy beyond repair. After I saw the solution by @Sotos using order and his comment on the use of c(TRUE, FALSE) I decided to post a solution that works, inspired but not equal to that one. Here it is.

altern <- function(m, n){
    order(c(which(rep(c(TRUE, FALSE), m)), which(rep(c(FALSE, TRUE), n))))
}

df3 <- cbind(df1, df2)
n <- seq_along(names(df3))
df3 <- df3[, n[altern(ncol(df1), ncol(df2))]]
df3

As you can see, this solution is the most complicated of all.