I have two data frames that share the same row IDs but have different columns.

Here is an example

  chrom     coord               sID      CM0016      CM0017    CM0018
7     10   3178881 SP_SA036,SP_SA040 0.000000000 0.000000000 0.0009923
8     10  38894616 SP_SA036,SP_SA040 0.000434783 0.000467464 0.0000970
9     11 104972190 SP_SA036,SP_SA040 0.497802888 0.529319536 0.5479003

and

   chrom     coord            sID      CM0001      CM0002      CM0003
4     10   3178881 SP_SA036,SA040 0.526806527 0.544927536 0.565610860
5     10  38894616 SP_SA036,SA040 0.009049774 0.002849003 0.002857143
6     11 104972190 SP_SA036,SA040 0.451612903 0.401617251 0.435318275

I am trying to create a composite boxplot figure with chrom and coord combined on the x axis (so 3 points in this example) and, for each x value, 2 boxplots side by side, one for each data frame.

What is the best way of doing this? Should I merge the two data frames into a single one and loop over the boxplot rendering, 3 columns at a time?

Any idea on how this can be done?

The problem is that the two data frames have the same number of rows but can differ in the number of columns:

>  dim(A)
[1] 99 20
>  dim(B)
[1] 99 28

I was thinking about transposing the data frames in order to get the same number of columns, but got lost on how to do this properly. Thanks in advance.

UPDATE

This is what I tried to do

  • I merged the chrom and coord columns together to create a single ID
  • I used reshape to melt the data frames
  • I merged the 2 melted data frames into a single one
  • the head looks like this
  • I have two variables, A2 and A4, corresponding to the 2 data frames
  • then I created a boxplot using this

    ggplot(A2A4, aes(factor(combine), value)) +geom_boxplot(aes(fill = factor(variable)))

I think it solved my problem, but the plot looks very busy with 99 x values and 2 boxplots for each.

A box plot based on which column (for y) in each data frame? - joran
how do the "CM0016, CM0017, CM0018" names relate to the "CM0001, CM0002, CM0003" names? - MrFlick
They are different cases; they don't relate to each other (these are two independent experiments). - Rad
@joran yes, kind of. The 2 data frames share the same row IDs (x axis), and for each x value I am trying to get 2 boxplots, one from each data frame. Each boxplot represents all of that row's values: for a given position I am trying to get one boxplot for (CM0016, CM0017, CM0018) and one for (CM0001, CM0002, CM0003). - Rad
What you describe suggests an end result (using just the data in the question) of three pairs of box plots, and each box plot would be created using just 3 values. Is that correct? - joran

1 Answer

So if these are your input tables

d1<-structure(list(chrom = c(10L, 10L, 11L), 
coord = c(3178881L, 38894616L, 104972190L), 
sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SP_SA040", class = "factor"), 
    CM0016 = c(0, 0.000434783, 0.497802888), CM0017 = c(0, 0.000467464, 
    0.529319536), CM0018 = c(0.0009923, 9.7e-05, 0.5479003)), .Names = c("chrom", 
"coord", "sID", "CM0016", "CM0017", "CM0018"), class = "data.frame", row.names = c("7", 
"8", "9"))

d2<-structure(list(chrom = c(10L, 10L, 11L), coord = c(3178881L, 
38894616L, 104972190L), sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SA040", class = "factor"), 
    CM0001 = c(0.526806527, 0.009049774, 0.451612903), CM0002 = c(0.544927536, 
    0.002849003, 0.401617251), CM0003 = c(0.56561086, 0.002857143, 
    0.435318275)), .Names = c("chrom", "coord", "sID", "CM0001", 
"CM0002", "CM0003"), class = "data.frame", row.names = c("4", 
"5", "6"))

Then I would combine and reshape the data to make it easier to plot. Here's what I'd do:

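library(reshape2)  # provides melt()

m1<-melt(d1, id.vars=c("chrom", "coord", "sID"))
m2<-melt(d2, id.vars=c("chrom", "coord", "sID"))
# tag each melted table with its source, then stack them into one table
mm<-rbind(cbind(m1, s="T1"), cbind(m2, s="T2"))
# factor of chrom:coord pairs, with levels sorted numerically by chrom, then coord
mm$pos<-factor(paste(mm$chrom,mm$coord,sep=":"),
    levels=do.call(paste, c(unique(mm[order(mm[[1]],mm[[2]]),1:2]), sep=":")))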

I first melt the two input tables to turn columns into rows. Then I add a column to each table so I know where the data came from and rbind them together. And finally I do a bit of messy work to make a factor out of the chr/coord pairs sorted in the correct order.
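To see why the long format copes with tables of different widths, and why the factor levels are set by hand, here is a minimal base-R sketch. The tables `A` and `B`, their column names, their values, and the helper `to_long` are all made up for illustration; `to_long` is only a rough stand-in for `melt`, not the reshape2 implementation.

```r
# Toy stand-ins for the two tables; B has one more sample column than A.
A <- data.frame(chrom = c(2L, 10L), coord = c(500L, 100L),
                s1 = c(0.1, 0.2), s2 = c(0.3, 0.4))
B <- data.frame(chrom = c(2L, 10L), coord = c(500L, 100L),
                t1 = c(0.5, 0.6), t2 = c(0.7, 0.8), t3 = c(0.9, 1.0))

# Hypothetical helper: stack the sample columns and repeat the id columns.
to_long <- function(d, ids = c("chrom", "coord")) {
  samples <- setdiff(names(d), ids)
  long <- stack(d[samples])                      # columns: values, ind
  names(long) <- c("value", "variable")
  idx <- rep(seq_len(nrow(d)), length(samples))  # row index for each stacked value
  data.frame(d[idx, ids], long, row.names = NULL)
}

# After reshaping, both tables have the same columns, so rbind works
# even though A and B started with different numbers of sample columns.
long <- rbind(cbind(to_long(A), s = "T1"), cbind(to_long(B), s = "T2"))

# Build the position labels, then order the levels by chrom and coord.
pos_labels <- paste(long$chrom, long$coord, sep = ":")
ord <- unique(long[order(long$chrom, long$coord), c("chrom", "coord")])
long$pos <- factor(pos_labels, levels = paste(ord$chrom, ord$coord, sep = ":"))

levels(long$pos)            # "2:500"  "10:100"  (numeric order)
levels(factor(pos_labels))  # "10:100" "2:500"   (default alphabetical order)
```

Without the explicit `levels`, chromosome 10 would be drawn before chromosome 2 on the x axis, because the labels are sorted as strings.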

With all that done, I'll make the plot like

library(ggplot2)

ggplot(mm, aes(x=pos, y=value, color=s)) +
    geom_boxplot(position="dodge")

and it looks like

resulting boxplot
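
With the full 99 positions a single panel may get crowded, as the question notes. One option, sketched here with made-up data in the same long format, might be to draw one panel per chromosome with `facet_wrap` and rotate the x labels. The data frame `toy` and its values are purely illustrative, and whether faceting reads better than a single axis depends on how the positions are spread across chromosomes.

```r
library(ggplot2)

# Hypothetical long-format data shaped like the reshaped table (values are random).
set.seed(1)
toy <- expand.grid(coord = c(100L, 200L, 300L), chrom = c(10L, 11L),
                   rep = 1:5, s = c("T1", "T2"))
toy$value <- runif(nrow(toy))
toy$pos <- factor(paste(toy$chrom, toy$coord, sep = ":"))

# One panel per chromosome; free x scales keep only that chromosome's positions.
p <- ggplot(toy, aes(x = pos, y = value, color = s)) +
  geom_boxplot(position = "dodge") +
  facet_wrap(~ chrom, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
p
```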