I have 2 data frames that share the same row IDs but have different columns.

Here is an example

  chrom     coord               sID      CM0016      CM0017    CM0018
7     10   3178881 SP_SA036,SP_SA040 0.000000000 0.000000000 0.0009923
8     10  38894616 SP_SA036,SP_SA040 0.000434783 0.000467464 0.0000970
9     11 104972190 SP_SA036,SP_SA040 0.497802888 0.529319536 0.5479003

and

   chrom     coord            sID      CM0001      CM0002      CM0003
4     10   3178881 SP_SA036,SA040 0.526806527 0.544927536 0.565610860
5     10  38894616 SP_SA036,SA040 0.009049774 0.002849003 0.002857143
6     11 104972190 SP_SA036,SA040 0.451612903 0.401617251 0.435318275

I am trying to create a composite boxplot figure with chrom and coord combined on the x axis (so 3 points here) and, for each x value, 2 boxplots side by side, one for each data frame.

What is the best way of doing this? Should I somehow merge the two data frames into one and loop over the boxplot rendering 3 columns at a time?

Any idea on how this can be done?

The problem is that the two dataframes have the same number of rows but can differ in number of columns

>  dim(A)
[1] 99 20
>  dim(B)
[1] 99 28

I was thinking about transposing the data frames in order to get the same number of columns, but got lost on how to do this properly. Thanks in advance.

UPDATE

This is what I tried to do

  • I merged the chrom and coord columns together to create a single ID
  • I used reshape to melt the data frames
  • I merged the 2 melted data frames into a single one
  • the head looks like this
  • I have two variables, A2 and A4, corresponding to the 2 data frames
  • then I created a boxplot using this

    ggplot(A2A4, aes(factor(combine), value)) +geom_boxplot(aes(fill = factor(variable)))

I think it solved my problem, but the boxplot looks very busy with 99 x values and 2 boxplots each.

A box plot based on which column (for y) in each data frame? – joran
How do the "CM0016, CM0017, CM0018" names relate to the "CM0001, CM0002, CM0003" names? – MrFlick
They are different cases and don't relate to each other (these are two independent experiments). – Rad
@joran yes, kind of. The 2 data frames share the same row IDs (the x axis), and for each x value I am trying to get 2 boxplots, one from each data frame. Each boxplot represents all the values in that row; for example, for a given position I am trying to get one boxplot for (CM0016, CM0017, CM0018) and one for (CM0001, CM0002, CM0003). – Rad
What you describe suggests an end result (using just the data in the question) of three pairs of box plots, and each box plot would be created using just 3 values. Is that correct? – joran

1 Answer

So if these are your input tables

d1<-structure(list(chrom = c(10L, 10L, 11L), 
coord = c(3178881L, 38894616L, 104972190L), 
sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SP_SA040", class = "factor"), 
    CM0016 = c(0, 0.000434783, 0.497802888), CM0017 = c(0, 0.000467464, 
    0.529319536), CM0018 = c(0.0009923, 9.7e-05, 0.5479003)), .Names = c("chrom", 
"coord", "sID", "CM0016", "CM0017", "CM0018"), class = "data.frame", row.names = c("7", 
"8", "9"))

d2<-structure(list(chrom = c(10L, 10L, 11L), coord = c(3178881L, 
38894616L, 104972190L), sID = structure(c(1L, 1L, 1L), .Label = "SP_SA036,SA040", class = "factor"), 
    CM0001 = c(0.526806527, 0.009049774, 0.451612903), CM0002 = c(0.544927536, 
    0.002849003, 0.401617251), CM0003 = c(0.56561086, 0.002857143, 
    0.435318275)), .Names = c("chrom", "coord", "sID", "CM0001", 
"CM0002", "CM0003"), class = "data.frame", row.names = c("4", 
"5", "6"))

Then I would combine and reshape the data to make it easier to plot. Here's what I'd do

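library(reshape2)  # provides melt()

m1<-melt(d1, id.vars=c("chrom", "coord", "sID"))
m2<-melt(d2, id.vars=c("chrom", "coord", "sID"))
# tag each table with its source, then stack them
mm<-rbind(cbind(m1, s="T1"), cbind(m2, s="T2"))
# factor of chrom:coord pairs, levels sorted by chrom then coord
mm$pos<-factor(paste(mm$chrom,mm$coord,sep=":"),
    levels=do.call(paste, c(unique(mm[order(mm[[1]],mm[[2]]),1:2]), sep=":")))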

I first melt the two input tables to turn columns into rows, which also means it no longer matters that the tables have different numbers of columns. Then I add a column to each table so I know where the data came from and rbind them together. And finally I do a bit of messy work to make a factor out of the chrom/coord pairs, sorted in the correct order.
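To make the reshaping step concrete, here is a minimal sketch on two toy tables with different numbers of sample columns. The names `a`, `b`, `long`, and the `S1`..`S5` columns are made up for illustration and are not from the question's data; the point is only that both tables end up with the same columns after `melt`, so `rbind` works.

```r
library(reshape2)

# Two toy tables with the same rows but different numbers of sample columns
# (hypothetical data, for illustration only).
a <- data.frame(chrom = c(10L, 11L), coord = c(100L, 200L),
                S1 = c(0.1, 0.2), S2 = c(0.3, 0.4), S3 = c(0.5, 0.6))
b <- data.frame(chrom = c(10L, 11L), coord = c(100L, 200L),
                S4 = c(0.7, 0.8), S5 = c(0.9, 1.0))

# After melting, both tables have the same columns:
# chrom, coord, variable, value
ma <- melt(a, id.vars = c("chrom", "coord"))
mb <- melt(b, id.vars = c("chrom", "coord"))

# Tag each table with its origin and stack them
long <- rbind(cbind(ma, s = "T1"), cbind(mb, s = "T2"))

# Build one x-axis label per position, with levels ordered by chrom, then coord
ord <- unique(long[order(long$chrom, long$coord), c("chrom", "coord")])
long$pos <- factor(paste(long$chrom, long$coord, sep = ":"),
                   levels = paste(ord$chrom, ord$coord, sep = ":"))

table(long$s)
levels(long$pos)
```

Building the levels from a sorted, de-duplicated copy of the chrom/coord pairs keeps the x axis in genomic order instead of the alphabetical order `factor` would otherwise pick.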

With all that done, I'll make the plot like

library(ggplot2)
ggplot(mm, aes(x=pos, y=value, color=s)) +
    geom_boxplot(position="dodge")

and it looks like this:

[image: resulting boxplot]
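With the full data (99 positions, 2 boxplots each) a single panel will get crowded, as the question's update notes. One possible way to thin it out, offered here as a suggestion rather than as part of the plot above, is to split the figure into one panel per chromosome with `facet_wrap` and rotate the x labels. The sketch below uses made-up long-format data with the same column names (`chrom`, `pos`, `value`, `s`) so that it runs on its own.

```r
library(ggplot2)

# Toy long-format data standing in for the melted and combined tables
# (hypothetical values; only the column names follow the code above).
set.seed(1)
mm <- expand.grid(pos_id = 1:6, rep = 1:5, s = c("T1", "T2"))
mm$chrom <- ifelse(mm$pos_id <= 3, 10L, 11L)
mm$coord <- mm$pos_id * 1000L
mm$value <- runif(nrow(mm))
mm$pos <- factor(mm$coord)

# One panel per chromosome; free x scales so each panel only shows
# the positions that belong to that chromosome.
p <- ggplot(mm, aes(x = pos, y = value, color = s)) +
    geom_boxplot(position = "dodge") +
    facet_wrap(~ chrom, scales = "free_x") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(p)
```

Whether this helps depends on how the 99 positions are spread across chromosomes; if most sit on one chromosome, plotting subsets of positions separately may work better.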