1
votes

I have the following plots made in R: enter image description here

I used the following code to make the plot:

par(mfrow=c(1,2))

rmsd <- read.table(textConnection("
pdb rmsd
1grl_edited.pdb 1.5118
1oel_edited.pdb 1.1758
1ss8_edited.pdb 0.8576
1gr5_edited.pdb 1.8301
1j4z_edited.pdb 0.7892
1kp8.pdb    0.1808
1kpo_edited.pdb 0.7879
1mnf.pdb    1.2371
1xck.pdb    1.6820
2c7e_edited.pdb 5.4446
2cgt_edited.pdb 9.9108
2eu1.pdb    54.1764
2nwc.pdb    1.6026
2yey.pdb    61.4931
"), header=TRUE)

dat <- read.table(textConnection("
pdb      PA      EHSS 
1gr5_edited.pdb 21518.0 29320.0
1grl_edited.pdb 21366.0 28778.0
1j4z_edited.pdb 21713.0 29636.0
1kp8.pdb    21598.0 29423.0
1kpo_edited.pdb 21718.0 29643.0
1mnf.pdb    21287.0 29035.0
1oel_edited.pdb 21377.0 29054.0
1ss8_edited.pdb 21543.0 29459.0
1sx3.pdb    21651.0 29585.0
1xck.pdb    21191.0 28857.0
2c7e_edited.pdb 22930.0 31120.0
2cgt_edited.pdb 22807.0 31058.0
2eu1.pdb    22323.0 30569.0
2nwc.pdb    21338.0 29326.0
2yey.pdb    21032.0 28670.0
"), header=TRUE, row.names=NULL)

d <- dist(rmsd$rmsd, method = "euclidean")
fit <- hclust(d, method="ward")
plot(fit, labels=rmsd$pdb)
groups <- cutree(fit, k=3)

rect.hclust(fit, k=3, border="red")

#for (i in dat[1]){for (z in i){ if (z=="1sx3.pdb"){print (z)}}}

den.PA <- density(dat$PA)
plot(den.PA)
for (i in dat$PA){
    lineat = i
    lineheight <- den.PA$y[which.min(abs(den.PA$x - lineat))]
    lines(c(lineat, lineat), c(0, lineheight), col = "red")
}

The left plot shows the cluster for RMSD values and the right plot shows the density plot of "PA's". The density plot contains an extra value because the reference was included in the plot, the reference was not included in the RMSD cluster because obviously it would return a value of 0. The reference file in dat is 1sx3.pdb

The cluster plot has 3 red boxes, how can I colour the boxes differently so, the left box is red, the middle box is green and the right box is blue. I then need this to mirror with the density plot, meaning, the values inside the red box has red lines on the density plot and the values inside the green box has green lines on the density plot and so forth.

Is it also possible yo catch the reference structure and colour it black on the density plot?

2

2 Answers

2
votes

This code will do what you want. You were almost there...just needed a bit of sorting and indexing.

par(mfrow=c(1,2))

d <- dist(rmsd$rmsd, method = "euclidean")
fit <- hclust(d, method="ward")
plot(fit, labels=rmsd$pdb)
groups <- cutree(fit, k=3)

cols = c('red', 'green', 'blue')

rect.hclust(fit, k=3, border=cols)

#for (i in dat[1]){for (z in i){ if (z=="1sx3.pdb"){print (z)}}}

cols = cols[sort(unique(groups[fit$order]), index=T)$ix]

den.PA <- density(dat$PA)
plot(den.PA)
for (i in 1:length(dat$PA)){
    lineat = dat$PA[i]
    lineheight <- den.PA$y[which.min(abs(den.PA$x - lineat))]
    col = cols[groups[which(rmsd$pdb == as.character(dat[i, 'pdb']))]]
    lines(c(lineat, lineat), c(0, lineheight), col = col)
}

enter image description here

0
votes

You can pass a vector of colors to border, like such:

t <- rect.hclust(fit, k=3, border=c("red",'green','blue'))

Note that I saved the output from that, it looks like this:

[[1]]
[1] 12 14

[[2]]
 [1]  1  2  3  4  5  6  7  8  9 13

[[3]]
[1] 10 11

Then, change up your loop a little to this

for (i in 1:length(dat$PA)){
    lineat = dat$PA[i]
    lineheight <- den.PA$y[which.min(abs(den.PA$x - lineat))]
    if(i %in% t[[1]]) lines(c(lineat, lineat), c(0, lineheight), col = "red")
    if(i %in% t[[2]]) lines(c(lineat, lineat), c(0, lineheight), col = "green")
    if(i %in% t[[3]]) lines(c(lineat, lineat), c(0, lineheight), col = "blue")
}

Although that last bit of code isn't too elegant; I'm sure someone can come up with a better solution.