0
votes

I am working with ggplot to plot bivariate data in groups along with standard ellipses of these data using a separate set of tools. These return n=100 x,y coordinates that define each ellipse, and then for each group, I would like to plot about 10-25 ellipses.

Conceptually, how can this be achieved? I can plot a single ellipse easily using geom_polygon, but I am confused how to get the data organized to make it work so multiple ellipses are plotted and guides (color, fills, linetypes, etc.) are applied per group.

In the traditional R plotting, I could just keep adding lines using a for loop.

Thanks!

UPDATE: Here is a CSV containing 100 coordinates for a single ellipse.

Data

Let's say I have three groups of bivariate data to which the ellipse fitting has been applied: Green, Red, Blue. For each group, I'd like to plot several ellipses.

I don't know how I would organize the data in such a way to work in the long format prefered by ggplot and preserve the group affiliations. Would a list work?

UPDATE2:

Here is a csv of raw x and y data organized into two groups: river and lake

Data

The data plot like this:

test.data <- read.csv("ellipse_test_data.csv")
ggplot(test.data) +
  geom_point(aes(x, y, color = group)) +
  theme_classic()

enter image description here

I am using a package called SIBER, which will fit Bayesian ellipses to the data for comparing groups by ellipse area, etc. The output of the following creates a list with number of elements = number of groups of data, and each element contains a 6 x n (n=number of draws) for each fitted ellipse - first four columns are a covariance matrix Sigma in vector format and the last two are the bivariate means:

# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^5   # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 100     # thin the posterior by this many
parms$n.chains <- 2        # run this many chains

# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3

# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the 
# means. Fitting is via the JAGS method.
ellipses.test <- siberMVN(siber.test, parms, priors)

First few rows of the first element in the list:

$`1.river`
     Sigma2[1,1]   Sigma2[2,1]   Sigma2[1,2] Sigma2[2,2]     mu[1]    mu[2]
[1,]   1.2882740  2.407070e-01  2.407070e-01    1.922637 -15.52846 12.14774
[2,]   1.0677979 -3.997169e-02 -3.997169e-02    2.448872 -15.49182 12.37709
[3,]   1.1440816  7.257331e-01  7.257331e-01    4.040416 -15.30151 12.14947

I would like to be able to extract a random number of these ellipses and plot them with ggplot using alpha transparency.

The package SIBER has a function (addEllipse) to convert the '6 x n' entries to a set number of x and y points that define an ellipse, but I don't know how to organize that output for ggplot. I thought there might be an elegant way to do with all internally with ggplot.

The ideal output would be something like this, but in ggplot so the ellipses could match the aesthetics of the levels of data: enter image description here

1
If you could include a sample of the data it will be easier to reproduce the example. See How to make a great R reproducible exampleaustensen
Hi, sorry. I would, but the data example would be quite large, and several thousands lines long.H Ludwig
You don't need to include all your data, just a small sample so that you can demonstrate your issueSymbolixAU
Okay, I included an example polygon defined by 100 coordinates. Thanks!H Ludwig
With the data supplied, I was able to draw an ellipse using geom_polygon() but that was easy and isn't the core of your question as you mentioned that already. The interesting part is how do your bivariate data look like and how are they related to one or many ellipses. A sample of these data would be very helpful.Uwe

1 Answers

1
votes

some code to do this on the bundled demo dataset from SIBER.

In this example we try to create some plots of the multiple samples of the posterior ellipses using ggplot2.

library(SIBER)
library(ggplot2)
library(dplyr)
library(ellipse)

Fit a basic SIBER model to the example data bundled with the package.

# load in the included demonstration dataset
data("demo.siber.data")
#
# create the siber object
siber.example <- createSiberObject(demo.siber.data)

# Calculate summary statistics for each group: TA, SEA and SEAc
group.ML <- groupMetricsML(siber.example)

# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4   # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10     # thin the posterior by this many
parms$n.chains <- 2        # run this many chains

# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3

# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the 
# means. Fitting is via the JAGS method.
ellipses.posterior <- siberMVN(siber.example, parms, priors)

# The posterior estimates of the ellipses for each group can be used to
# calculate the SEA.B for each group.
SEA.B <- siberEllipses(ellipses.posterior)

siberDensityPlot(SEA.B, xticklabels = colnames(group.ML), 
                xlab = c("Community | Group"),
                ylab = expression("Standard Ellipse Area " ('\u2030' ^2) ),
                bty = "L",
                las = 1,
                main = "SIBER ellipses on each group"
                )

Now we want to create some plots of some sample ellipses from these distributions. We need to create a data.frame object of all the ellipses for each group. In this exmaple we simply take the frist 10 posterior draws assuming them to be independent of one another, but you could take a random sample if you prefer.

# how many of the posterior draws do you want?
n.posts <- 10

# decide how big an ellipse you want to draw
p.ell <- 0.95

# for a standard ellipse use
# p.ell <- pchisq(1,2)




# a list to store the results
all_ellipses <- list()

# loop over groups
for (i in 1:length(ellipses.posterior)){

  # a dummy variable to build in the loop
  ell <- NULL
  post.id <- NULL

  for ( j in 1:n.posts){

    # covariance matrix
    Sigma  <- matrix(ellipses.posterior[[i]][j,1:4], 2, 2)

    # mean
    mu     <- ellipses.posterior[[i]][j,5:6]

    # ellipse points

    out <- ellipse::ellipse(Sigma, centre = mu , level = p.ell)


    ell <- rbind(ell, out)
    post.id <- c(post.id, rep(j, nrow(out)))

  }
  ell <- as.data.frame(ell)
  ell$rep <- post.id
  all_ellipses[[i]] <- ell
}

ellipse_df <- bind_rows(all_ellipses, .id = "id")


# now we need the group and community names

# extract them from the ellipses.posterior list
group_comm_names <- names(ellipses.posterior)[as.numeric(ellipse_df$id)]

# split them and conver to a matrix, NB byrow = T
split_group_comm <- matrix(unlist(strsplit(group_comm_names, "[.]")),
                           nrow(ellipse_df), 2, byrow = TRUE)

ellipse_df$community <- split_group_comm[,1]
ellipse_df$group     <- split_group_comm[,2]

ellipse_df <- dplyr::rename(ellipse_df, iso1 = x, iso2 = y)

Now to create the plots. First plot all the raw data as we want.

first.plot <- ggplot(data = demo.siber.data, aes(iso1, iso2)) +
  geom_point(aes(color = factor(group):factor(community)), size = 2)+
  ylab(expression(paste(delta^{15}, "N (\u2030)")))+
  xlab(expression(paste(delta^{13}, "C (\u2030)"))) + 
  theme(text = element_text(size=15))
print(first.plot)

Now we can try to add the posterior ellipses on top and facet by group

second.plot <- first.plot + facet_wrap(~factor(group):factor(community))
print(second.plot)

# rename columns of ellipse_df to match the aesthetics

third.plot <- second.plot + 
  geom_polygon(data = ellipse_df,
              mapping = aes(iso1, iso2,
                             group = rep,
                             color = factor(group):factor(community),
                             fill = NULL),
               fill = NA,
               alpha = 0.2)
print(third.plot)

Facet-wrapped plot of sample of posterior ellipses by group