2
votes

JMP has a "split table" platform:
http://www.jmp.com/support/help/Split_Columns.shtml

Here is the image for it:

enter image description here

The "split by" becomes part of the column headers.
The "split columns" are the columns spread out.
The "group" are retained columns.

I have looked at a few links/pages and can't seem to get this right in R. Right now I have to kluge it into a macro in JMP.

Links that didn't help me include:

I need to split a table of ~20k rows and ~30 columns, along one of the columns (integers between 0 and 13), to being ~1400 rows with ~25 split into 350.

An inelegant, but repeatable, example is splitting this cars table
enter image description here

according to this:
enter image description here

Yields this:
enter image description here

How do I do this and retain the ~5 non-split columns using an R library like tidyr or dplyr?

1
Could you provide a small (table) example of the input and the expected output?CPak
Even with your expected output, it's not clear to me how multiple values should be stored. For example, what is the desired output for Model 70 when Origin = Japanese and Cylinders = 4?bouncyball
Sorry for inelegance. I have two columns in the "split apart" bin: model and displacement. These are split along Year, so for each unique value of year, there is a column created where the prefix is "model" and the suffix is year". The years range from 70 to 82, so there are 13 "model xx" columns. The same goes for "Engine Disp xx". For each "group by" the unique element is placed in the cell. My stuff has unique 1:1, this example doesn't. So for each of unique of origin, for each of unique of cylinders|origin, populate "model xx" and "engine disp xx".EngrStudent

1 Answers

1
votes

Using reshape, it's not too terrible to do one split column at a time. You could then merge the model and engine.disp together. For your real example, you could just change the lists in aggregate and formula in cast.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

agg <- aggregate(list(model = x$model),list(origin = x$origin,cylinders = x$cylinders,year = x$year),FUN = paste,collapse = ',')

require(reshape)
output <- cast(data = agg,formula = origin + cylinders ~ year,value = 'model')

Edit: I haven't checked all possible cases, but this function should work similar to the split tables, or at least give you a good start.

x <- read.csv('http://web.pdx.edu/~gerbing/data/cars.csv',stringsAsFactors = F)
names(x) <- tolower(names(x))

jmpsplitcol <- function(data,splitby,splitcols,group){

  require(reshape)
  require(tidyr)

  aggsplitlist <- data[ ,names(data) %in% c(splitby,group)]
  aggsplitlist <- lapply(aggsplitlist,`[`)

  agg <- aggregate(list(data[ ,names(data) %in% splitcols]),aggsplitlist,FUN = paste,collapse = ',')

  newgat <- gather_(data = agg,key = 'splitcolname','myval',splitcols)

  castformula <- as.formula(paste(paste(group,collapse = ' + '),'~','splitcolname','+',splitby))
  output <- cast(data = newgat,formula = castformula,value = 'myval')
  output
}
res <- jmpsplitcol(x,c('year'),c('engine.disp','model'),c('origin','cylinders'))
head(res2)