5
votes

I am struggling with transformation of a Panel Dataset from wide to long format. The Dataset looks like this:

ID | KP1_430a | KP1_430b | KP1_430c | KP2_430a | KP2_430b | KP2_430c | KP1_1500a | ...  
1     ....
2     ....

KP1; KP2 up to KP7 describe the Waves. a,b up to f describe a specific Item. (E.g. left to right right placement of Party a)

I would like to have this data in long format. Like this:

ID | Party | Wave | 430 | 1500  
 1     1       1     ..    ..
 1     2       1     ..    ..
 .     .       .          
 1     1       2     ..    ..
 .     .       .         
 2     1       1     ..    ..  

I tried to use the reshape function. But I had problems reshaping it over time and over the parties simultaneously.

Here is a small data.frame example.

data <- data.frame(matrix(rnorm(10),2,10))  
data[,1] <- 1:2  
names(data) <- c("ID","KP1_430a" , "KP1_430b" , "KP1_430c" , "KP2_430a" , "KP2_430b ", "KP2_430c ", "KP1_1500a" ,"KP1_1500b", "KP1_1500c")

And this is how far I got.

  data_long <- reshape(data,varying=list(names(data)[2:4],names(data)[5:7], names(data[8:10]),  
                            v.names=c("KP1_430","KP2_430","KP1_1500"),  
                           direction="long", timevar="Party")

The question remains: how I can get the time varying variables in long format as well? And is there a more elegant way to reshape this data? In the code above I would have to enter the names (names(data)[2:4]) for each wave and variable. With this small data.frame it is Ok, but the Dataset is a lot larger.

EDIT: How this transformation could be done by hand: I actually have done this, which leaves me with a page-long code file.
First, Bind KP1_430a and KP1_1500a with IDs, Time=1 and Party=1 column wise. Second create the same object for all parties [b-f], changing the party index respectively, and append it row-wise. Do step one and two for the rest of the waves [2-7], respectively changing party and time var, and append them row-wise.

2
If you want separate columns for 430 and 1500 in the long format then there should be an equal amount of data from those conditions in the wide. The way you have it, you'd have a lot of NAs in the 1500 column... or did you want it that way?John
Oh, this probably all a bit sketchy. There is equal amount of waves (1-7) and party items (a-f) for these two variables. So: KP[1-7]_430[a-f], KP[1-7]_1500[a-f].lstoetze
However, for some Variables in the Dataset there is (a) only Data fo some waves - e.g. KP[146]_1640[a-f] or (b) not party specific - e.g. KP[1-7]_1490.lstoetze

2 Answers

4
votes

It is usually easier to proceed in two steps: first use melt to put your data into a "tall" format (unless it is already the case) and then use dcast to convert ti to a wider format.

library(reshape2)
library(stringr)

# Tall format
d <- melt(data, id.vars="ID")

# Process the column containing wave and party
d1 <- str_match_all( 
  as.character( d$variable ), 
  "KP([0-9])_([0-9]+)([a-z])" 
)
d1 <- do.call( rbind, d1 )
d1 <- d1[,-1]
colnames(d1) <- c("wave", "number", "party")
d1 <- as.data.frame( d1)
d <- cbind( d, d1 )

# Convert to the desired format
d <- dcast( d, ID + wave + party ~ number )
0
votes

At the moment your Wave data is in your variable names and you need to extract it with some string processing. I had no trouble with melt

mdat <- melt(data, id.vars="ID")
mdat$wave=sub("KP", "", sub("_.+$", "", mdat$variable)) # remove the other stuff
mdat

Your description is too sketchy (so far) for me to figure out the rule for deriving a "Party" variable, so perhaps you can edit you question to show how that might be done by a human being .... and then we can show the computer how to to do it.

EDIT: If the last lower-case letter in the original column names is Party as Vincent thinks, then you could trim the trailing spaces in those names and extract:

mdat$var <- sub("\\s", "", (as.character(mdat$variable)))
mdat$party=substr( mdat$var, nchar(mdat$var), nchar(mdat$var))
#--------------
> mdat
   ID  variable      value wave party       var
1   1  KP1_430a  0.7220627    1     a  KP1_430a
2   2  KP1_430a  0.9585243    1     a  KP1_430a
3   1  KP1_430b -1.2954671    1     b  KP1_430b
4   2  KP1_430b  0.3393617    1     b  KP1_430b
5   1  KP1_430c -1.1477627    1     c  KP1_430c
6   2  KP1_430c -1.0909179    1     c  KP1_430c
<snipped output>