Reshape Panel Data Wide Format to Long Format

Question

I am struggling with transformation of a Panel Dataset from wide to long format. The Dataset looks like this:

ID | KP1_430a | KP1_430b | KP1_430c | KP2_430a | KP2_430b | KP2_430c | KP1_1500a | ...  
1     ....
2     ....

KP1; KP2 up to KP7 describe the Waves. a,b up to f describe a specific Item. (E.g. left to right right placement of Party a)

I would like to have this data in long format. Like this:

ID | Party | Wave | 430 | 1500  
 1     1       1     ..    ..
 1     2       1     ..    ..
 .     .       .          
 1     1       2     ..    ..
 .     .       .         
 2     1       1     ..    ..

I tried to use the reshape function. But I had problems reshaping it over time and over the parties simultaneously.

Here is a small data.frame example.

data <- data.frame(matrix(rnorm(10),2,10))  
data[,1] <- 1:2  
names(data) <- c("ID","KP1_430a" , "KP1_430b" , "KP1_430c" , "KP2_430a" , "KP2_430b ", "KP2_430c ", "KP1_1500a" ,"KP1_1500b", "KP1_1500c")

And this is how far I got.

  data_long <- reshape(data,varying=list(names(data)[2:4],names(data)[5:7], names(data[8:10]),  
                            v.names=c("KP1_430","KP2_430","KP1_1500"),  
                           direction="long", timevar="Party")

The question remains: how I can get the time varying variables in long format as well? And is there a more elegant way to reshape this data? In the code above I would have to enter the names (names(data)[2:4]) for each wave and variable. With this small data.frame it is Ok, but the Dataset is a lot larger.

EDIT: How this transformation could be done by hand: I actually have done this, which leaves me with a page-long code file.
First, Bind KP1_430a and KP1_1500a with IDs, Time=1 and Party=1 column wise. Second create the same object for all parties [b-f], changing the party index respectively, and append it row-wise. Do step one and two for the rest of the waves [2-7], respectively changing party and time var, and append them row-wise.

If you want separate columns for 430 and 1500 in the long format then there should be an equal amount of data from those conditions in the wide. The way you have it, you'd have a lot of NAs in the 1500 column... or did you want it that way? — John
Oh, this probably all a bit sketchy. There is equal amount of waves (1-7) and party items (a-f) for these two variables. So: KP[1-7]_430[a-f], KP[1-7]_1500[a-f]. — lstoetze
However, for some Variables in the Dataset there is (a) only Data fo some waves - e.g. KP[146]_1640[a-f] or (b) not party specific - e.g. KP[1-7]_1490. — lstoetze

Vincent Zoonekynd Vincent Zoonekynd · Accepted Answer · 2012-01-25T14:56:47

It is usually easier to proceed in two steps: first use melt to put your data into a "tall" format (unless it is already the case) and then use dcast to convert ti to a wider format.

library(reshape2)
library(stringr)

# Tall format
d <- melt(data, id.vars="ID")

# Process the column containing wave and party
d1 <- str_match_all( 
  as.character( d$variable ), 
  "KP([0-9])_([0-9]+)([a-z])" 
)
d1 <- do.call( rbind, d1 )
d1 <- d1[,-1]
colnames(d1) <- c("wave", "number", "party")
d1 <- as.data.frame( d1)
d <- cbind( d, d1 )

# Convert to the desired format
d <- dcast( d, ID + wave + party ~ number )

Reshape Panel Data Wide Format to Long Format

2 Answers