I'm running into some issues while trying to reshape a data set from long to wide. Here's an example, since I think that explains it best:

Say I wanted to take this long data set...

| study_id | event_date | code |
|----------|------------|------|
| 1        | 09 June 15 | 546  |
| 1        | 09 June 15 | 643  |
| 2        | 23 May 13  | 324  |
| 2        | 12 May 13  | 435  |

And shape it into a wide one like this...

| study_id | event_date_1 | event_date_1_code1 | event_date_1_code2 | event_date_2 | event_date_2_code1 | event_date_2_code2 |
|----------|--------------|--------------------|--------------------|--------------|--------------------|--------------------|
| 1        | 09 June 15   | 546                | 643                |              |                    |                    |
| 2        | 23 May 13    | 324                |                    | 12 May 13    | 435                |                    |

What would be the best method of doing this? I imagine I would have to create some sort of j variable, but am not certain how to make it so each event_date could have multiple codes, and each study_id multiple event_dates.

I already tried creating a j variable and reshaping, using the following code:

//Sort by id (just in case)
sort study_id event_date code

//Create j variable
quietly by study_id: gen code_num = cond(_N==1, 1, _n)

//Reshape data
reshape wide event_date code, i(study_id) j(code_num)

This, however, did not account for each event_date having multiple potential codes.

I am attempting to convert the data to wide so that I can merge it with another wide data set and then run an analysis over both. An observation in either set is a unique study_id.

I suggest that you need a really good reason to prefer this data structure. What you have is far more straightforward. Perhaps you can explain what you want to do with it and why it would be better for any purpose. - Nick Cox
I have another data set that contains information on each study_id. I want to use both data sets in an analysis, so I am trying to convert this one to wide, then merge it with my existing wide data set. - nman
Merging is still possible with this present structure. In neither structure is it obvious what defines an observation. You presumably know, but is it obvious in your question? (You may not be able to see, but two people have posted answers here and then deleted them, and it seems that neither is at all clear on what to recommend to you.) - Nick Cox
I edited my post, let me know what else I can do to help clarify. The other data set is quite large, with ~300 variables, and ideally I'd have all the information in one larger data set. - nman
What defines an observation? That remains unclear to me. - Nick Cox
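For reference, the point made in the comments, that merging is possible without reshaping, can be sketched as follows. This is only an illustration under assumptions: the variable age and the temporary file standing in for the other data set are hypothetical, and the other data set is assumed to hold exactly one observation per study_id.

```stata
// Hypothetical stand-in for the other (wide) data set: one observation per study_id
clear
input study_id age
1 34
2 51
end
tempfile other
save `other'

// The long data set from the question, with event_date kept as a string
clear
input study_id str12 event_date code
1 "09 June 15" 546
1 "09 June 15" 643
2 "23 May 13"  324
2 "12 May 13"  435
end

// Many long observations per study_id match one observation in the other data set
merge m:1 study_id using `other'
list
```

Each long observation picks up the study-level variables, so the result stays at one observation per study_id, event_date and code.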

1 Answer

Let me start by saying that I would not ever choose to organize my data in the requested fashion, so this should not be taken as support for doing so.

Having said that, something like the following seems to do the trick. The data are similar to yours, but I'm too lazy to deal with full dates, so I just read in the day of the month. I'm posting this as a curiosity, because I've never before seen a need to do reshape wide twice in succession.

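clear
input study_id  date code
1  09  546
1  09  643
2  23  324
2  12  435
end
list
// Number the codes within each study_id and date
bysort study_id date (code): generate codenum = _n
// First reshape: one observation per study_id and date, with variables code1, code2, ...
reshape wide code, i(study_id date) j(codenum)
// Rename code1 to code_1_, and so on, so the second reshape can append the event number
rename code* code_*_
list
// Number the dates within each study_id
bysort study_id (date): generate eventnum = _n
// Second reshape: one observation per study_id
reshape wide date code_*, i(study_id) j(eventnum)
list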
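
The code above reads only the day of the month. A sketch of the same two-step reshape with the full dates from the question might look like the following. It assumes the dates arrive as strings such as "09 June 15" and that every two-digit year means 20xx, which is what the "DM20Y" mask tells date() to assume.

```stata
clear
input study_id str12 event_date code
1 "09 June 15" 546
1 "09 June 15" 643
2 "23 May 13"  324
2 "12 May 13"  435
end

// Convert the string to a numeric daily date (assumption: all years are 20xx)
generate date = date(event_date, "DM20Y")
format date %td
drop event_date

// Same two reshapes as above
bysort study_id date (code): generate codenum = _n
reshape wide code, i(study_id date) j(codenum)
rename code* code_*_

bysort study_id (date): generate eventnum = _n
reshape wide date code_*, i(study_id) j(eventnum)

// Reapply the display format to the wide date variables, in case it was not carried over
format date* %td
list
```

Because the events are numbered after sorting by date within study_id, 12 May 13 becomes event 1 for study_id 2, not event 2 as in the table in the question. The wide variables are named date1, code_1_1, code_2_1, date2, and so on, where the middle index is the code number and the last index is the event number.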