Reshaping complex dataset in Stata, long to wide

Question

I am looking to reshape this long dataset into a wide one. The reshaping will be done with fartyid as my unique ID conditional on landingsdato, meaning that each new observation will combine all rows for a given fartyid that share the same landingsdato. Most of my variables not included here can be collapsed without any further consideration, because, conditional on these two, they hold the same value in every observation.

The problem is the categorical variables included here (artkode, produkttilstandkode and strrelsesgruppering). I'd like to keep all the information in them, in a generalized way, so that I can work with them across the whole dataset. produktvekt is a numerical variable which denotes the quantity of artkode.

fartyid     landingsdato  artkode  artbokml  produkttilstandkode  strrelsesgruppering  produktvekt
1926005936  01.03.2004    1032     Sei       211                  4023999              20
1926005936  01.03.2004    1032     Sei       211                  4012023              14
1926005936  01.03.2004    102201   Skrei     641                  3000000              55
1926005936  01.03.2004    102201   Skrei     642                  3000000              60
1926005936  01.03.2004    102201   Skrei     211                  4010025              60
1926005936  01.03.2004    102201   Skrei     211                  4025999              500

I can't wrap my head around how this should be done, if it's possible at all, so I am grateful for all input.

Unique values for categorical variables:

strrelsesgruppering: 457
produkttilstandkode: 53
artkode: 149

As an example of how I might wind up:

fartyid     landingsdato  1032  produkttilstandkode  strrelsesgruppering  produktvekt
1926005936  01.03.2004    1     211                  4023999              20

The subsequent artkode values would then be lined up along the row, but this would give me problems when working with the dataset, as the same artkode would appear in several different columns.


Why do you want a wide structure? These are in good condition for most Stata operations. What calculations make you think you require a reshape? — Nick Cox
Hi, sorry for the late reply. I somehow managed to create duplicate accounts and haven't checked this since posting. Anyway, the explanation of why I wanted this and how I solved it is in the answer section. — August Schirmer

August Schirmer · Accepted Answer · 2014-12-05T10:15:45

This is an answer to my own post, but also to Nick, who commented above.

The reason for wanting to reshape is that each "fartyid" observation describes a ship delivering fish. Rows with the same "landingsdato" for an individual "fartyid" are just several lines describing the same delivery, because the data were gathered from documents with separate lines for the different buyers of fish on delivery. The aim was to get all the non-constant information for each delivery into one row and run my regressions from there. I accomplished this with the following bit of code:

* One ID per delivery (ship and landing date)
sort fartyid landingsdato_ny
egen IDtur = group(fartyid landingsdato_ny)

* One ID per combination of species and product condition
sort artkode produkttilstandkode
egen kombinasjon = group(artkode produkttilstandkode)

* One weight and one value variable per combination
forvalues k = 1/309 {
    gen RQArtProd`k' = rundvekt if kombinasjon==`k'
    gen FQArtProd`k' = fangstverdi if kombinasjon==`k'
    compress RQArtProd`k' FQArtProd`k'
}

* One row per delivery
collapse (first) constant_variable1 constant_variable2 ... (sum) RQArtProd* FQArtProd*, by(IDtur)

* constant_variable1 constant_variable2 ... are just an illustration of several constant variables.
* RQArtProd* FQArtProd* are also an illustration. The original code contains around 900 variables generated by the above loop.
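
Since collapse (first) silently keeps whatever value happens to come first, it may be worth confirming before the collapse that the supposedly constant variables really are constant within each delivery. A minimal sketch, assuming IDtur has already been created; constant_variable1 and constant_variable2 are hypothetical placeholder names, not variables from the actual dataset:

```stata
* Hypothetical names: replace with the variables assumed constant within a trip
local constants constant_variable1 constant_variable2

foreach v of local constants {
    * Sort each trip by the variable; if the first and last values agree,
    * every value in between agrees too. assert stops with an error otherwise.
    bysort IDtur (`v'): assert `v'[1] == `v'[_N]
}
```

If an assert fails, the variable is not constant within deliveries and (first) would discard information for it.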

I collapsed all variables for each individual trip. Information about the ship is of course constant within a trip and was collapsed with (first), whilst for the variables generated from the combination of the two transposed variables I used (sum). In the end this gives me the total catch by weight and by value for each combination of species and its condition (with or without head, etc.). I also had to rename all the variables that were generated in the loop. Lastly, the "strrelsesgruppering" variable mentioned in the question was dropped completely because of inconsistency.
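
The renaming step could possibly be avoided by building the species and condition codes into the variable names when they are generated. The following is only a sketch of that idea, not the code that was actually run: it assumes artkode and produkttilstandkode are stored as numeric integer codes, that it runs before the collapse, and the RQ_ and FQ_ prefixes are hypothetical names chosen for illustration.

```stata
* Assumption: artkode and produkttilstandkode are numeric integer codes
egen kombinasjon = group(artkode produkttilstandkode)
levelsof kombinasjon, local(combos)

foreach k of local combos {
    * Both codes are constant within a combination, so r(min) recovers them
    summarize artkode if kombinasjon == `k', meanonly
    local art = r(min)
    summarize produkttilstandkode if kombinasjon == `k', meanonly
    local prod = r(min)

    * Hypothetical naming scheme, e.g. RQ_102201_211 for Skrei in condition 211
    gen RQ_`art'_`prod' = rundvekt if kombinasjon == `k'
    gen FQ_`art'_`prod' = fangstverdi if kombinasjon == `k'
}
```

The collapse would then use (sum) RQ_* FQ_* in place of the numbered variables, and each resulting column name identifies its species and condition directly.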
