UPDATE:

I solved the first part of the problem. I created unique ids for each observation:

gen id=_n

Then, I used

fillin id categ

which essentially created what I was looking for.

However, every variable other than id and categ is missing in almost all observations, because the rows added by fillin carry no values. Now, I need your help to duplicate the rest of the variables instead of having them missing. For example, each observation is associated with a particular week, and week is now missing in most rows. Likewise, a dummy variable that indicates whether a purchase was made at a drug store or a grocery store is missing in most rows too.

Thanks!

ORIGINAL MESSAGE:

Need your help in Stata!

Each observation in my dataset is a 1-unit purchase of a beer product made by a customer. These purchases are categorized into 8 general categories, so the variable "categ" takes values from 1 to 8 (1=import, 2=craft, 3=premium, 4=light, etc). For my multinomial logit model, I need one row per category for each observation, showing whether the customer purchased that category or not.

Assume this is my initial dataset:

customer id    beer category    units purchased
1              1                1
2              3                1
3              2                1

This is what I am looking for:

customer id    beer category    units purchased
1              1                1
1              2                0
1              3                0
2              1                0
2              2                0
2              3                1
3              1                0
3              2                1
3              3                0

Currently, my dataset has 600,000 observations. After this procedure, I should have 600,000 * 8 = 4,800,000 observations.

The code must also carry all other variables in the dataset over to the added rows, so that each new row keeps the values of the purchase it was created from.

I assume that "fillin" or, less likely, "expand" might work.

Your help would be tremendously appreciated. Thanks!

Please show us what you've tried (Stata code) and point to the problem you're having with it. You should read stackoverflow.com/help and stackoverflow.com/help/on-topic carefully. Showing attempts also signals you've done your part of the research/work. - Roberto Ferrer
Please see the update in the original post. Thanks! - Olga
Answer for "Now, I need your help to duplicate the rest of the variables instead of having them missing": one way to do it is to merge the original dataset with the new data (the one with missing values) on the variables id and categ. - Metrics
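
The copying step asked about in the update can probably also be done without a merge, by carrying each id's original values into the rows that fillin added. The following is only a minimal sketch, not tested on the actual data: week, store and units are assumed variable names standing in for the real ones, and it relies on the _fillin indicator that fillin creates.

```stata
* Sketch only: week, store and units are assumed variable names.
gen id = _n                       // one id per purchase, as in the update
fillin id categ                   // adds the missing id-categ pairs; _fillin == 1 marks added rows

* Within each id, sort the original row (_fillin == 0) first and copy its values
* into the added rows.
foreach v of varlist week store {
    bysort id (_fillin): replace `v' = `v'[1] if _fillin == 1
}

* The added rows are categories that were not purchased.
replace units = 0 if _fillin == 1
```

Sorting by _fillin within id puts the single original row first, so `v'[1] always refers to the observed value for that purchase.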

1 Answer

This is an old question, but I'll post a possible answer in case someone else runs into this problem. In this case, you could generate a dummy variable for every value of your "choice variable" and then apply the reshape long command:

tab beercategory, gen(b)

reshape long b , i(customerid) j(newvarname)
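
Adapted to the variable names used in the question, this might look like the sketch below. It assumes that id (created with gen id = _n) uniquely identifies each purchase, which reshape requires of the i() variable, and that categ takes the values 1 to 8; alt and chosen are hypothetical names.

```stata
* Sketch only: assumes id is unique per purchase and categ takes the values 1 to 8.
tab categ, gen(b)                 // creates b1 ... b8, one dummy per category
reshape long b, i(id) j(alt)      // one row per id and alternative; alt runs from 1 to 8
rename b chosen                   // chosen == 1 for the category actually purchased
```

Because every other variable is constant within id, reshape long copies it to all eight rows of that purchase, so nothing should be left missing.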

Greetings