I have a dataset that shows how much was paid ("cenoz
" - cents per ounce) per product category during specific week and in a specific store.
clear
set more off
input week store cenoz category
1 1 2 1
1 1 4 2
1 1 3 3
1 2 5 1
1 2 7 2
1 2 8 3
2 1 4 1
2 1 1 2
2 1 10 3
2 2 3 1
2 2 4 2
2 2 7 3
3 1 5 1
3 1 3 2
3 2 5 1
3 2 4 2
end
I create a new variable cenoz3
that indicates how much on average was paid for category 3 given specific week and a store. Same with cenoz1, and cenoz2.
egen cenoz1 = mean(cenoz/ (category == 1)), by(week store)
egen cenoz2 = mean(cenoz/ (category == 2)), by(week store)
egen cenoz3 = mean(cenoz/ (category == 3)), by(week store)
It turns out that category 3 was not sold in any of the stores (1 and 2) in week 3. As a result, missing values are generated.
week store cenoz category cenoz1 cenoz2 cenoz3
1 1 2 1 2 4 3
1 1 4 2 2 4 3
1 1 3 3 2 4 3
1 2 5 1 5 7 8
1 2 7 2 5 7 8
1 2 8 3 5 7 8
2 1 4 1 4 1 10
2 1 1 2 4 1 10
2 1 10 3 4 1 10
2 2 3 1 3 4 7
2 2 4 2 3 4 7
2 2 7 3 3 4 7
3 1 5 1 5 3 .
3 1 3 2 5 3 .
3 2 5 1 5 4 .
3 2 4 2 5 4 .
I would like to replace missing values of a particular week with values of the previous week and matching store. That's to say:
replace missing values for category 3 in week 3 in store 1
with values for category 3 in week 2 in store 1
and
replace missing values for category 3 in week 3 in store 2
with values for category 3 in week 2 in store 2
Can I use command replace
or is it something more complicated than that?
Something like:
replace cenoz1 = cenoz1[_n-1] if missing(cenoz1)
But I also need to the stores to match, not just the time variable week
.
I found this code provided by Nicholas Cox at http://www.stata.com/support/faqs/data-management/replacing-missing-values/:
by id (time), sort: replace myvar = myvar[_n-1] if myvar >= .
Do you think
by store (week), sort: cenoz1 = cenoz1[_n-1] if missing(cenoz1)
is sufficient?
UPDATE:
When I use the code
by store (week category), sort: replace cenoz3 = cenoz3[_n-1] if missing(cenoz3)
It seems it delivers correct values:
week store cenoz category cenoz1 cenoz2 cenoz3
1 1 2 1 2 4 3
1 1 4 2 2 4 3
1 1 3 3 2 4 3
1 2 5 1 5 7 8
1 2 7 2 5 7 8
1 2 8 3 5 7 8
2 1 4 1 4 1 10
2 1 1 2 4 1 10
2 1 10 3 4 1 10
2 2 3 1 3 4 7
2 2 4 2 3 4 7
2 2 7 3 3 4 7
3 1 5 1 5 3 10
3 1 3 2 5 3 10
3 2 5 1 5 4 7
3 2 4 2 5 4 7
Is there any way to double check this code given that my dataset is quite large?
How make this code not so specific but applicable to any missing cenoz
if it finds one with missing vaues? (cenoz1, cenoz2, cenoz3, cenoz4...cenoz12)
replace
somewhere in your solution. What have you tried specifically? On another note, youregen, mean()
s are not computing means. I mean, they are, but for only one observation forweek store
groups. Is this a feature of your example data or would this be true for your real database? Is it what you intend? – Roberto Ferrerstore
s andcategor
ies to match so I believe you're missing something (but almost there). Have you not tried your own code to see if it is sufficient? Just give it a try and check the results. If you have any problem, update your question. – Roberto Ferrerreplace
in your last statement. – Nick Cox