0
votes

I need help in a particular issue with Stata. I have a panel dataset by id year from 1996 to 2018. The panel data is a combination of world countries and regions, yearly observations, for 7 different crops, area cultivated.

I would like to create a mean around years 2000, 2010 and 2018, so that mean(year2000)= mean of (1999+2000+2001), mean(year2010)=mean from (2009+2010+2011) and mean(year2018)= mean from (2016+2017+2018) for every crop from my 7 crops selection.

Then the problem is even more complicated when I need to combine some countries to form sub-regions: say I need the sub-region RUS1 = Russia + Ukraine. How can I create another variable that shows the total from crop1 between crop1 area cultivated in Russia + crop1 area cultivated in Ukraine on yearly basis. Meaning another variable that shows these sums for each year using the above means. I've tried with by id year: egen area_rus1=total(area) if area=="Russia" & area=="Ukraine" but nothing works.

The names of area being strings I used encode (area), gen (area2) and automatically Stata generates a number. In order to create a panel dataset i've used gen id=area2+itemcode

The panel data looks like this after sort year

enter image description here Please be aware that the period is 1996-2018. The example above shows only year 1996.

1

1 Answers

0
votes

This didn't get much of a response, for several reasons:

  1. You didn't show very much code.

  2. You didn't show data in a form that is especially useful. An image can't be copied and pasted easily into someone's Stata to allow experiment. In fact your image shows variables that are irrelevant and variables that are different versions of each other and so is much more complicated than we need.

  3. You escalated the question to ask the most complicated version of what you want to know.

  4. There is a problem you should have explained better. area is string and so totals can't be calculated at all and area2 is just arbitrary integers so totals can be calculated but don't make sense. "nothing works" is not informative as a problem report. The only totals that make sense to me are totals of value.

You need to simplify your problem first and then build up.

The essence seems to be as follows:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 country str6 item float year str1 region float value
"A" "barley" 1999 "X" 1
"B" "barley" 1999 "X" 2
"C" "barley" 1999 "Y" 3
"A" "barley" 2000 "X" 4
"B" "barley" 2000 "X" 5
"C" "barley" 2000 "Y" 6
"A" "barley" 2001 "X" 7 
"B" "barley" 2001 "X" 8
"C" "barley" 2001 "Y" 9 
end

* means by countries: similar variables for other periods 
egen mean_9901_c = mean(cond(inrange(year, 1999, 2001), value, .)), by(country item)

* aggregation to regions, but ensure that you don't double count 
egen value_region = total(value), by(region item year)
egen tag = tag(region item year)

* means by regions: similar variables for other periods 
egen mean_9901_r = mean(cond(tag == 1 & inrange(year, 1999, 2001), value_region, .)), by(region item)

list, sepby(year)

     +---------------------------------------------------------------------------------+
     | country     item   year   region   value   mean_9~c   value_~n   tag   mean_9~r |
     |---------------------------------------------------------------------------------|
  1. |       A   barley   1999        X       1          4          3     1          9 |
  2. |       B   barley   1999        X       2          5          3     0          9 |
  3. |       C   barley   1999        Y       3          6          3     1          6 |
     |---------------------------------------------------------------------------------|
  4. |       A   barley   2000        X       4          4          9     1          9 |
  5. |       B   barley   2000        X       5          5          9     0          9 |
  6. |       C   barley   2000        Y       6          6          6     1          6 |
     |---------------------------------------------------------------------------------|
  7. |       A   barley   2001        X       7          4         15     1          9 |
  8. |       B   barley   2001        X       8          5         15     0          9 |
  9. |       C   barley   2001        Y       9          6          9     1          6 |
     +---------------------------------------------------------------------------------+

The example shows just one item, but the code should work for several.

The example shows fake data for just three years, but means for other periods can be constructed similarly.

Results are repeated for all observations to which they apply. To see or use results just once, use if. For example the means over 1999 to 2001 are shown for each of those years (and others) but if year == 1999 would be a way to see results just once.

See also help collapse, help egen for its tag() function and this paper.

What was wrong with your code

Your problems start with

if area=="Russia" & area=="Ukraine"

which selects observations for which it is true that area is both "Russia" and "Ukraine" in the same observation, which is impossible. You need the | (or) operator there, not the & operator, or to approach the problem in another way.

The prefix id is wrong too. Using by id: enforces separate calculations for different values of id and is going to make the combinations of identifiers impossible.