0
votes

The basis of my database is violent events. I know in which grid-cell and on which date these events happened. For my analysis I use frequency of violent events in a certain grid-cell, country and year as the dependent variable. To create this frequency variable I collapse my violent event database and count the number of violent events in a certain grid-cell, in a certain year.

I want to include a value of 0 if there was no violent event in a certain grid-cell in a certain year. However, as my basis is the violent events database, I do not have these observations when I collapse my data. Is there any way to create a 0 value for observations in years and grid-cells where there was no violent event?

Before I collapse my dateset it looked like this

+--------------------------+
|Country Year Grid_cell_id |
|--------------------------|
|Benin    2012    365      |
|Benin    2012    365      |
|Benin    2012    367      |
|Benin    2014    365      |
|Benin    2014    365      |
|Benin    2014    365      | 
|Benin    2015    240      |
|Benin    2015    240      |
|Kenya    2012    430      |
|Kenya    2012    430      |
|Kenya    2013    467      |
|Kenya    2014    567      |
|Kenya    2014    567      |
|Kenya    2014    567      | 
+--------------------------+

Once I collapse my dataset, it looks like this:

+---------------------------------+
|Country  Year Grid_cell_id freq  |
|---------------------------------|
|Benin    2012    365        2    |
|Benin    2012    367        1    |
|Benin    2014    365        3    |
|Benin    2015    240        2    |
|Kenya    2012    430        2    |
|Kenya    2013    467        1    |
|Kenya    2014    567        3    |
+---------------------------------+

When I use fillin to create the 0 frequency value, I only use I only use grid-cell_id and year, but I also want information on the country in which the grid-cell is located. I can easily come to the dataset below with the fillin function, but how can I preserve the country value, without using it as a fillin criterion.

+---------------------------------+
|Country  Year Grid_cell_id freq  |
|---------------------------------|
|Benin    2012    365        2    |
|.        2013    365        0    |
|.        2014    365        3    |
|.        2015    365        0    |
|Benin    2012    367        1    |
|.        2012    367        0    |
|.        2013    367        0    |
|.        2014    367        0    |
|.        2015    367        0    |
+---------------------------------+
1

1 Answers

0
votes

Here I create the problem mentioned and then solve it with fillin. See help fillin and also this paper.

In this particular case, the problem could have been avoided altogether by using the zero option of contract. Indeed, to get a dataset of frequencies contract is often superior to collapse.

 sysuse auto, clear
(1978 Automobile Data)

. contract foreign rep78

. list, sepby(foreign)

     +--------------------------+
     | rep78    foreign   _freq |
     |--------------------------|
  1. |     1   Domestic       2 |
  2. |     2   Domestic       8 |
  3. |     3   Domestic      27 |
  4. |     4   Domestic       9 |
  5. |     5   Domestic       2 |
  6. |     .   Domestic       4 |
     |--------------------------|
  7. |     3    Foreign       3 |
  8. |     4    Foreign       9 |
  9. |     5    Foreign       9 |
 10. |     .    Foreign       1 |
     +--------------------------+

. fillin foreign rep78

. list, sepby(foreign)

     +------------------------------------+
     | rep78    foreign   _freq   _fillin |
     |------------------------------------|
  1. |     1   Domestic       2         0 |
  2. |     2   Domestic       8         0 |
  3. |     3   Domestic      27         0 |
  4. |     4   Domestic       9         0 |
  5. |     5   Domestic       2         0 |
  6. |     .   Domestic       4         0 |
     |------------------------------------|
  7. |     1    Foreign       .         1 |
  8. |     2    Foreign       .         1 |
  9. |     3    Foreign       3         0 |
 10. |     4    Foreign       9         0 |
 11. |     5    Foreign       9         0 |
 12. |     .    Foreign       1         0 |
     +------------------------------------+

. replace _freq = 0 if _fillin
(2 real changes made)

. drop _fillin

. list, sepby(foreign)

     +--------------------------+
     | rep78    foreign   _freq |
     |--------------------------|
  1. |     1   Domestic       2 |
  2. |     2   Domestic       8 |
  3. |     3   Domestic      27 |
  4. |     4   Domestic       9 |
  5. |     5   Domestic       2 |
  6. |     .   Domestic       4 |
     |--------------------------|
  7. |     1    Foreign       0 |
  8. |     2    Foreign       0 |
  9. |     3    Foreign       3 |
 10. |     4    Foreign       9 |
 11. |     5    Foreign       9 |
 12. |     .    Foreign       1 |
     +--------------------------+

EDIT

You seem to be ignoring the advice to use contract, zero which automatically adds observations with frequency zero.

Otherwise please note the detailed advice in the Stata tag wiki on presenting data examples and consider this variation on combining identifier and year using egen. For any long-term dataset, it seems unwise to assume that grid cells remain in the same country, so that is not assumed.

* Example generated by -dataex-. To install: ssc install dataex
clear
input str35 Country int(Year Grid_cell_id) byte freq
"Benin" 2012 365 2
"Benin" 2012 367 1
"Benin" 2014 365 3
"Benin" 2015 240 2
"Kenya" 2012 430 2
"Kenya" 2013 467 1
"Kenya" 2014 567 3
end

egen Id = concat(Grid_cell_id Country), p(" ")
fillin Id Year 
replace Grid_cell_id = real(word(Id, 1))
replace Country = trim(subinstr(Id, word(Id, 1), "", 1))
drop Id
replace freq = 0 if _fillin
drop _fillin