0
votes

I have a list of places with population, much like in the example data below:

sysuse census, clear

How can I combine (sum) only two observations to create a new observation, while maintaining the rest of the data?

In the below example I would like to combine Alabama and Alaska to create a new observation called 'Alabama & Alaska' with the sum of their populations.

With the new observation, the previous records will need to be deleted.

   +----------------------------+
   | state        pop           |
   |----------------------------|
1. | Alabama      3,893,888     |
2. | Alaska       401,851       |
3. | Arizona      2,718,215     |
4. | Arkansas     2,286,435     |
5. | California   23,667,902    |
   +----------------------------+

   +-----------------------------------+
   | state                 pop         |
   |-----------------------------------|
1. | Alabama & Alaska      4,295,739   |  <--Alabama & Alaska combined
2. | Arizona               2,718,215   |  <--Retain other observations and variables
3. | Arkansas              2,286,435   |     
4. | California            23,667,902  |  
   +-----------------------------------+

This is my original toy data example and its expected output:

PlaceName  Population
  Town 1      100
  Town 2      200
  Town 3      100
  Town 4      100

PlaceName             Population 
Town 1 & Town 2           300  
         Town 3           100  
         Town 4           100   
2

2 Answers

1
votes

Using your original toy example, the following works for me:

clear

input str6 PlaceName Population
"Town 1" 100
"Town 2" 200
"Town 3" 100
"Town 4" 100
end

generate PlaceName2 = cond(_n == 1, PlaceName + " & " + PlaceName[_n+1], PlaceName)
generate Population2 = cond(_n == 1, Population[_n+1] + Population, Population)

replace PlaceName2 = "" in 2
replace Population2 = . in 2

gsort - Population2 

list, abbreviate(12)

     +--------------------------------------------------------+
     | PlaceName   Population        PlaceName2   Population2 |
     |--------------------------------------------------------|
  1. |    Town 1          100   Town 1 & Town 2           300 |
  2. |    Town 4          100            Town 4           100 |
  3. |    Town 3          100            Town 3           100 |
  4. |    Town 2          200                               . |
     +--------------------------------------------------------+
0
votes

This is how to do it with collapse. As you ask, this combines two observations into one, and thus changes the dataset.

clear

input str6 PlaceName Population
"Town 1" 100
"Town 2" 200
"Town 3" 100
"Town 4" 100
end


replace PlaceName = "Towns 1 and 2" in 1/2 
collapse (sum) Population , by(PlaceName)

list 

     +--------------------------+
     |     PlaceName   Popula~n |
     |--------------------------|
  1. |        Town 3        100 |
  2. |        Town 4        100 |
  3. | Towns 1 and 2        300 |
     +--------------------------+