I have a dataset that is composed of research studies. Within some of the studies are multiple data points (DP). My data is structured so that each row is a separate data point. Additionally, I have a separate variable that denotes the specific research article.

I need summary statistics at the level of research studies, not DPs. In other words, I need each row to represent one research study, with the DPs reduced to counts.

I have tried the code below, which uses contract. It gives what I need for the list command. However, I also need summary statistics, and I'd like to get summaries for multiple variables and combine them into one table once the data are organized.

contract study nation
drop _freq study
contract nation
list

EXAMPLE:

Raw Data:

Study DP Year Nation
1 1 2005 Brazil
1 2 2005 Brazil
1 3 2005 Brazil
1 4 2005 France
2 5 2006 Brazil
2 6 2006 Italy
3 7 2010 Brazil
3 8 2010 Canada
4 9 2011 Canada
5 10 2015 Brazil
6 11 2015 Canada

What I need:

Year f (of studies)
2005 1
2006 1
2010 1
2011 1
2015 2

And I also need a histogram of the above table.

Nation f (of studies)
Brazil 4
Canada 3
France 1
Italy 1

I have more variables that need the same treatment, and they will need more than frequencies (e.g. mean, sd, var). So any solution needs to work for summarizing variables as well.

1 Answer

egen will help with summary statistics and graphs. Its tag() function will let you tag each country just once.

Note that dataex in Stata is a better way to give a data example, as explained in the Statalist FAQ and here at the Stata tag.

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(Study DP) int Year str6 Nation
1  1 2005 "Brazil"
1  2 2005 "Brazil"
1  3 2005 "Brazil"
1  4 2005 "France"
2  5 2006 "Brazil"
2  6 2006 "Italy" 
3  7 2010 "Brazil"
3  8 2010 "Canada"
4  9 2011 "Canada"
5 10 2015 "Brazil"
6 11 2015 "Canada"
end

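* Tag one observation per nation
egen tag = tag(Nation)

* Count data points (non-missing DP) within each nation
egen count = count(DP), by(Nation)

* Plot one count per nation
histogram count if tag, discrete freq width(1) xla(1/6)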
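
Note that count(DP) counts data points, so Brazil gets 6 there. If the aim is to count studies instead, as in the tables in the question, one possible sketch is to tag each distinct Study-Nation pair and total the tags. The variable names pairtag, nstudies, nationtag and studytag are my own, and the code assumes that the example data above are in memory and that Year is constant within each study.

```stata
* One tagged row per distinct Study-Nation pair
egen pairtag = tag(Study Nation)

* Number of distinct studies in which each nation appears
egen nstudies = total(pairtag), by(Nation)

* List one row per nation
egen nationtag = tag(Nation)
list Nation nstudies if nationtag, noobs

* One tagged row per study, for study-level summaries
egen studytag = tag(Study)

* Frequency of studies by year, and the matching histogram
tabulate Year if studytag
histogram Year if studytag, discrete frequency

* Study-level summary statistics (n, mean, sd, variance)
tabstat Year if studytag, statistics(n mean sd variance)
```

With the example data this gives Brazil 4, Canada 3, France 1 and Italy 1, and two studies in 2015. For variables that vary within a study, a collapse with by(Study) before summarizing may suit better than tagging, but that depends on which statistic should represent each study.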