1
votes

I'm looking for a way, if there is one, to use a Stata syntax file in R. I have a dataset that contains country names and a Stata .do file that translates the names into COW (Correlates of War) country codes:

USA = 1
Afghanistan = 700

Is there any way I can use that file with R, or do I need to find someone with Stata to do it for me?

Any help is greatly appreciated, thank you :)

EDIT:
The .do file is plain text; I can open it in Chrome or TextMate.
It looks like this:

capture drop gwno
gen gwno=.
replace gwno=   700 if country==    "Afganistan"
replace gwno=   700 if country==    "Afghanistan"
replace gwno=   700 if country==    "AFGHANISTAN"
replace gwno=   339 if country==    "Albania"
replace gwno=   615 if country==    "Algeria"
replace gwno=   232 if country==    "Andorra"
replace gwno=   540 if country==    "Angola"
replace gwno=   58  if country==    "Antigua & Barbuda"
...
3
Are Stata .do files plain text? Can you show us a snippet or post it somewhere? – Spacedman
Updated my question, thanks :) – LukasKawerau
@LukasKawerau synthetic Israel? ;) – oba2311

3 Answers

3
votes

Short answer: in your editor delete all "replace gwno=", then replace all "if country==" with a comma. Delete the first header lines and anything at the end.

Now you have a comma-separated file of codes and countries. Read into R, make a data frame, then use match to replace countries with numbers.
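
The same clean-up can also be done inside R instead of an editor. The following is only a sketch under a few assumptions: every rule line has exactly the `replace gwno= ... if country== "..."` shape shown in the question, and `mydata` (with the made-up entry "Atlantis") is a hypothetical stand-in for your own dataset.

```r
# A few lines copied from the .do file in the question.
# In practice you would use readLines() on your own file instead.
do_lines <- c(
  'capture drop gwno',
  'gen gwno=.',
  'replace gwno=   700 if country==    "Afganistan"',
  'replace gwno=   700 if country==    "Afghanistan"',
  'replace gwno=   339 if country==    "Albania"',
  'replace gwno=   58  if country==    "Antigua & Barbuda"'
)

# Group 1 captures the code, group 2 the country name between the quotes.
pattern <- '^replace gwno= *([0-9]+) +if country== *"(.*)" *$'

# Keep only the rule lines; header lines do not match the pattern.
rules <- grep(pattern, do_lines, value = TRUE)

lookup <- data.frame(
  country = sub(pattern, "\\2", rules),
  gwno    = as.integer(sub(pattern, "\\1", rules)),
  stringsAsFactors = FALSE
)

# Hypothetical dataset; "Atlantis" is not in the lookup and stays NA.
mydata <- data.frame(
  country = c("Albania", "Afghanistan", "Atlantis"),
  stringsAsFactors = FALSE
)

# match() returns the row of the lookup for each country (NA if absent).
mydata$gwno <- lookup$gwno[match(mydata$country, lookup$country)]
print(mydata)
```

Names that are missing from the .do file end up as NA, which makes them easy to spot afterwards with `is.na(mydata$gwno)`.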

Apologies for the sketchy answer, but most of this is basic R.

You could also try reading the .do file directly with read.table or read.csv: skip the first two lines, and your codes and countries are in columns 3 and 6.

3
votes

Just to rephrase what @Spacedman said as a one-line R command:

# stat.file holds the path to the .do file
read.table(file = stat.file, skip = 2)[, c(6, 3)]

                 V6  V3
1        Afganistan 700
2       Afghanistan 700
3       AFGHANISTAN 700
4           Albania 339
5           Algeria 615
6           Andorra 232
7            Angola 540
8 Antigua & Barbuda  58
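
To actually recode a variable, the two columns can be turned into a named vector and indexed by country name. This is a rough sketch rather than a finished solution: it writes a few lines from the question to a temporary file so that it runs on its own, and the `countries` vector (including the made-up "Atlantis") is hypothetical.

```r
# Write a few lines from the question to a temporary file so the
# example is self-contained; normally stat.file is your real .do file.
stat.file <- tempfile(fileext = ".do")
writeLines(c(
  'capture drop gwno',
  'gen gwno=.',
  'replace gwno=   700 if country==    "Afghanistan"',
  'replace gwno=   339 if country==    "Albania"',
  'replace gwno=   58  if country==    "Antigua & Barbuda"'
), stat.file)

# Column 6 is the quoted country name, column 3 the numeric code.
lookup <- read.table(file = stat.file, skip = 2,
                     stringsAsFactors = FALSE)[, c(6, 3)]
names(lookup) <- c("country", "gwno")

# Named vector: names are countries, values are codes.
codes <- setNames(lookup$gwno, lookup$country)

# Hypothetical country column; unknown names come back as NA.
countries <- c("Albania", "Antigua & Barbuda", "Atlantis")
gwno <- unname(codes[countries])
print(data.frame(country = countries, gwno = gwno))
```

Because read.table treats the double quotes as field delimiters by default, multi-word names such as "Antigua & Barbuda" stay in a single column.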

0
votes

I just stumbled over this question and I feel like posting a somewhat more general solution to this problem, even though you did not ask for this---but it might be useful for others.

Your task is evidently to map country names to country codes (as used in the "Correlates of War" project). There is a package called countrycode that is very useful here, since it can translate country names to ISO codes, COW codes and so on:

df <-
  structure(
    list(
      name = c(
        "Afganistan",
        "Afghanistan",
        "AFGHANISTAN",
        "Albania",
        "Algeria",
        "Andorra",
        "Angola",
        "Antigua & Barbuda"
      ),
      ccode = c("700", "700", "700", "339", "615", "232", "540", "58")
    ),
    class = "data.frame",
    .Names = c("name", "ccode"),
    row.names = c(NA,-8L)
  )
df$ccode2 <- countrycode::countrycode(sourcevar = df$name,
                                      origin = "country.name",
                                      destination = "cown")

Which will give you:

               name ccode ccode2
1        Afganistan   700     NA
2       Afghanistan   700    700
3       AFGHANISTAN   700    700
4           Albania   339    339
5           Algeria   615    615
6           Andorra   232    232
7            Angola   540    540
8 Antigua & Barbuda    58     58
Warning message:
In countrycode::countrycode(sourcevar = df$name, origin = "country.name",  :
  Some values were not matched unambiguously: Afganistan

Note that the misspelling "Afganistan" yields NA. The warning helps you identify such cases. You can fix this with the custom_match argument:

df$ccode2 <- countrycode::countrycode(sourcevar = df$name,
                                  origin = "country.name",
                                  destination = "cown", 
                                  custom_match = c("Afganistan" = "700"))

Which results in:

               name ccode ccode2
1        Afganistan   700    700
2       Afghanistan   700    700
3       AFGHANISTAN   700    700
4           Albania   339    339
5           Algeria   615    615
6           Andorra   232    232
7            Angola   540    540
8 Antigua & Barbuda    58     58
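
If the warning is easy to miss in a long script, the names that still need a custom_match entry can also be listed directly. The sketch below uses only base R and assumes a data frame shaped like the first result above (before custom_match was applied); `res` is a hypothetical name and its values are copied from that output rather than recomputed with countrycode.

```r
# Values copied from the first output table (before custom_match);
# res is a hypothetical stand-in for your own result.
res <- data.frame(
  name   = c("Afganistan", "Afghanistan", "AFGHANISTAN", "Albania"),
  ccode  = c("700", "700", "700", "339"),
  ccode2 = c(NA, 700, 700, 339),
  stringsAsFactors = FALSE
)

# Names the lookup could not resolve at all.
unmatched <- res$name[is.na(res$ccode2)]

# Names where the .do file and the lookup disagree on the code.
conflicts <- res$name[!is.na(res$ccode2) &
                        as.numeric(res$ccode) != res$ccode2]

print(unmatched)
print(conflicts)
```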