1
votes

I am working with a very long list of commodity names (var1). I would like to extract information from this list by creating a second variable (var2) that is equal to 1 if var1 contains certain keywords.

I was using the following code:

g soy = strpos(productsproduced, "Soybeans, ") | strpos(productsproduced, "Soybean, ") | strpos(productsproduced, "soybeans, ") | strpos(productsproduced, "soybean, ") | productsproduced == "Soybeans"

The actual list is much longer, because the data were not coded consistently and each name appears in many different forms (as the excerpt in the code sample shows).

I believe it would be much easier to work with a list: certainly easier to look through, and easier to see whether I am missing anything.

Unfortunately, it has been a while since I worked with loops, but I was thinking of something like this:

local mylist Soybean soybean Soybeans soybeans Soybeans, soybeans,
forval i = mylist {
g soy = strpos(var1, "`i'")
}

This doesn't quite work, but I am not sure how to code it. One definite issue is that Stata would not know in this case whether I would like it to use the or operator (yes, I would) or the and operator.


1 Answer

3
votes

The spirit is evident; the details need various fixes.

local mywords Soybean soybean Soybeans soybeans Soybeans, soybeans,
gen soy = 0 
foreach w of local mywords {
   replace soy = soy | strpos(var1, "`w'")
}

What's crucial is that you need replace inside the loop: generate belongs before the loop, because generate cannot overwrite a variable that already exists, so a generate inside the loop would fail on the second iteration.

In fact this example reduces to

gen soy = strpos(var1, "oybean") > 0 

on the assumption that "oybean" doesn't match anything unwanted.
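
One way to check that assumption is to look at the values that actually match, and at near-misses that were not flagged. This is only a sketch: it assumes the variable is called var1 and that soy was created as above, and the extra search string "soy" is just an illustrative guess at what might have been missed.

```stata
* show each distinct value of var1 that contains "oybean", most frequent first
tabulate var1 if strpos(var1, "oybean") > 0, sort

* show values that mention "soy" in any case but were not flagged by soy
tabulate var1 if soy == 0 & strpos(lower(var1), "soy") > 0, sort
```

If var1 has a very large number of distinct matching values, list or count with the same if condition may be more practical than tabulate.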

Standardising to lower case is often helpful:

local mywords soybean soybeans soybeans, 
gen soy = 0 
foreach w of local mywords {
   replace soy = soy | strpos(lower(var1), "`w'")
}
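
Since the question mentions a much longer list of commodities, the same loop can be nested so that each commodity has its own keyword list. The following is a rough sketch under assumptions: the commodity names (soy, maize) and their keywords are hypothetical and are not taken from the question's data.

```stata
* hypothetical keyword lists, one local macro per commodity
local soy   soybean soya
local maize maize corn

* outer loop over commodity names, inner loop over that commodity's keywords
foreach c in soy maize {
    gen `c' = 0
    foreach w of local `c' {
        * strpos() returns 0 when the keyword is absent, so any match sets the flag to 1
        replace `c' = 1 if strpos(lower(var1), "`w'") > 0
    }
}
```

Keeping the keywords in one place makes them easy to review, but short keywords need care: "corn" here would also match "popcorn" or "acorn". A keyword containing spaces would need to be enclosed in double quotes inside the local for foreach to treat it as one item.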