I am using perl to scrape the following through .txt which I'd ultimately bring into Stata. What format option works? I have many such observations, so would like to use an approach over which I can generalize.
The original data are of the form:
First Name: Allen
Last Name: Von Schmidt
Birth Year: 1965
Location: District 1, Ocean City, Cape May, New Jersey, USA
First Name: Lee Roy
Last Name: McBride
Birth Year: 1967
Location: Precinct 5, District 2, Chicago, Cook, Illinois, USA
The goal is to create the variables in Stata:
First Name: Allen
Last Name: Von Schmidt
Birth Year: 1965
County: Cape May
State: New Jersey
First Name: Allen
Last Name: McBride
Birth Year: 1967
County: Cook
State: Illinois
What possible .txt might lead to such, and how would I load it into Stata?
Also, the amount of terms vary in Location as in these 2 examples, but I always want the 2 before USA.
At the moment, I am putting "", around each variable from the table for the .txt.
"Allen","Von Schmidt","1965","District 1, Ocean City, Cape May, New Jersey, USA"
"Lee Roy","McBride","1967","Precinct 5, District 2, Chicago, Cook, Illinois, USA"
Is there a better way to format the .txt? How would I create the corresponding variables in Stata?
Thank you for your help!
P.S. I know that stata uses infile or insheet and can handle , or tabs to separate variables. I did not know how to scrape a variable like Location in perl with all of the those so I added the ""