I have a dataset with a couple million rows. It is in csv format. I wish to import it into Stata. I can do this, but there is a problem - a small percentage (but still many) of the observations appear on two lines in the CSV file. Most of the entries occur on only one line. The troublesome observations that take up 2 lines still follow the same pattern as far as being delimited by commas. But in the Stata dataset, the observation shows up on two rows, both rows containing only part of the total data.
I used import delimited
to import the data. Is there anything that can be done at the data import stage of the process in Stata? I would prefer to not have to deal with this in the original CSV file if possible.
***Update
Here is an example of what the csv file looks like:
var1,var2,var3,var4,var5
text 1, text 2,text 3 ,text 4,text 5
text 6,text 7,text 8,text9,text10
text 11,text 1
2,text 13,text14,text15
text16,text17,text18,text19,text20
Notice that there is no comma at the end of the line. Also notice that the problem is with the observation that begins with text 11
.
This is basically how it shows up in Stata:
var1 var2 var3 var4 var5
1 text 1 text 2 text 3 text 4 text 5
2 text 6 text 7 text 8 text9 text10
3 text 11 text 1
4 2 text 13 text14 text15
5 text16 text17 text18 text19 text20
That sometimes the number is right next to text
isn't a mistake - it is just to illustrate that the data is more complex than is shown here.
Of course, this is how I need the data:
var1 var2 var3 var4 var5
1 text 1 text 2 text 3 text 4 text 5
2 text 6 text 7 text 8 text9 text10
3 text 11 text 12 text 13 text14 text15
4 text16 text17 text18 text19 text20
bindquote()
option. – Roberto Ferrer