I am reading a csv file into Stata using
import delimited "../data_clean/winter20.csv", encoding(UTF-8)
The raw data looks like:
y id1
-.7709586 000000000020
-.4195721 000000003969
-.8932499 300000000021
-1.256116 200000007153
-.7858037 000000000000
The imported data become:
y id1
-.7709586 20
-.4195721 000000003969
-.8932499 300000000021
-1.256116 200000007153
-.7858037 0
However, there are some columns of IDs which are read as numeric. I would like to import them as strings. I want to read the data exactly as how the raw data looks like.
The way I found online is:
import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(74 97 116) clear
However, the raw data may be updated and column numbers may change. The following
import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(id1 id2 id3) clear
gives error id1: invalid numlist in stringcols() option. Is there a way to specify variable names rather than column numbers?
The reason is leading zeros are missing if I read IDs as numeric. Methodtostring does not recover the leading zeros. format id1 %09.0f only works if variables have equal number of digits.