
I am reading a CSV file into Stata using

import delimited "../data_clean/winter20.csv", encoding(UTF-8) 

The raw data looks like:

y           id1
-.7709586   000000000020
-.4195721   000000003969
-.8932499   300000000021
-1.256116   200000007153
-.7858037   000000000000

The imported data become:

y           id1
-.7709586   20
-.4195721   000000003969
-.8932499   300000000021
-1.256116   200000007153
-.7858037   0

The problem is that some ID columns are read as numeric. I would like to import them as strings, so that the data are read exactly as they appear in the raw file.

One approach I found online is:

import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(74 97 116) clear 

However, the raw data may be updated, and the column numbers may change. The following command

import delimited "/Users/tianwang/Dropbox/Construction/data_clean/winter20.csv", encoding(UTF-8) stringcols(id1 id2 id3) clear 

gives the error "id1: invalid numlist in stringcols() option". Is there a way to specify variable names rather than column numbers?

The reason is that the leading zeros are lost if I read the IDs as numeric. The tostring command does not recover the leading zeros, and format id1 %09.0f only works if all values have the same number of digits.
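A minimal sketch of one possible workaround, assuming (as the error above suggests) that stringcols() in Stata 16 only accepts column numbers: read the header once, look up the position of each ID variable by name, and pass those positions to stringcols(). The variable names id1 id2 id3 and the file path come from the question; the local macro names (csv, idvars, allvars, cols, pos) are hypothetical. It also assumes the header names are already valid Stata names, so import delimited keeps them as id1, id2, id3.

```stata
* File and ID names taken from the question; macro names are hypothetical
local csv "../data_clean/winter20.csv"
local idvars id1 id2 id3

* Pass 1: load only the first couple of rows; only the column order matters here
import delimited "`csv'", encoding(UTF-8) varnames(1) rowrange(:2) clear
unab allvars : _all

* Translate each ID variable name into its column number
local cols
foreach v of local idvars {
    local pos : list posof "`v'" in allvars
    if `pos' == 0 {
        display as error "`v' not found in `csv'"
        exit 111
    }
    local cols `cols' `pos'
}

* Pass 2: full import, forcing only the ID columns to be strings
import delimited "`csv'", encoding(UTF-8) varnames(1) stringcols(`cols') clear
```

The file is opened twice, but the first pass reads only a few rows. If the ID columns move in a later version of the file, the column numbers are recomputed from the names on each run.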

Could you please show us what your data look like? - Álvaro A. Gutiérrez-Vargas
Yes, I just updated the question. - Tian
Could you please also tell us your current Stata version and OS? - Álvaro A. Gutiérrez-Vargas
Sorry, I just saw the message. Mine is Stata/MP 16.1 for Mac (64-bit Intel). - Tian

1 Answer


I think this should do it.

import delimited "../data_clean/winter20.csv", stringcols(_all) encoding(UTF-8)  clear 

PS: Tested in Stata 16 on Windows 10.
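Note that stringcols(_all) also turns non-ID columns such as y into strings. A minimal follow-up sketch, assuming the ID variables are named id1 id2 id3 as in the question: import everything as strings, then convert every other variable back to numeric with destring. The local macro name others is hypothetical.

```stata
* Import every column as a string so the IDs keep their leading zeros
import delimited "../data_clean/winter20.csv", stringcols(_all) encoding(UTF-8) clear

* List all variables except the ID columns (names assumed from the question)
ds id1 id2 id3, not
local others `r(varlist)'

* Convert the remaining columns back to numeric; without force, destring
* leaves any column containing non-numeric text unchanged and prints a note
if "`others'" != "" {
    destring `others', replace
}
```

The guard matters because destring with an empty variable list would act on all variables, including the IDs.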