It sounds as if you or someone else read in the data with year as a string variable and then used encode to generate a numeric variable. That is the wrong approach, as you have found out: encode maps the distinct strings to integers 1 up and attaches the original text as value labels, which is not what you want for a year. You need destring for that situation. Now that you have done this, you need decode and then destring or (if the original variable is still present in the dataset) just destring applied to the original string variable.
Note that you should check your data carefully. Why was year imported in this way in the first place? Often this happens when data come from a spreadsheet and people don't check carefully enough for metadata (e.g. header information).
clear
input str4 original
"1990"
"1991"
"1992"
end
encode original, gen(year)
* solution 1
decode year, gen(year2)
destring year2, replace
* solution 2 (better)
destring original, replace
list
     +-------------------------+
     | original   year   year2 |
     |-------------------------|
  1. |     1990   1990    1990 |
  2. |     1991   1991    1991 |
  3. |     1992   1992    1992 |
     +-------------------------+
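Note that year looks right in that listing only because list shows value labels by default. A minimal sketch, reusing the same toy data as above, of how you might inspect what encode actually stored:

```stata
clear
input str4 original
"1990"
"1991"
"1992"
end
encode original, gen(year)
* default listing shows the value labels, so year looks like 1990, 1991, 1992
list year
* nolabel shows the integers that are actually stored
list year, nolabel
* encode created a value label, also named year, holding the mapping
label list year
* with these three sorted values the stored integers are just 1, 2, 3
assert year == _n
```

The final assert is specific to this toy dataset, where the observations are already in the alphabetical order that encode uses to number the strings.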
Also, in Stata, "format" has nothing to do with what is stored, only with how it is displayed. See help format. It is, naturally, an overloaded term in computing.
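As a small illustration of that point, here is a sketch using a made-up variable x (not part of your data): changing the display format alters what list prints but leaves the stored value alone.

```stata
clear
set obs 1
* x is a hypothetical variable, used only for illustration
generate double x = 1234.5678
* change only the display format: one decimal place
format x %9.1f
list x
* the stored value is unchanged, so this assertion still holds
assert x == 1234.5678
```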