I have imported a large (14 GB) CSV file into an R colbycol object, from which I intend to transfer it into a database using the R sqldf package after some manipulation. Formatting information for this file is available in the form of a SAS command file written for the corresponding file with fixed-width fields and records. I have never used SAS and am unsure of the meanings of certain commands, which I need to understand in order to be sure that the data they describe is imported correctly.
The online documentation I found for the value, input, and format statements (URLs copied below) did not help me with the three specific examples below, perhaps because they are too elementary. If anyone could tell me what these examples mean, I would appreciate it.
PROC format: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473472.htm
value statement: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1upn25lbfo6mkn1wncu4dyh9q91.htm
input statement: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146292.htm
format statement: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000178212.htm
First: After the "PROC format" there are a large number of "value" statements. Most of these have a column of numeric codes, each followed by a short description. These seem to be straightforward analogs of R factors, but with a more flexible choice of internal numeric representation, and I am converting them accordingly. However, some of them have strings of numerals (apparently all 9's, 0's or 7's) in place of the description. Examples are reproduced below. What do these numeral strings signify? In a few cases a number is assigned to an abbreviation. In the last example below I believe "NIU" stands for "not in universe". However, I am not sure what this value statement implies for other values of the variable.
value FTOTVAL_f
0000999999 = "999999" ;
value INCTOT_f
00999997 = "00999997"
99999997 = "99999997"
99999999 = "99999999" ;
value OFFTOTVAL_f
0000999999 = "NIU"
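For reference, this is roughly how I am treating the value statements during conversion, shown as a short Python sketch (the same logic carries over to R). The names OFFTOTVAL_LABELS and apply_labels are hypothetical, and the rule that unlisted values pass through unchanged is my assumption, which is exactly the point I would like confirmed.

```python
# Hypothetical code-to-label table, transcribed by hand from one value statement.
# The leading zeros in 0000999999 are dropped because the code is numeric.
OFFTOTVAL_LABELS = {999999: "NIU"}


def apply_labels(raw_values, labels):
    """Replace each coded value with its label.

    Assumption (unconfirmed): values that are not listed in the
    value statement are kept as they are.
    """
    return [labels.get(value, value) for value in raw_values]


print(apply_labels([999999, 12000, 0], OFFTOTVAL_LABELS))
```

If unlisted values should instead be treated as missing or invalid, only the default in labels.get would need to change.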
Second: After the word "input", there are numeric ranges which I believe represent character ranges in an alternative, fixed-width record version of the file. Some of these are followed by an attached ".4" or a ".2" (not in quotes), and some are not. What do these suffixes mean?
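To make this question concrete, here is a Python sketch of one possible reading, namely that a ".2" suffix means the last two digits of the field are implied decimal places. This interpretation is only my guess, not something I found confirmed in the documentation, and the function name read_field and the sample record are made up for illustration.

```python
from decimal import Decimal


def read_field(record, start, end, implied_decimals=0):
    """Extract 1-based, inclusive columns start..end from a fixed-width record.

    Assumption (a guess): a suffix such as ".2" means the value is
    divided by 10**2 when the field has no explicit decimal point.
    """
    text = record[start - 1:end].strip()
    if not text:
        return None  # blank field
    value = Decimal(text)
    if implied_decimals and "." not in text:
        value = value.scaleb(-implied_decimals)  # shift the decimal point left
    return value


record = "0012345678AB"              # made-up sample record
print(read_field(record, 1, 6))      # columns 1-6, no suffix
print(read_field(record, 7, 10, 2))  # columns 7-10 with a ".2" suffix
```

Under this reading, the columns "5678" with a ".2" suffix would be read as 56.78 rather than 5678, which is why I need to know whether the guess is right before loading the data.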
Third: The command file ends with two format statements that are not immediately preceded by a "PROC". The first has a series of variable names, each followed by an exact repetition of the variable name with the attached suffix "_f." (no quotes). The second has a series of variable names, each followed by an integer between 8 and 18, exclusive, which is always followed by an attached period. The period is often followed by an attached 4. What is the significance of these two format statements?