0
votes

I have imported a large (14 gig) csv file into an R colbycol object, from which I intend to transfer it into a database using the R sqldf package after some manipulation. Formatting information for this file is available in the form of a SAS command file for the corresponding file with fixed-width fields and records. I have never used SAS and am unsure of the meanings of certain commands, which I need in order to be sure that the data they describe is imported correctly.

The online documentation I found for the value, input, and format statements (URLs copied below) did not help me with the three specific examples below, perhaps because they are too elementary. If anyone could tell me what these examples mean, I would appreciate it.

PROC format: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473472.htm

value statement: http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#p1upn25lbfo6mkn1wncu4dyh9q91.htm

input statement: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146292.htm

format statement: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000178212.htm

First: After the "PROC format" there are a large number of "value" statements. Most of these have a column of numeric codes, each followed by a short description. These seem to be straightforward analogs of R factors, but with a more flexible choice of internal numeric representation, and I am converting them accordingly. However, some of them have strings of numerals (apparently all 9's, 0's, or 7's) instead. Examples are reproduced below. What do these numeral strings signify? In a few cases a number is assigned to an abbreviation. In the example below I believe "NIU" stands for "not in universe". However, I am not sure what this value statement implies for other values of the variable.

value FTOTVAL_f

0000999999 = "999999" ;

value INCTOT_f

00999997 = "00999997"

99999997 = "99999997"

99999999 = "99999999" ;

value OFFTOTVAL_f

0000999999 = "NIU"

Second: After the word "input", there are numeric ranges which I believe represent character ranges in an alternative, fixed-width record version of the file. Some of these are followed by an attached ".4" or a ".2" (not in quotes), and some are not. What do these suffixes mean?

Third: The command file ends with two format statements not immediately preceded by a "PROC". The first has a series of variable names, each followed by an exact repetition of the variable name with the attached suffix "_f." (no quotes). The second has a series of variable names, each followed by an integer between 8 and 18, exclusive, always followed by an attached period. The period is often followed by an attached 4. What is the significance of these two format statements?

1
If you could post this SAS code, one of us can probably help you out. Without the code, it is going to be hard to explain what is going on. - DomPazz

1 Answer

1
votes

PROC FORMAT is a SAS procedure used to create customized, user-defined formats. Once a format has been created, it can be applied to one or more variables with a format statement. In general, a format controls what you see in printed output or when viewing a dataset, not the value that is actually stored. For example, if a field called month holds the values 1 to 12, you can map 1 to Jan, 2 to Feb, and so on. The stored values remain 1, 2, 3, ..., 12, but when you print the data, SAS shows Jan, Feb, ..., Dec according to the value statement you gave in PROC FORMAT. Built-in formats such as date9., date11., and mmddyy10. also exist. The dot at the end of a format name is compulsory; if you leave it out, SAS will not recognize the name as a format and you will usually get an error.
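To make the idea concrete, here is a minimal Python sketch (not SAS) of what a user-defined format does: the stored codes never change, and the labels are only looked up at display time. The names `MONTH_F` and `apply_format` are hypothetical, and showing the raw value for a code with no label is a simplifying assumption.

```python
# Illustration only: mimics a SAS value statement as a lookup table.
# MONTH_F is a hypothetical format; the remaining months are omitted.
MONTH_F = {1: "Jan", 2: "Feb", 3: "Mar"}


def apply_format(value, fmt):
    """Return the display label for value, or the value as text if unlisted."""
    return fmt.get(value, str(value))


stored = [1, 2, 3, 12]                              # what is in the dataset
shown = [apply_format(v, MONTH_F) for v in stored]  # what a printout displays

print(stored)
print(shown)
```

The point of the sketch is that `stored` is untouched: applying a format changes only how values are displayed.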

Similar to the date formats, there are also numeric formats with decimals. For example, in format number 9.2; where number is a variable name, 9.2 means that the value is displayed in a field 9 characters wide (counting the decimal point and any minus sign), with 2 digits after the decimal point.
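As a rough analogy in Python (not SAS), a width-and-decimals format behaves like a fixed-point format specifier. The helper name `format_w_d` is hypothetical, and the analogy is only approximate: this sketch does not reproduce what SAS does when a value is too wide for the field.

```python
def format_w_d(value, width, decimals):
    """Right-align value in `width` characters with `decimals` digits after the point."""
    return f"{value:{width}.{decimals}f}"


# Analogous to applying the format 9.2 to the number 1234.5
text = format_w_d(1234.5, 9, 2)
print(repr(text))
```

The result is padded on the left with spaces so that the whole field, including the decimal point, is 9 characters wide.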

Another date example is format dob date9.; which means that the values in the dob column are displayed using date9. (ddmmmyyyy, for example 01JAN1960). Like many tools (Excel, SPSS), SAS stores a date as a number, so if you don't apply a format it shows the underlying number rather than the date you expect. SAS counts dates in days from 1 Jan 1960 as the reference point.
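If you end up with raw SAS date numbers outside SAS, the conversion is just an offset from the 1 Jan 1960 reference date. The following Python sketch shows the idea; the function names `sas_date_to_date` and `date9` are hypothetical helpers, and the month abbreviations are hard-coded to avoid depending on the system locale.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def sas_date_to_date(days):
    """Convert a SAS date value (days since 1 Jan 1960) to a Python date."""
    return SAS_EPOCH + timedelta(days=days)


def date9(d):
    """Render a date in the ddmmmyyyy layout used by the date9. format."""
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year:04d}"


print(date9(sas_date_to_date(0)))    # the reference date itself
print(date9(sas_date_to_date(366)))  # 1960 was a leap year, so this is one year later
```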

SAS is vast, and formats are a large topic in their own right: besides the many built-in formats, you can define your own, including picture formats.

My advice (this is opinion): search for NESUG/SUGI papers when you need to clarify something in SAS; I find them clearer than the SAS documentation.

For more on formats, you can read this:

http://www2.sas.com/proceedings/sugi27/p056-27.pdf

I have tried to put this in simple English, but when you read technical documents you may run into some difficulty because SAS and R use different terminology.

Hope this helps