1
votes

I have a data file that I am trying to import into SAS that looks something like the below:

WCM2B   W   C   M   2   B   M.B 2   18.4    12.3    g
WCM2B   W   C   M   2   B   M.B 2   19.2    12.3    g
WCM2B   W   C   M   2   B   S.P 2   19.5    DQ     ('')
WCM2B   W   C   M   2   B   Z.G 2   17.7    10.7    g
WCM2B   W   C   M   2   B   Z.G 2   18.4    10.7    g
WCM2B   W   C   M   2   B   Z.G 2   17.6    10.8    g
WCM2B   W   C   M   2   B   Z.G 2   20.1    12.1    g

Each of these columns has a heading. Some columns hold categorical variables and some do not.

My questions:

1) What is the proper code for ensuring a text file like this, delimited by spaces as shown above and with ~36 rows and 11 columns of data is properly formatted in SAS? How can I then perform operations on this data so that it comes up in the output window? Even the most basic procedure to do on some chosen infile would do. Ideally, if someone is feeling very generous I am trying to get an understanding of how to do regression analysis including analyzing residuals and standard statistics.

2) Do I need to change categorical variables into binary for it to properly analyze the data?

3) Are there any other issues with this data I'm missing that might prevent it from working?

Thank you very much for your time.

3
As the table is written now, the delimiter appears to be tabs, not spaces. Is that the case? What does 'DQ' represent in the 3rd row, 10th column? Does "(' ')" in the 3rd row, 11th column represent a space? - assumednormal
You may be correct about the tabs. DQ should become a '.' as I understand that's what SAS likes to have to indicate no data. DQs are no data. As well each column here has a heading I have not put in, so there is one more non-data row missing. I understand this means I need to put firstobs=2, right? - user26091
Unless this question put more emphasis on the statistical part (end of question 1 and question 2) it will be closed and migrated to SO where programming and software-related issues are on-topic. - chl
Re Q2: no, you seldom need to change categorical variables into binary form to use them in SAS. Many of the core statistical procs, e.g. proc summary, support using a class statement. Some, e.g. proc rank, require a by statement instead, which means you need to sort your dataset first. - user667489
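
As a rough illustration of the class statement mentioned in the last comment, the sketch below fits a model with one categorical predictor without recoding it by hand. The data set name want and the variables y, x and grp are hypothetical, since the real column headings are not shown in the question.

```sas
/* Minimal sketch: PROC GLM builds the indicator columns for a CLASS
   variable internally, so no manual binary recoding is needed.
   want, y, x and grp are hypothetical names. */
proc glm data=want;
    class grp;                     /* categorical predictor */
    model y = x grp / solution;    /* solution prints the parameter estimates */
run;
quit;
```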

3 Answers

0
votes

Dealing ONLY with how to read your external file!

Assuming you have a file exactly as described (containing a header row in the first record and fields separated by spaces), you can use PROC IMPORT to read it into a SAS data set:

proc import out=want
     datafile='c:\temp\tempdata.txt'
     dbms=dlm;
     getnames=yes;
     delimiter = ' ';
run;

For delimited files like this, SAS uses a tool called the External File Interface to inspect the file and generate regular data-step code to read it. If you look in the SAS log, you will see the actual code that was generated (an infile statement, a set of data definition statements and an input statement). You can use that code as an example to refine the input as needed.
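
For orientation, a hand-written data step with the same three parts (infile, variable definitions, input) might look like the sketch below. This is not the code SAS will generate for your file. The variable names are hypothetical because the real headings are not shown, and the types are guessed from the sample rows.

```sas
/* Hand-written sketch, not generated code. All variable names are
   hypothetical; types are guessed from the sample rows in the question. */
data want;
    /* firstobs=2 skips the header row; truncover stops SAS from reading
       into the next line when a record is short */
    infile 'c:\temp\tempdata.txt' delimiter=' ' firstobs=2 truncover;
    /* character columns need an explicit length; 8 marks numeric columns */
    length code $5 f1 f2 f3 $1 level 8 f4 $1 initials $3 grp 8
           x 8 y 8 unit $4;
    /* a non-numeric value such as DQ in y is read as missing, and SAS
       writes an "Invalid data" note to the log */
    input code f1 f2 f3 level f4 initials grp x y unit;
run;
```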

Note that SAS only has two data types (character and numeric). Classifications such as "categorical" and "binary" are matters of usage and not part of a formal data definition. However, certain other SAS tools (such as Enterprise Miner) do allow you to add attributes like this.

To get a simple listing of the data set contents written to the Output window, you can just run a simple PROC PRINT:

proc print data=want;
   title 'This is my data';
run;

Questions about how to do things like a linear regression on a data set like this are probably beyond the purpose of StackOverflow. There is a wealth of information and examples in the documentation. In your case, start by reading the SAS Concepts book then read about PROC REG in the SAS/STAT Procedures Guide. Here is a link to the main SAS documentation.
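
As a starting point only, a simple linear regression that also saves the residuals could be sketched as follows. The variables y and x are hypothetical numeric columns in the want data set, and reg_out, predicted and residual are names chosen here for illustration.

```sas
/* Minimal sketch, assuming y and x are numeric columns in want
   (hypothetical names). */
proc reg data=want;
    model y = x;
    /* save fitted values and residuals to a new data set */
    output out=reg_out p=predicted r=residual;
run;
quit;

/* Summary statistics and normality tests for the saved residuals */
proc univariate data=reg_out normal;
    var residual;
    histogram residual / normal;
run;
```

PROC REG only accepts numeric regressors, so a categorical predictor would need a procedure that supports a class statement instead.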

0
votes

I don't have SAS available to test this code. Let me know how it goes.

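/* Numeric informat: read "DQ" (no data) as missing, everything else as-is. */
proc format;
    invalue v10fmt "DQ"  = .
                   other = _same_;
run;

data dsname;
    informat v10 v10fmt.;
    /* Character columns; v5, v8 and v9 default to numeric. */
    length v1 $5 v2 v3 v4 v6 $1 v7 $3 v11 $1;
    /* firstobs=2 skips the header row; "09"x is the tab character;
       truncover keeps a short record from pulling in the next line. */
    infile "//file/location/and/name" firstobs = 2 delimiter = "09"x truncover;
    input v1-v11;
run;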

0
votes

If you have a tab delimited data file, you could consider using PROC IMPORT at least initially.

proc import file="//wherever/myfile.txt" out=mydataset dbms=tab replace;
run;

That will generate a dataset. It will also, usefully, put the input code into the log. You can copy it from the log into your program editor and then make modifications if the import procedure makes poor decisions (for example, it might decide the column with "DQ" should be a character variable). You can adjust that to numeric, and rerun the pasted code.
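
If you would rather not edit the pasted code, an alternative (not what is described above) is to convert the column after the import. The sketch below assumes the import created a character column holding "DQ"; score is a hypothetical name for it, and mydataset_fixed and score_num are names chosen here.

```sas
/* Alternative sketch: convert a character column to numeric after import.
   score is a hypothetical name for the column that contains "DQ". */
data mydataset_fixed;
    set mydataset;
    /* ?? suppresses the invalid-data messages; "DQ" becomes missing (.) */
    score_num = input(score, ?? best12.);
    drop score;
run;
```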

Now you can do whatever you want to that dataset. You can do things like

proc freq data=mydataset; 
run;

The rest of your questions are really general research questions that can't be easily answered without both knowing your analysis and having a lot of time to write answers :) I would recommend doing some reading online on data analysis; these aren't really issues specific to SAS, but are general research guidelines, and there are lots of papers out there on the topics.