I have two data sets with the same content: one is tab-delimited and the other is space-delimited.

Space-Delimited

Tab_Delimited

I have three questions that I could not figure out and would like to ask for help. Any suggestions would be highly appreciated.

First, I opened both data sets in TextWrangler. My impression is that in the space-delimited file the values are separated by spaces and the observations in each row sit at the same positions. My understanding of the tab-delimited file, on the other hand, is that the values are separated by blanks whose widths are not necessarily the same in every row. Is my understanding correct? I am having trouble telling the two formats apart.

Second, I want to print the snowfall data set mentioned above from row 5 to row 122, with the "T" values converted to 0.

My code for the space-delimited snowfall file is below, and my question is about its LOG. There were many warnings about "T", but I did not receive any errors.

LOG

Should I be concerned about the warnings here that say

"invalid data for month(i) in line..."

* Trying Space-Delimited data set;

OPTIONS ERRORS=200;

DATA SASWEEK.SnowSpace;
  DROP i MyTot diff;
  INFILE "&dirLSB.RochesterSnowfallSpace.txt" FIRSTOBS=2 OBS=122;
  INPUT Season $ Sep Oct Nov Dec Jan Feb Mar Apr May Total;
  ARRAY Month(10) Sep -- Total;
  * "T" is not numeric and is read as missing, so set missing values to 0;
  DO i = 1 TO 10;
    IF Month(i) = . THEN Month(i) = 0;
  END;
  * Check that the monthly values add up to the reported total;
  MyTot = SUM(OF Sep -- May);
  diff = ROUND(MyTot - Total, 0.001);
  IF diff NE 0 THEN PUT "**ERROR" MyTot= Total= diff=;
RUN;

PROC PRINT DATA=SASWEEK.SnowSpace;
  TITLE "Rochester Snowfall in Space-Delimited format";
RUN;

One of my professors suggested that I read the monthly snowfall values as character, so that the "T"s would not cause a warning in the LOG. I am not sure whether I should try it this way.

Lastly, I tried to use PROC IMPORT for the same data set, but as an xls file.

The data set is at the link, and my code is as follows:

* Trying Excel file;

OPTIONS ERRORS=200;
OPTIONS MSGLEVEL=i;

PROC IMPORT OUT=SASWEEK.SNOWxls
    DATAFILE="&dirLSB.RochesterSnowfall.xls" DBMS=xls;
  GETNAMES=no;
  RANGE="Sheet1$a5:k122";
RUN;

PROC PRINT DATA=SASWEEK.SNOWxls;
  TITLE "Rochester Snowfall in xls format";
RUN;

I received an error in the LOG, which I saved as HTML.

Part of the data set still printed, but the variable names were messed up and the output was incomplete. Any ideas?

Thank you all for reading, and thanks for any help :)

1 Answer

The DATA step with INPUT statement might be the best place to start.

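The warnings are harmless here: a "T" that cannot be read as a number is set to missing, and your code then converts missing to 0. They only matter if the goal is a log with no warnings.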

The data file can be cleanly read by creating an input environment built for it:

  • Custom informat zeroT converts the text T to the number 0, which prevents the warnings.
  • INFILE
    • DLM='0920'x tells SAS that either a tab ('09'x) or a space ('20'x) may delimit the values in the data file.
  • INPUT
    • Wrap the fields Sep to Total in parentheses ( ) to indicate grouped input
    • Wrap the informat specifiers in parentheses ( ) so that they are applied over the grouped variables
    • : is the list input modifier; it advances input parsing to the next non-blank character and reads until the next delimiter.
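
To see which delimiter a file really uses, one option is to count the tab and space characters in its first few lines. This is only a minimal sketch: it assumes the snowdata fileref from the sample data section below is already defined, and the variable names nTabs and nSpaces are chosen purely for illustration.

```sas
data _null_;
  infile snowdata obs=3;               /* look at the first three lines only */
  input;                               /* load the raw line into _INFILE_ without parsing it */
  nTabs   = countc(_infile_, '09'x);   /* number of tab characters in the line */
  nSpaces = countc(_infile_, ' ');     /* number of space characters in the line */
  put _n_= nTabs= nSpaces=;
  list;                                /* write the raw input line to the log */
run;
```

A tab-delimited file has one tab character between fields, so nTabs is nonzero; how wide each tab looks depends on the text editor. A space-delimited file typically pads the columns with runs of spaces so that they line up in a fixed-width font.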

Sample Code

proc format;
  invalue zeroT 'T'=0 other=[best12.];
run;

data have;
  infile snowdata firstobs=2 dlm='0920'x;
  INPUT Season $ (Sep Oct Nov Dec Jan Feb Mar Apr May Total) (10 * :zeroT.) ;
run;
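
The suggestion in the question to read the monthly values as character can be sketched as follows. This is one possible approach under stated assumptions, not the only way to do it: the names have_char and c1-c10 are hypothetical, the $8. width is assumed to be wide enough for every field, and the snowdata fileref from the sample data section is assumed to be defined.

```sas
data have_char;
  infile snowdata firstobs=2 dlm='0920'x;
  /* read the ten snowfall fields as text first (c1-c10 are illustrative names) */
  input Season $ (c1-c10) (:$8.);
  array c(10) $ c1-c10;
  array m(10) Sep Oct Nov Dec Jan Feb Mar Apr May Total;
  do i = 1 to 10;
    if c(i) = 'T' then m(i) = 0;            /* trace amount is treated as zero */
    else m(i) = input(c(i), ?? best12.);    /* ?? suppresses invalid-data notes */
  end;
  drop i c1-c10;
run;
```

Compared with the custom informat, this keeps the conversion rule visible in the DATA step, at the cost of ten temporary character variables.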

Sample Data (from SP text viewer)

filename snowdata "%TEMP%\roc_snowfalls.txt";

* create local sample data file, text copied from sharepoint viewer;

data _null_;
file snowdata;
input;
put _infile_;
datalines;
Season   Sep     Oct     Nov     Dec     Jan     Feb     Mar     Apr     May    Total
1884-85    0       T       1      27.1    22.2     17     3.5     19.5     T      90.3
1885-86    0      1.7     8.2     8.4     16.9     16     6.5      7       0      64.7
1886-87    0       T      22.2    12.5     12     18.4    6.3     1.2      0      72.6
1887-88    0      0.2     2.2     9.3     21.3    4.1     13.2    0.4      0      50.7
1888-89    0       T       4      15.5    17.8     22     17.5    5.4      0      82.2
1889-90    0       T      5.7     6.1     20.2    14.8     19      T       0      65.8
1890-91    0       0      2.1     29.2    16.1    24.6    12.2    0.3     0.1     84.6
1891-92    0      0.1     9.7     4.7     26.4    10.3    25.1    0.8      T      77.1
1892-93    0       T       14     19.2    15.9    29.8    8.1     9.6      0      96.6
1893-94    0      0.5     6.1     27.6     20     29.5    5.4     13.3     0     102.4
1894-95    0       T      11.1    22.1    26.5    23.6    9.5     0.6      0      93.4
1895-96    0      1.5     5.9     8.7     22.5    39.1    45.1     1       0     123.8
1896-97    0       T      5.5     13.9    20.1    13.7    8.1     5.2      0      66.5
1897-98    0       0      10.1    18.4    32.1    26.8    1.2     2.4      0       91
1898-99    0       T      10.6     27     16.6    16.3    21.2    4.3      T       96
1899-00    T       T      1.3     21.5    24.7    28.5     54     1.3      0     131.3
1900-01    0       0       17     20.3    29.8    36.9    13.7    23.8     T     141.5
1901-02    0      0.1     14.1    14.5    23.8     23     1.2     2.3      T       79
1902-03    0      0.1     4.1     27.7    18.1    15.6    2.4     0.3      0      68.3
1903-04    0      0.6     4.4     16.1    27.2    17.2    10.7    19.5     T      95.7
1904-05    0      0.2     2.1     15.8    27.5    15.2     7      0.5      0      68.3
1905-06    0       T       4      8.4     7.6      8      15.2    1.1      0      44.3
1906-07    0       5      5.7     18.7    11.7    15.7    3.1     2.5     1.3     63.7
1907-08    0       0      2.2     11.6    16.5    19.8    7.9     6.3      3      67.3
1908-09    0      0.5     4.6      10     22.5    6.1     9.7     9.8     3.3     66.5
1909-10    0       T      1.7     14.6     22     42.7    3.4     0.5      0      84.9
1910-11    0      2.2     15.7    29.8    9.5      30     13.5    4.7      2     107.4
1911-12    0       0      6.5     7.5     21.5    10.8    8.8     6.9      T       62
1912-13    0       0      7.2     6.9      10     18.6    15.2    1.3      0      59.2
1913-14    0      0.2     0.3     14.4    15.1    21.6    27.9    7.2      0      86.7
1914-15    0      0.8     4.7     16.1    22.9    9.8      6      0.5      0      60.8
1915-16    0       0      3.4     14.8    8.5     35.7    43.8    0.7      0     106.9
1916-17    0       0      11.7    24.9    22.7    16.7    14.6    2.3      T      92.9
1917-18    0       T      7.9     29.7    17.2    12.7    10.5    1.3      0      79.3
run;
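
Once the sample file has been created and read into have, the check from the question (do the monthly values add up to Total?) can be run as a separate step. This is a hedged sketch: the 0.1 rounding unit is an assumption based on the one-decimal values in the sample data, and MyTot and diff are the names used in the question.

```sas
data _null_;
  set have;
  MyTot = sum(of Sep--May);             /* Sep through May, in data set order */
  diff  = round(MyTot - Total, 0.1);    /* rounding absorbs floating-point noise */
  if diff ne 0 then put "Mismatch: " Season= MyTot= Total= diff=;
run;
```

Any season whose months do not add up to the reported total is written to the log; nothing is printed for seasons that match.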