3
votes

I have Excel file(.xlsx) which has its column names on 4th row and starting of data on 5th row. I am not sure what to use to extract data in Proc Import in SAS. Please help. Thanks

2
short of truncating the blanks, you may be able to use RANGE parameter as in communities.sas.com/thread/12293?tstart=0NoChance
Problem is range won't be constant and I am trying to automate importing data from Excel in SAS dataset and processing it for my requirement.user2694624
Looks like someone wrote up a walk through on how to do this here. I think you have to format your Excel file as XLS rather than XLSXdavidcondrey
These days everybody uses latest version of excel(.xlsx) and I am using this program to pull data from more than 10 files, so converting them manually to xls doesn't look like good option and I am looking to automate this whole process, so looking for solution for this.user2694624
RANGE is your only real option here. I would import it once, figure out what row the data starts on, save that as a macro variable, then re-import with RANGE. DDE could also do it but that's much more complicated I feel.Joe

2 Answers

1
votes

I resolved a similar problem in SAS 9.2 importing in two strokes, one to explore the sheet and one to extract the data.

This is a generalisation of what I did there, but excuse me for posting source I did not test : I do not have SAS installed on my PC. Let us asume your data could looks like this (when saved as tab delimited file):

            Some title that does not interust us        
Author  Dirk Horsten                
Date    01-Jan-15               
Other   Irrelevant thing                

        Bar Foo     Val Remark
        A   Alfa    1   This is the first line
        B   Beta    2   This is the second line
        C   Gamma   3   This is the last line

So the actual data starts at cell C6 with the column header "Bar". Let us also assume we know to find columns "Foo", "Bar" and "Val" and probably some other columns that do not interest us, in an unknown order, and we do not know upfront how many data lines there are.

Now, we naively import the sheet a first time and query sasHelp to find out what was read:;

/** First stroke import, to explore the content of the sheet **/
proc import datafile="&file_name" out=temp_out dbms=excelcs replace;
    sheet="&sheet_name";
run; 

/** Find out what SAS read in **/
proc sql;
    select couint(*) into :nrColstempCos separ by ' '
    from sashelp.vcolumn where libName = 'WORK' and memName = 'TEMP_OUT';

    select name into :tempCos separated by ' '
    from sashelp.vcolumn where libName = 'WORK' and memName = 'TEMP_OUT';
quit;

Next we look for the headers and the data, so we know how to read it properly.; This works if all columns were interpreated as character values, but unfortunately Excel can not be forced to do so.

data _null_;
    set temp_out end=last;
    array temp {*} &tempCols.;

    retain foo_col bar_col val_col range_bottom 0; 
    if not (foo_col and bar_col and val_col) then do;
        range_left = 0;
        range_right = 0;

        /* Find out if we finally found the headers */
        do col = 1 to &nrCols.;
            select (upcase(temp(col));
                when ('FOO') do;
                    foo_col = col;
                    if not range_left then range_left = col;
                    rang_right = col;
                end;
                when ('BAR') do;
                    bar_col = col;
                    if not range_left then range_left = col;
                    rang_right = col;
                end;
                when ('VALUE') do;
                    val_col = col;
                    if not range_left then range_left = col;
                    rang_right = col;
                end;
                otherwise;
            end;
        end;
        if (foo_col and bar_col and val_col) then do;
            /** remember where the headers were found **/
            range_top = _N_ + 1;
            call symput ('rangeTop', range_top);

            rangeLeft = byte(rank('A') + range_left - 1);   
            call symput ('rangeLeft', rangeLeft);

            rangeRight = byte(rank('A') + range_right - 1); 
            call symput ('rangeRight', rangeRight);
        end;
    end; 
    else do;
        /** find out if there is data on this line **/
        if (temp(foo_col) ne '' and temp(bar_col) ne '' and temp(val_col) ne '') 
            then range_bottom = _N_ + 1;
    end;

    /** remember where the last data was found **/
    if last then call symput ('rangeBottom', range_bottom);
run;

To calculate rangeTop and rangeBottom, we take into account that the _N_th observation in SAS comes from the N+1th line in excel, because the first excel row is interpreted as the headers.

To calculate rangeLeft and rangeRight, we must find the relative position to the left columen in the range we will read and translate that into letters

Now we read in the relevant data only;

/** Second stroke import, to read in the actual data **/
proc import datafile="&file_name" out=&out_ds dbms=excelcs replace;
    sheet="&heet_name";
    range="&rangeLeft.&rangeTop.&rangeRight.&rangeBottom.";
run; 

Success. Feel free to test this code, if your have SAS on your computer and correct it.

0
votes

The following should work regardless of how many rows precede your data, provided the rows preceding your data are entirely blank.

libname xl excel 'C:\somefile.xlsx';

data sheet;
    set xl.'Sheet1$'n;
run;

libname xl clear;

This sets up your Excel workbook like a database and the sheets are referenced directly like tables. I should note that my setup is 64-bit SAS 9.4 with 64-bit Excel; it's my understanding this approach may not work as expected if, for example, you have 64-bit SAS and 32-bit Excel.