1
votes

Assume we have a data directory C:\temp\ which contains hundreds of Excel files, e.g. isin47.xls and isin11.xls. Of each Excel file I want to extract the value of cell $E$16 and create a long table result which would look as follows, given the files had 299792458 and 662607004 as value in that cell, respectively.

isin   value_euro
47     299792458
11     662607004      

How do I do this most efficiently in SAS?

My idea so far is to the following algorithm:

  1. Loop over all files in a directory and create a table myExcels with all names, see How to iterate through files in SAS?
  2. Loop over myExcels and import each of them with code something like
proc import
   datafile="C:\temp\isin47.xls"
   dbms=excelcs
   out=data_current_isin
   replace;
run;

proc sql;
   select Value_found_in_E_1 into: value trimmed
   from data_current_isin 
   where Value_found_in_D_1 = "Value_found_in_D_13";
quit;

%put ISIN 47 has value of &value. Euro;
  1. Collect all &value. in the table results.

This approach has the problem, that it relies on values in Excel-cells $D$1, $E$1 and $D$13 which is in general not the case, unless I write a VBA macro to populate the cells (which I want to avoid in favor of a pure SAS solution).

Looking forward to any comments or suggestions.

PS @allOutsideFinance: The ISIN uniquely defines a financial asset.

1
I just found blogs.sas.com/content/sasdummy/2018/06/21/read-excel-range which seems to allow me to import custom ranges (and thus also single cells). However, this method uses dbms=xlsx which I was told by colleagues to avoid. I am now trying to understand why I should be using dbms=excelcs in my current project.B--rian
The RANGE= statement of PROC IMPORT works with the DBMS=XLS source. What did you try? How did you specify the range?Tom
I will keep trying a bit more and show more code in the next few hours...B--rian

1 Answers

2
votes

To read a single cell from an XLS file use a RANGE of just the single cell. Make sure to also tell IMPORT not to look for variable names.

proc import datafile='c:\downloads\test_import.xls' dbms=xls
  out=cell replace
;
  getnames=NO;
  range='$B2:B2';
run;

Then it is just a matter or appending that observation to your aggregate dataset.

Might be easiest to create a macro that takes as input the file name to read.

%macro read_one(filename);
proc import datafile="&filename"  dbms=xls
  out=cell replace
;
  getnames=NO;
  range='$B2:B2';
run;
proc sql;
   insert into want(filename,value_euro)
     select  "&filename",B from cell
   ;
quit;
%mend;

Then you can use your list of files to generate calls to the macro. First setup an empty file to receive the records.

data want;
   length filename $256 value_euro 8 ;
   stop;
run;

Then use a data step and generate one call to the macro per file. You could use CALL EXECUTE for example.

data _null_;
   set filelist ;
   call execute(cats('%nrstr(%read_one)(',filename,')'));
run;