1
votes

how to import data from single excel file to multiple dataset with following conditions in SAS Progrmming.

  1. first row is data set name categries all dataset (DS) in different different DS.
  2. second row is column name for separate dataset.
  3. and third row is raw data inserted in corresponding dataset and column.

using proc import enter image description here

3
Do all of the blocks of columns have the same number of rows?Tom

3 Answers

0
votes

This isn't a good question as it stands. A good question should show your source data, the code you've tried so far, and describe the issues that you're having with getting it to work, with one or two specific questions that you need help with.

However, based on the limited information available I'll say this: you can't do all of that in a single step in PROC IMPORT. You will need to do this in stages; first, import the entire sheet into a single data set using PROC IMPORT as normal, then analyse the data set to derive the distinct set of data set names from the first column and variable names from the second column (you will need to validate these to ensure that they are valid data set / variable names), then write additional macro code to generate each of the data sets.

You might be able to use PROC TRANSPOSE to transpose the data from each row from 'tall' to 'wide' format. How does your source data represent the start of a new record in each data set? How do you know whether each column should be character, or numeric? How do you know how long each variable should be?

The source data format as described sounds pretty awful - assuming this isn't a homework question, it may be better to go back to the source of the data and ask for a better transfer.

0
votes

As @Chris-Long states you can't do what you want in one go.

The XLSX engine in SAS/Access to PCFILE supports importing from a named range, or an absolute range. Here is an example for processing the first sheet of an Excel file containing a Sheet1

filename myexcel 'c:\temp\sandbox.xlsx';

proc import 
  file=myexcel 
  dbms=XLSX
  replace
  out=Sheet1_DSName
;
  range="'Sheet1'$A1:A1";  /* Upper left cell contains eventual SAS data set name */
  getnames=no;
run;

proc import 
  file=myexcel 
  dbms=XLSX
  replace
  out=Sheet1_Data
;
  range="'Sheet1'$A2:Z100";  /* read second row to hundredth row as headered data */
                             /* getnames=yes is default */
run;

proc sql noprint;
  select A into :name from Sheet1_DSName;

proc datasets nolist lib=work;
  age Sheet1_Data &name;  /* rename the data data set to name as found in A1 */
run;
0
votes

You basically need to read the XLSX file twice. The first read will get the data in the first two rows and then the second read can skip those rows. You can then use the data from the first two rows to understand how to split the data into individual datasets.

First let's make a test XLSX file.

data sample;
 input (x1-x9) (:$32.);
cards;
ds1 ds1 ds1 ds2 ds2 ds2 ds3 ds3 ds3
id gender name department department_name employees name category division
1 e q a w 23 q t oi
2 e q a x 24 q y uy
;

proc export data=sample dbms=xlsx file="&path\sample.xlsx" replace ;
  putnames=no;
run;

Now let's read the whole thing in without names and get just the first two rows transposed so we can see how many columns there are.

proc import datafile="&path\sample.xlsx" dbms=xlsx 
  out=raw replace ;
  getnames=no;
run;

proc transpose data=raw(obs=2) out=names ;
  var _all_;
run;

Then we can use the number of rows and number of columns to generate a RANGE to use to read the real data. This is important as it should allow the numeric columns to come over as numeric variables since IMPORT will not be trying to include the column names in the data.

data _null_;
   set names (keep=_name_) point=ncols nobs=ncols;
   set raw (drop=_all_) nobs=nrows;
   call symputx('range',cats('$A3:',_name_,nrows));
   stop;
run;
%put &=range;

proc import datafile="&path\sample.xlsx" dbms=xlsx 
  out=values replace ;
  range="&range";
  getnames=no;
run;

Now we can use the information on the names to generate code to split the big dataset into little datasets. It looked like you wanted to keep the first column on all of the datasets so let's include that also.

filename code temp;
data _null_;
  file code;
  if _n_=1 then set names(keep=_name_ col2 rename=(_name_=idcol col2=idvar));
  do until (last.col1);
    set names ;
    by col1 notsorted ;
    if first.col1 then put 'data ' col1 ';' / '  set values(keep=' idcol _name_ '--' @;
    if last.col1 then put _name_ ');' / '  rename ' idcol '=' idvar ';' ;
  end;
  put '  rename';
  do until (last.col1);
    set names ;
    by col1 notsorted ;
    put '    ' _name_ '=' col2  ;
  end;
  put '  ;' / 'run;' ;
run;
%include code / source2;