0
votes

I'm running through a huge dset, populating a hash of unique values. At the end of the huge datastep I spit the hash contents out to a second dataset.

The dataset is big enough to cause out-of-memory errors, specifically:

> ERROR: Hash object added nnnnnn items when memory failure occurred.
> FATAL: Insufficient memory to execute DATA step program. Aborted during the EXECUTION phase.

So I want to periodically write the hash contents out to a dset, flush the hash & keep going. To do this I need to come up with a unique name for each hash output dataset. Here's a tiny version of my code that illustrates the problem:

data huge ;
  do i = 1 to 50 ;
    recnum = ceil(i / 15) ;
    output ;
  end ;
run ;
* Write the hash out every 20 records. ;
%let chunk_size = 20 ;
data huge ;
  set huge end = alldone ;
  if _n_ = 1 then do ;
    declare hash myhash() ;
    myhash.definekey('i') ;
    myhash.definedata('i', 'y') ;
    myhash.definedone() ;
    call missing (y) ;
  end ;

  y = i * 3 ;

  myhash.ref() ;

  if mod(_n_, &chunk_size) = 0 then do ;
    call symput("chunk_num", put(_n_/&chunk_size, z2.0)) ;
    myhash.output("dataset: part&chunk_num") ;
  end ;

  if alldone then do ;
    myhash.output("dataset: part_final") ;
  end ;
run ;

The call symput() is working & the macro var gets created (I can symget() it out & put it in a dset variable, for example), but I can't seem to use it in that hash output statement. The error I get is:

WARNING: Apparent symbolic reference CHUNK_NUM not resolved.

How can I use my &chunk_num macro var to name my interim hash output dataset?

Try using SYMGET? - Reeza

The SYMGET call would need to be inside the quoted dset name, & wouldn't resolve. - Roy Pardee

2 Answers

1
votes

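The answer is that it was a mistake to use a macro variable for this. The &chunk_num reference is resolved when the data step is compiled, before call symput() has had a chance to run, so it can never pick up the value. I can just use a normal dataset variable, like so: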

%let chunk_size = 20 ;
data huge ;
  set huge end = alldone ;
  if _n_ = 1 then do ;
    declare hash myhash() ;
    myhash.definekey('i') ;
    myhash.definedata('i', 'y') ;
    myhash.definedone() ;
    call missing (y) ;
  end ;

  y = i * 3 ;

  myhash.ref() ;

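  if mod(_n_, &chunk_size) = 0 then do ;
    * Build the dataset name at run time, e.g. chunk_num01 ;
    hash_dset_name = cats("chunk_num", put(_n_/&chunk_size, z2.0)) ;
    myhash.output(dataset: hash_dset_name) ;
    * Empty the hash before starting the next chunk ;
    myhash.clear() ;
  end ;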

  if alldone then do ;
    myhash.output(dataset: "part_final") ;
  end ;
  drop hash_dset_name ;
run ;
1
votes

You do not need to hard code the dataset name in the OUTPUT method call. You can use a variable or even an expression.

if mod(_n_, &chunk_size) = 0 then do ;
  myhash.output(dataset: cats('part',put(_n_/&chunk_size,z2.))) ;
end ;
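
For completeness, here is a minimal end-to-end sketch of the chunked approach using the expression form above. The names work.huge_in and work.all_parts are hypothetical, the toy input just mirrors the 50-row example from the question, and reading from a separate input with data _null_ (rather than overwriting huge) is my own choice, not something the original code does.

```sas
* Hypothetical input: 50 rows, so a chunk size of 20 should yield part01, part02 and part_final ;
data work.huge_in ;
  do i = 1 to 50 ;
    output ;
  end ;
run ;

%let chunk_size = 20 ;

* data _null_ because the only outputs wanted are the hash dumps ;
data _null_ ;
  set work.huge_in end = alldone ;
  if _n_ = 1 then do ;
    declare hash myhash() ;
    myhash.definekey('i') ;
    myhash.definedata('i', 'y') ;
    myhash.definedone() ;
  end ;

  y = i * 3 ;
  myhash.ref() ;

  if mod(_n_, &chunk_size) = 0 then do ;
    * The name is an ordinary expression evaluated at run time ;
    myhash.output(dataset: cats('part', put(_n_ / &chunk_size, z2.))) ;
    * Empty the hash before starting the next chunk ;
    myhash.clear() ;
  end ;

  * Write whatever has accumulated since the last chunk boundary ;
  if alldone then myhash.output(dataset: 'part_final') ;
run ;

* Stack the pieces back together. The part: prefix list picks up every WORK dataset whose name starts with part ;
data work.all_parts ;
  set work.part: ;
run ;
```

Two caveats. Because the hash is cleared between chunks, a key that shows up in more than one chunk will appear in more than one part dataset, so the combined result may still need de-duplicating (for example with proc sort nodupkey). And the part: prefix will also pick up any unrelated WORK datasets whose names start with part, so a more distinctive prefix may be safer.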