0
votes

I'm trying to modify all datasets (name of datasets follow certain orders, like data_AXPM061203900_20120104 , data_AXPM061203900_20120105, data_AXPA061204100_20120103, data_AXPA061204100_20120104) under work library. For example, I want to delete all missing value under the variable named "ask_price" in all datasets.

enter image description here

I am using the following to achieve this objective.

 proc sql ;
      create table data.mytables as
      select *
      from dictionary.tables
      where libname = 'WORK' 
      order by memname ;
    quit ;


%macro test;
  proc sql ;
    select count(memname) into: obs from data.mytables;

  %let obs=&obs.;

    select catx("_", "data", substr(memname, 6, 13), substr(memname,20,27))
    into :setname1-:setname&obs.
    from data.mytables;
quit;


%do i=1 %to &obs.;

data  &&setname&i;
set  &&setname&i;

if bid_price= '.' then delete;
%end;

%mend test;

%test;

Someone suggest that "This is possibly the least efficient (in programming terms) setup you can have. Every time you even access that data you need to go through all the loops, checks, getting the data from the filename etc. which is both resource nonsense, and prone to error." However, he didn't give me a detailed solution. In this case, could anyone give me more guidance?

2
You probably need to give some more context here. For the specific task you mentioned, this is within an order of magnitude of optimal. But the data structure itself is what's nonsense in my mind - unless it's necessary for some reason for the overall project, which there's just not sufficient context here to understand. This could probably be improved - I don't understand why you reconstruct the variable names for example, aren't they already correct? - but the general approach is fine, given the existing data structure.Joe
Sorry for the misleading question. I modified my question. Actually, I'm not trying to reconstruct the variable names. I attempt to cleaning the data through removing missing value in variables.Neal801
I mean, why do you do this? select catx("_", "data", substr(memname, 6, 13), substr(memname,20,27)) Why is that not the same as select memname?Joe
And the rest of my comment applies. I'm not saying your question is misleading, I understand what you're doing (that bit aside), but what I'm saying is that the problem is not with this specific set of code, the problem is with the data environment that makes this set of code necessary, as opposed to having your data structured in one single dataset.Joe
You stated you want to delete missing VALUES, but your code is deleting OBSERVATIONS that have a missing value for a specific variable. Is the code example doing what you want? Or did you mean something else?Tom

2 Answers

0
votes

Note that deleting observations from your existing datasets is probably a bad idea because if there is any mistake then you will have lost your original data. But let's just assume you have taken care of that with some other part of the overall process.

So if you want to remove observations from a number of existing datasets when the variable bid_price is missing then perhaps you can just use PROC SQL. You need to first generate a list of datasets. Then you need to generate a separate DELETE statement for each dataset to delete the observations.

This code will generate the code into a macro variable. So it will be limited by the maximum length of the macro variable (64K) in how many datasets it can process.

proc sql noprint ;
  create table dslist as
    select distinct 
        libname
      , memname 
    from dictionary.columns 
    where libname='WORK'
      and memname like 'DATA_APX%'
      and upcase(name)='BID_PRICE'
  ;
  %let code=;
  select catx(' ','delete * from',catx('.',libname,memname)
                 ,'where missing(bid_price)')
    into :code separated by ';'
    from dslist
  ;
  &code;
quit;
0
votes

one other problem with the original code: if bid_price= '.' then delete; bid_price is a numeric value, and you are comparing it to a character value (the period in quotes makes it a character value). If you are going to use that type of code, then it should be like this: if bid_price= . then delete; However, I prefer to use the "missing" function, which works for both character and numeric: if missing(bid_price) then delete;

There are almost always several ways to accomplish a given task in SAS.