0
votes

I am working with a panel dataset, so many countries and many variables throughout a period. The problem is that some countries have no value for certain variables across the whole period and I would like to get rid of them. I found this code for deleting rows with missing values :

DATA data0;
 SET data1;
 IF cmiss(of _all_) then delete;
RUN;

But all this does is check every row, while I would like to delete a whole country if it has no observations in at least one variable. Here's a part of the data : enter image description here

2
What variable stores the county identifier?Tom
Just country names, so you'd have 60 rows (60 years) for each countrySaad Cherkaoui Ikbal
I really cannot picture your data from what you have shown so far. Please post example data for a few counties and a few years, preferable as a data step that creates the data.Tom
How can I show a part of the data from sas ^^' ?Saad Cherkaoui Ikbal
There's the data if that helpsSaad Cherkaoui Ikbal

2 Answers

0
votes

If you want to delete the whole country if it has any information missing, you are on the right track, you just need to add a (group) by statement.

If your data is already sorted by country, as it appears to be in the picture, you can just run:

data want;
set have;
IF cmiss(of _all_) then delete;
by country;

If it is not sorted, you need to first run:

proc sort data=have;
by country;

However, if you have 60 years of data for every country, my guess is that you will not find a single one that have all the information for every year. It will be probably better to do some substantive choices of countries and periods you want to analyze, and then perform multiple imputatiom of missing data: https://support.sas.com/rnd/app/stat/papers/multipleimputation.pdf

0
votes

You can use a DOW loop to compute which variable(s) contain only missing values within a group. A second DOW loop outputs only those groups in which all variables contain at least on value.

Example:

data have;
  call streaminit (2020);

  do country = 1 to 6;
    do year = 1960 to 1999;

      array x gini kof tradegdp fdi gdp age_dep educ;

      do over x;
        x = rand('integer', 20, 100);
      end;

      if country = 1 then call missing (gini);
      if country = 2 then call missing (educ);
      if country = 4 then call missing (fdi);

      output;

    end;
  end;
run;

data want;
  * count number of non-missing values over group for each arrayed variable;

  do _n_ = 1 by 1 until (last.country);
    set have;
    by country;

    array x gini kof tradegdp fdi gdp age_dep educ;
    array flag(100) _temporary_; * flag if variable has a non-missing value in group;

    do _index = 1 to dim(x);
      if not(flag(_index)) then flag(_index) = 1 - missing(x(_index));
    end;
  end;

  * check if at least one variable has no values;

  _remove_group_flag = sum(of flag(*)) ne dim(x);

  do _n_ = 1 to _n_;
    set have;

    if not _remove_group_flag then output;
  end;

  call missing (of flag(*));
run;

Will LOG

NOTE: There were 240 observations read from the data set WORK.HAVE.   First DOW loop
NOTE: There were 240 observations read from the data set WORK.HAVE.   Second DOW loop
NOTE: The data set WORK.WANT has 120 observations and 11 variables.   Conditional output