0
votes

I am trying to summarize my data set using the proc sql, but I have repeated values in the output, a simple version of my code is:

PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT  patid,ndc,fill_mon,
COUNT(dea) AS n_dea, 
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx 
GROUP BY patid,ndc,fill_mon;
QUIT;

Some sample output is:

 Obs                  Patid    Ndc            FILL_mon    n_dea    DEDUCT

 3815            33003605204    00054465029    2000-05       2         0  
3816            33003605204    00054465029    2000-05       2         0  
12257            33004361450    00406035701    2000-06       2         0  
16564            33004744098    00603128458    2000-05       2         0  
16565            33004744098    00603128458    2000-05       2         0  
16566            33004744098    00603128458    2000-06       2         0  
16567            33004744098    00603128458    2000-06       2         0  
46380            33008165116    00406035705    2000-06       2         0  
85179            33013674758    00406035801    2000-05       2         0  
89248            33014228307    00054465029    2000-05       2         0  
107514            33016949900    00406035805    2000-06       2         0  
135047            33056226897    63481062370    2000-05       2         0  
213691            33065594501    00472141916    2000-05       2         0  
215192            33065657835    63481062370    2000-06       2         0  
242848            33066899581    60432024516    2000-06       2         0  

As you can see there are repeated out put, for example obs 3815,3816. I have saw some people had similar problem, but the answers didn't work for me.

The content of the dataset is this:

                            The SAS System                               5
                                          17:01 Thursday, December 3, 2015

                        The CONTENTS Procedure

                  Engine/Host Dependent Information

     Data Set Page Size          65536                           
     Number of Data Set Pages    210                             
     First Data Page             1                               
     Max Obs per Page            1360                            
     Obs in First Data Page      1310                            
     Number of Data Set Repairs  0                               
     Filename                    /home/zahram/optum/rx_4.sas7bdat
     Release Created             9.0401M2                        
     Host Created                Linux                           
     Inode Number                424673574                       
     Access Permission           rw-r-----                       
     Owner Name                  zahram                          
     File Size (bytes)           13828096                        


                            The SAS System                               6
                                          17:01 Thursday, December 3, 2015

                        The CONTENTS Procedure

              Alphabetic List of Variables and Attributes

  #    Variable      Type    Len    Format    Informat    Label

  3    FILL_mon      Num       8    YYMMD.                Fill month
  2    Ndc           Char     11    $11.      $20.        Ndc       
  1    Patid         Num       8    19.                   Patid     
  4    n_dea         Num       8                                    
  5    tot_DEDUCT    Num       8                                    


                          Sort Information

                  Sortedby       Patid Ndc FILL_mon
                  Validated      YES               
                  Character Set  ASCII             


                            The SAS System                               7
                                          17:01 Thursday, December 3, 2015

                        The CONTENTS Procedure

                          Sort Information

                  Sort Option    NODUPKEY          

NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.08 seconds cpu time 0.01 seconds

1
what happens if you remove the DISTINCT from the SELECT clause?DomPazz
Double check that your code is actually executing (ie. you don't have the table open in the background). While the distinct clause has no place in being there it shouldn't affect the result.Robert Penridge
Are you getting warnings or notes about re-merging?Joe
Post the results from a proc contents. I'll guess that you have a format on a variable, most likely the date. Proc SQL does not aggregate over formatted values but will use the underlying values but still shows them as formatted, so they appear as duplicates.Reeza
Actually the Distinct doesn't change anything in the answer, I run the code without it first.maryam

1 Answers

1
votes

I'll guess that you have a format on a variable, most likely the date. Proc SQL does not aggregate over formatted values but will use the underlying values but still shows them as formatted, so they appear as duplicates. Your proc contents confirms this. You can get around this by converting this the variable to a character variable.

PROC SQL;
CREATE TABLE perm.rx_4 AS
SELECT  patid,ndc, put(fill_mon, yymmd.) as fill_month, 
COUNT(dea) AS n_dea, 
sum(DEDUCT) AS tot_DEDUCT
FROM perm.rx 
GROUP BY patid,ndc, calculated fill_month;
QUIT;