0
votes

We are in the process of transitioning our code to the SAS Grid where we use Enterprise Guide 6.1 to run SAS 9.4. One piece of code is giving different results than what we get when running SAS 9.3 on PC. I've created an example dataset and code which returns varying results. The results of running on the Grid (13 rows with duplicates) are the expected behavior when including a column in the select statement, but not aggregating it or using it in the group by. The results of running it using SAS 9.3 (6 distinct rows) are what we want, but not typical SAS behavior. I have since modified the code (added denom to the group by) to get the desired results on the Grid, but wonder why the code returns different results when run in different environments. Any ideas?

ETA: the Grid uses SAS 9.4 and the modified code uses denom in the group by. Also, I was unable to find documentation of changes to Proc SQL from V 9.3 to V 9.4.

proc sql;
  create table work.test
    (state char(2)
    ,county char(20)
    ,city char(20)
    ,id char(6));    
quit;

proc sql;
  insert into work.test (state, county, city, id)
  values ('OH', 'Hamilton', 'Cincinnati', 'abc')
  values ('OH', 'Hamilton', 'Cincinnati', 'def')
  values ('OH', 'Hamilton', 'Cincinnati', 'ghi')
  values ('OH', 'Hamilton', 'Mariemont', 'jkl')
  values ('OH', 'Hamilton', 'Mariemont', 'mno')
  values ('OH', 'Franklin', 'Columbus', 'pqr')
  values ('OH', 'Franklin', 'Columbus', 'stu')
  values ('TX', 'San Patricio', 'Ingleside', 'abc')
  values ('TX', 'San Patricio', 'Taft', 'abc')
  values ('TX', 'Nueces', 'Corpus Christi', 'abc')
  values ('TX', 'Nueces', 'Corpus Christi', 'xyz')
  values ('TX', 'Nueces', 'Corpus Christi', 'tuv')
  values ('TX', 'Nueces', 'Corpus Christi', 'def');
quit;

proc sql;
  create table freqs as
  select a.state
    , a.county
    , a.city
    , count(city) as numer
    , denom
    , round(count(city)/denom*100,.1) as percent
  from work.test as a,
  (select state, county, count(*) as denom from work.test group by state, county) as b
  where a.state=b.state and a.county=b.county
  group by a.state, a.county, a.city;
quit;
1
Raise a ticket with SAS support.Tom
See the edit to my answer. Looks like a known issue that was fixed in a 9.3 Hot Fix: support.sas.com/kb/46/832.htmlDomPazz
Note that SAS 9.2 does the same thing as SAS 9.3. It is SAS 9.4 that has lost the knowledge the DENOM is at the STATE,CITY level and so does NOT need to be remerged.Tom
I wouldn't say it "lost the knowledge", 9.4 operates how the ANSI standard would have it work. They made the procedure more correct.DomPazz

1 Answers

1
votes

At the end of the day, your query is poorly written.

If you look at the log, you will see this note:

NOTE: The query requires remerging summary statistics back with the original data.

The reason for this is that denom is not a summary variable. This is causing your extra records. This should be expected behavior of your query - 13 rows with dups.

Order of processing is to join the tables, and then perform summary calculations. When the GROUP BY statement is not complete, you get that NOTE:

Add b.denom to the group by and you get your expected 6 rows.

As to the WHY SAS gives you different results: I suspect the version of SAS on the GRID is different than what you have on the PC.

EDIT: I think this is the SAS Note that details the problem. They fixed it in a hot fix in 9.3 and would have been part of 9.4 out of the box: http://support.sas.com/kb/46/832.html