0
votes

I recently started using SAS, only receiving a basic training that didn't cover proc sql. I'd like to read up a bit more on SAS sql when I have the time. For now, I found a solution to what I wanted to do, but I'm having difficulties understanding what is happening.

My issue started when I wanted to find out which subjects in my dataset have a certain value for all their records. I made use of my previously written snippet of code that I thought I understood. I just tried adding a couple more variables and group by statements:

data have;
    input subject:$1. myvar:1. mycount:1.;
    datalines;
    a 1 1
    a 0 2
    a 0 3
    b 1 1
    b 0 2
    b 1 3
    c 1 1
    c 1 2 /*This subject has myvar = 1 for all its observations*/
    ;
run;

*find subjects;
proc sql;
    create table want as 
/*  select*/
/*  distinct x.subject */
/*  from */
    (select distinct subject, count(myvar) as myvar_c 
    from have where myvar = 1 group by subject) x,
    (select distinct subject, max(mycount) as max_c
    from have group by subject) y
    where x.subject = y.subject and x.myvar_c = y.max_c;
quit;

When removing the commented 'select distinct x.subject from' in the create table statement, the above code works as should.

However, I've previously also created another piece of code, to select all subjects in my dataset that have two types of records:

data have2;
    input subject:$1. mytype:1.;
    datalines;
    a 1
    a 0
    a 0
    b 1
    b 0
    b 1
    c 1
    c 1 /*This subject doesn't have two types of records in all its observations*/
    ;
run;

*Find subjects;
proc sql;
   create table want2 as select
   distinct x.subject from
      have2 x, 
      (select distinct subject, count(distinct mytype) as mytype_c from have2 group by subject) y
   where y.mytype_c = 2 and x.subject = y.subject;
quit;

Which is similar, but didn't require the additional select statement. The first code has 3 select statements, the second code only requires two select statements. Can someone inform me why this is exactly required? Or link me some good documentation that lists the specifications of these types of joins - can anyone also inform me of the specific name of this type of join where you only use a comma?

while I'm writing, also see that could've used my code I initially wrote to find subjects that have only 1 type of record and tweak it for my current issue >.< but still would like to know what is happening in the first example.

1

1 Answers

0
votes

The SQL join construct

FROM ONE, TWO, THREE, …

is known as a CROSS JOIN and is a join without criteria. The comma (,) syntax is less prevalent today and the following construct is recommended

FROM ONE
CROSS JOIN TWO
CROSS JOIN THREE

The result set is a cartesian product and the number of rows is the product of the number of rows in the cross joined tables.

When the query has criteria (WHERE clause) the join is an INNER JOIN.

The SAS documentation for Proc SQL is a good starting point and includes examples.

joined-table Component
Joins a table with itself or with other tables or views.

Table of Contents
Syntax
Required Arguments
Optional Argument
Details
Types of Joins
Joining Tables
Table Limit
Specifying the Rows to Be Returned
Table Aliases
Joining a Table with Itself
Inner Joins
Outer Joins
Cross Joins
Union Joins
Natural Joins
Joining More Than Two Tables
Comparison of Joins and Subqueries

General tip:

If you want to fool around (fiddle) with SQL queries in a browser, try visiting SQL Fiddle web site.