Suppose I have a directed graph represented in a dataset named links, which has two variables: from_id and to_id. I want to use SAS Data Step to do two things: (1) count the number of nodes, and (2) count the number of edges.
Suppose the links dataset is as shown below.
from_id  to_id
-------  -----
1        2
2        3
3        1
3        2
In this example, there are 3 nodes and 4 edges. (We can assume there are no duplicate edges in links.) The nodes are 1, 2, and 3. The edges are 1->2, 2->3, 3->1, and 3->2.
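For reproducibility, a data step along these lines should create the example dataset. This is a minimal sketch that assumes the dataset lives in the WORK library and that both variables are numeric.

```sas
/* Build the example links dataset (assumed: WORK library, numeric IDs) */
data links;
    input from_id to_id;
    datalines;
1 2
2 3
3 1
3 2
;
run;
```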
Below is a SAS macro that uses a Data Step together with proc sql to count the nodes and edges. It works perfectly, but I would like to do the counting with the Data Step alone, without proc sql, because that may (potentially) be faster.
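/* display number of nodes and edges for graph */
%macro graph_info(links);
    /* stack from_id and to_id into a single node_id column */
    data nodes;
        set &links;
        node_id = from_id;
        output;
        node_id = to_id;
        output;
        keep node_id;
    run;

    /* number of nodes = number of distinct node IDs */
    proc sql noprint;
        select count(distinct node_id) into :numNodes from nodes;
    quit;

    /* remove the temporary nodes dataset */
    proc datasets lib=work nolist;
        delete nodes;
    quit;

    /* number of edges = number of rows in the links dataset */
    proc sql noprint;
        select count(*) into :numEdges from &links;
    quit;

    %put Nodes: &numNodes;
    %put Edges: &numEdges;
%mend;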
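For reference, a call like the one below should run the macro on the example dataset. It assumes the macro has already been compiled in the current session and that links exists in the WORK library. For the example data, the log should then report 3 nodes and 4 edges.

```sas
/* Run the macro on the example dataset (assumed to be WORK.links) */
%graph_info(links);
```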