12
votes

I have written a recursive function, however, it takes a lot of time. Hence I vectorized it, but it does not yield the same result as the recursive function. This is my non-vectorized code:

function visited = procedure_explore( u, adj_mat, visited )
visited(u) = 1;
neighbours = find(adj_mat(u,:));
for ii = 1:length(neighbours)
    if (visited(neighbours(ii)) == 0)
        visited = procedure_explore( neighbours(ii), adj_mat, visited );
    end
end
end

This is my vectorized code:

function visited = procedure_explore_vec( u, adj_mat, visited )
visited(u) = 1;
neighbours = find(adj_mat(u,:));
len_neighbours=length(neighbours);
visited_neighbours_zero=visited(neighbours(1:len_neighbours)) == 0;
if(~isempty(visited_neighbours_zero))
    visited = procedure_explore_vec( neighbours(visited_neighbours_zero), adj_mat, visited );
end
end

This is the test code

function main
    adj_mat=[0 0 0 0;
             1 0 1 1;
             1 0 0 0;
             1 0 0 1];
    u=2;
    visited=zeros(size(adj_mat,1));
    tic
    visited = procedure_explore( u, adj_mat, visited )
    toc
    visited=zeros(size(adj_mat,1));
    tic
    visited = procedure_explore_vec( u, adj_mat, visited )
    toc
end

This is the algorithm I'm trying to implement: enter image description here

If vectorization is impossible, a mex solution would also be good.

Update benchmark: This benchmark is based on MATLAB 2017a. It shows that the original code is faster than other methods

Speed up between original and logical methods is 0.39672
Speed up between original and nearest methods is 0.0042583

Full code

function main_recersive
    adj_mat=[0 0 0 0;
             1 0 1 1;
             1 0 0 0;
             1 0 0 1];
    u=2;
    visited=zeros(size(adj_mat,1));
    f_original=@()(procedure_explore( u, adj_mat, visited ));
    t_original=timeit(f_original);

    f_logical=@()(procedure_explore_logical( u, adj_mat ));
    t_logical=timeit(f_logical);

    f_nearest=@()(procedure_explore_nearest( u, adj_mat,visited ));
    t_nearest=timeit(f_nearest);

    disp(['Speed up between original and logical methods is ',num2str(t_original/t_logical)])
    disp(['Speed up between original and nearest methods is ',num2str(t_original/t_nearest)])    

end

function visited = procedure_explore( u, adj_mat, visited )
    visited(u) = 1;
    neighbours = find(adj_mat(u,:));
    for ii = 1:length(neighbours)
        if (visited(neighbours(ii)) == 0)
            visited = procedure_explore( neighbours(ii), adj_mat, visited );
        end
    end
end

function visited = procedure_explore_nearest( u, adj_mat, visited )
    % add u since your function also includes it.
    nodeIDs = [nearest(digraph(adj_mat),u,inf) ; u];
    % transform to output format of your function
    visited = zeros(size(adj_mat,1));
    visited(nodeIDs) = 1;

end 

function visited = procedure_explore_logical( u, adj_mat )
   visited = false(1, size(adj_mat, 1));
   visited(u) = true;
   new_visited = visited;
   while any(new_visited)
      visited = any([visited; new_visited], 1);
      new_visited = any(adj_mat(new_visited, :), 1);
      new_visited = and(new_visited, ~visited);
   end
end
6
Well , your function is not actually able to operate on multiple inputs in the second version , so passing in a vector does not magically vectorize it.Mad Physicist
But excellent question in terms of lucid description and complete minimal example. I'll try to think of something on my way to a computer.Mad Physicist
Thanks, Mad Physicist. In case of vectorizing is impossible, I also will accept the mex version code.Jame
No problem. I will leave the mexing up to you if I can't think of anything. After all, vectorization is basically the same as mexing. You're just delegating the loop to a faster implementation of the same thing.Mad Physicist
If your matrices are going to be 4x4, your benchmarks are fine. If you're going to be using larger graphs, then you should use larger test matrices.beaker

6 Answers

4
votes

Here's a fun little function that does a non-recursive breadth-first search on the graph.

function visited = procedure_explore_logical( u, adj_mat )
   visited = false(1, size(adj_mat, 1));
   visited(u) = true;
   new_visited = visited;

   while any(new_visited)
      visited = any([visited; new_visited], 1);
      new_visited = any(adj_mat(new_visited, :), 1);
      new_visited = and(new_visited, ~visited);
   end
end

In Octave, this runs about 50 times faster than your recursive version on a 100x100 adjacency matrix. You'll have to benchmark it on MATLAB to see what you get.

2
votes

You can think of your adjacency matrix as a list of paths of length exactly one. You can generate paths of other lengths n by taking it to the n-th power up to the rank of your matrix. (adj_mat^0 is the identity matrix)

In a graph with n nodes, the longest path cannot be longer than n-1, therefore you can sum all the powers together for a reachability analysis:

adj_mat + adj_mat^2 + adj_mat^3
ans =
   0   0   0   0
   4   0   1   3
   1   0   0   0
   3   0   0   3

This is the number of (different) ways that you can use to go from one node to another. For simple reachablitity, check whether this value is greater than zero:

visited(v) = ans(v, :) > 0;

Depending on your definition, you may have to change columns and rows in the result (i.e. take ans(:, v)).

For performance, you can use the lower powers to make higher ones. For example something like A + A^2 + A^3 + A^4 + A^5 would be efficiently calculated:

A2 = A^2;
A3 = A2*A
A4 = A2^2;
A5 = A4*A;
allWalks= A + A2 + A3 + A4 + A5;

Note: If you want to include the original node as being reachable from itself, you should include the identity matrix in the sum.

This minimizes the number of matrix multiplications, also MATLAB will likely execute a matrix square faster than a regular multiplication.

In my experience, matrix multiplication is relatively fast in MATLAB and this will yield the result (reachability) vector for all the nodes in the graph at once. If you are only interested in a small subset of a large graph, this is probably not the best solution.

See also this answer: https://stackoverflow.com/a/7276595/1974021

1
votes

I don't think you can properly vectorise your function: Your original function never reaches the same node multiple times. By vectorising it you would pass all directly connected nodes at the same time to the next function. Therefore it then would be possible that in the following instances the same node gets reached multiple times. E.g. in your example node 1 would be reached 3 times. So while you would no longer have a loop, the function might, depending on your network, be called more times recursively which would increase the computational time.

That being said, it is not generally impossible to find all reachable nodes without loops or recursive calls. For example you could check all (valid or invalid) paths. But that would work very different from your function and, depending on the number of nodes, might result in a performance loss due to the staggering amount of paths to be checked. Your current function isn't too bad and will scale well with large networks.

A bit offtopic, but since Matlab 2016a you can use nearest() to find all reachable nodes (without the starting node). It invokes a breadth-first algorithm in contrast to your depth-first algorithm:

% add u since your function also includes it.
nodeIDs = [nearest(digraph(adj_mat),u,inf) ; u]; 

% transform to output format of your function
visited = zeros(size(adj_mat,1));
visited(nodeIDs) = 1;

If this is for a students project, you could argue that while your function works you used the built-in function for performance reasons.

1
votes

The problem with the recursive function is related to visited(u) = 1;. That is because MATLAB uses copy-on-write technique to pass/assign variables. If you don't change visited in the body of function no copy of it is made but when it is modified a copy of it is created and modification is applied on a copy of it. To prevent that you can use a handle object to be passed by reference to the function.

Define a handle class (save it to visited_class.m):

classdef visited_class < handle
    properties
        visited
    end
    methods
        function obj = visited_class(adj_mat)
            obj.visited = zeros(1, size(adj_mat,1));
        end
    end
end

The recursive function:

function procedure_explore_handle( u, adj_mat,visited_handle )
    visited_handle.visited(u) = 1;
    neighbours = find(adj_mat(u,:));
    for n = neighbours
        if (visited_handle.visited(n) == 0)
            procedure_explore_handle( n, adj_mat , visited_handle );
        end
    end
end

Initialize variables:

adj_mat=[0 0 0 0;
         1 0 1 1;
         1 0 0 0;
         1 0 0 1];
visited_handle = visited_class(adj_mat);
u = 2;

Call it as :

procedure_explore_handle( u, adj_mat,visited_handle );

The result is saved into visited_handle:

disp(visited_handle.visited)
0
votes

If you want to go from one point in the graph to another, the most efficient way to find it in terms of resources is Dijkstra's algorithm. The Floyd-Warshall algorithm computes all the distances between all points and can be parallellised (by starting from multiple points).

Why the need to vectorise (or use mex programming)? If you just want to make the most use of Matlab's fast matrix multiplication routines then using products of A should get you there quickly:

adj_mat2=adj_mat^2;               % allowed to use 2 steps
while (adj_mat2 ~= adj_mat)       % check if new points were reached
      adj_mat=adj_mat2;           % current set of reachable points
      adj_mat2=(adj_mat^2)>0;     % allowed all steps again: power method
end
0
votes

This answer just gives an explicit, vectorized implementation of the suggestion from DasKrümelmonster's answer, which I think is faster than the code in the question (at least if the dimensions of the matrix are not too big). It uses the polyvalm function to evaluate the sum of powers of the adjacency matrix.

function visited = procedure_explore_vec(u, adj_mat)
    connectivity_matrix = polyvalm(ones(size(adj_mat,1),1),adj_mat)>0;

    visited = connectivity_matrix(u,:);
end