How to parallelize for-cycle in the attached fuction by parfor-cycle to get additional speedup in matlab?
Thanks in advance for any tips?
function p = permryser_fast( A )
% Computes the permament of square matrix, A
% The matrix permanent is defined like the matrix determinant, but
% without the sign terms.
mlock % locks the current M-file in memory
% suggest using "munlock('perm_ryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size
[m n]=size(A);
if (m ~= n),
error('Must be a square matrix. perm_ryser_fast() %d %d\n',m,n)
end
if isempty(ps)
pn=-1;
end
if (n~=pn)&&(n>1),
fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
pn=n;
x=1:(2^n -1); % count (assumes n<=52)
y=bitxor(x,bitshift(x,-1)); % gray-coded count
pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
% computes which position comes in/out in gray-count
ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
% computes whether its in (+1) or out (-1)
end
if (n == 1),
p = A;
else
p = 0; % running permanent accumulator
rs = zeros(1,n); % running row sums vector
% ==============================================
% Loop over all 2^n subsets of {1,...,n}
for i=1:(2^n -1) % Just skipping the null subset
rs = rs + ps(i) * A(pz(i),:);
p = p + (-1)^i * prod(rs);
end
% ==============================================
p = p * (-1)^m;
end
return
Example:
n = 20;
A = ones(n);
permanent = permryser_fast(A)
In this case the permanent should be equal factorial(n).
Remarks:
vectorization is able to greatly improve computation speed, but memory requirements are so terrible, that realistic sizes of matrices are up to 25 x 25
probably the only way is parallelization via parfor-loop after proper modification of recursion for-loop.
My final goal is to be able compute permanents of non-negative matrices with size from 25x25 to 35x35 in reasonable amount of time and accuracy.
for-loop here is non-parallelisable, because it is recursive (you changersbased upon its value in the previous iteration).parfordoes not allow you to do that. - Adriaan