4 votes

Background

I'm working on a project involving solving large underdetermined systems of equations.

My current algorithm computes the SVD (numpy.linalg.svd) of a matrix representing the given system, then uses the result to calculate the Moore-Penrose pseudoinverse and the right nullspace of the matrix. I use the nullspace to find all variables with unique solutions, and the pseudoinverse to find their values.
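Roughly, that computation looks like the sketch below (the random A is just a stand-in for the real system matrix, and the rank tolerance is the usual numerical-rank convention):

import numpy as np

# Stand-in for the system matrix: more columns than rows (underdetermined A x = b)
A = np.random.randn(4, 6)

U, s, Vt = np.linalg.svd(A)

# Numerical rank: singular values above a tolerance count as non-zero
tol = max(A.shape) * np.finfo(A.dtype).eps * s.max()
rank = int(np.count_nonzero(s > tol))

# Moore-Penrose pseudoinverse: invert only the significant singular values
pinv = Vt[:rank].T @ np.diag(1.0 / s[:rank]) @ U[:, :rank].T   # matches np.linalg.pinv(A)

# Right nullspace: rows of Vt beyond the numerical rank
nullspace = Vt[rank:].T   # columns span {x : A x = 0}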

However, the MPP (Moore-Penrose pseudoinverse) is quite dense and a bit too large for my server to handle.

Problem

I found the following paper, which describes a sparser pseudoinverse that maintains most of the essential properties of the MPP. This is obviously of much interest to me, but I simply don't have the math background to understand how it's calculated. Is it possible to calculate it with SVD? If not, what's the best way to go about it?

Details

These are the lines of the paper which I think are probably relevant, but which I'm not well-versed enough to understand:

  • spinv(A) = argmin ||B|| subject to BA = I_n, where ||B|| denotes the entrywise ℓ1 norm of B

  • This is in general a non-tractable problem, so we use the standard linear relaxation with the ℓ1 norm

  • sspinv(A) = η_τ{spinv(A)}, with η_τ(u) = u · 1_{|u| ≥ τ} (entrywise hard thresholding at level τ)

Edit

Find my code and more details on the actual implementation here

Comments

  • This question is too broad. Do you have a specific question with accompanying code? Perhaps this is better for math.stackexchange? – erip
  • I can tag a question which has all my code in it – Don Thousand
  • This question got closed on math.stackexchange for being too CS related, so I got sent here – Don Thousand

1 Answer

2 votes

As I understand it, here's what the paper says about the sparse pseudoinverse. It says:

We aim at minimizing the number of non-zeros in spinv(A)

This means taking the ℓ0 "norm" (see David Donoho's definition here: the number of non-zero entries), which makes the problem intractable.

spinv(A) = argmin ||B||_0 subject to B.A = I

So they turn to a convex relaxation of this problem so that it can be solved by linear programming.

This is in general a non-tractable problem, so we use the standard linear relaxation with the ℓ1 norm.

The relaxed problem is then

spinv(A) = argmin ||B||_1 subject to B.A = I (6)

This is sometimes called basis pursuit, and it tends to produce sparse solutions (see Convex Optimization by Boyd and Vandenberghe, section 6.2, Least-norm problems).

So, solve this relaxed problem.

The linear program (6) is separable and can be solved by computing one row of B at a time

So, you can solve a series of problems of the form below to obtain the solution.

spinv(A)_i = argmin ||B_i||_1 subject to B_i.A = I_i

where _i denotes the ith row of the matrix.

See here for how to convert this absolute-value problem to a linear program.

In the code below, I slightly alter the problem to spinv(A)_i = argmin ||B_i||_1 subject to A.B_i = I_i, where _i is the ith column of the matrix, so the overall problem becomes spinv(A) = argmin ||B||_1 subject to A.B = I. Honestly, I don't know if there's any difference between the two beyond transposition (A.B = I is equivalent to B'.A' = I, and the entrywise ℓ1 norm is unchanged by transposing). Here I'm using scipy's linprog simplex method; I don't know the internals of simplex well enough to say whether it uses SVD.

import numpy as np
from scipy import optimize

# argmin ||B_i||_1 subject to A.B_i = I_i, where _i is the ith column
# let B_i = u_i - v_i where u_i >= 0 and v_i >= 0
# then ||B_i||_1 = [1' 1'][u_i;v_i] which is the objective function
# and A.B_i = I_i becomes
# A.[u_i - v_i] = I_i
# [A -A][u_i;v_i] = I_i which is the equality constraint
# and [u_i;v_i] >= 0 the bounds
# here A is n x m (as opposed to m x n in paper)

A = np.random.randn(4, 6)
n, m = A.shape
I = np.eye(n)

Aeq = np.hstack((A, -A))
# objective
c = np.ones((2*m))
# spinv
B = np.zeros((m, n))

for i in range(n):
    beq = I[:, i]
    result = optimize.linprog(c, A_eq=Aeq, b_eq=beq)
    x = result.x[0:m]-result.x[m:2*m]
    B[:, i] = x

print('spinv(A) = \n' + str(B))
print('pinv(A) = \n' + str(np.linalg.pinv(A)))
print('A.B = \n' + str(np.dot(A, B)))

Here's a sample output; spinv(A) is visibly sparser than pinv(A).

spinv(A) = 
[[ 0.         -0.33361925  0.          0.        ]
 [ 0.04987467  0.          0.12741509  0.02897778]
 [ 0.          0.         -0.52306324  0.        ]
 [ 0.43848257  0.12114828  0.15678815 -0.19302049]
 [-0.16814546  0.02911103 -0.41089271  0.50785258]
 [-0.05696924  0.13391736  0.         -0.43858428]]
pinv(A) = 
[[ 0.05626402 -0.1478497   0.19953692 -0.19719524]
 [ 0.04007696 -0.07330993  0.19903311  0.14704798]
 [ 0.01177361 -0.05761487 -0.23074996  0.15597663]
 [ 0.44471989  0.13849828  0.18733242 -0.20824972]
 [-0.1273604   0.15615595 -0.24647117  0.38047901]
 [-0.04638221  0.09879972  0.21951122 -0.33244635]]
A.B = 
[[ 1.00000000e+00 -1.82225048e-17  6.73349443e-18 -2.39383542e-17]
 [-5.20584593e-18  1.00000000e+00 -3.70118759e-16 -1.62063433e-15]
 [-8.83342417e-18 -5.80049814e-16  1.00000000e+00  3.56175852e-15]
 [ 2.31629738e-17 -1.13459832e-15 -2.28503999e-16  1.00000000e+00]]
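As a quick sanity check (mine, not from the paper), you can count the non-zero entries of each matrix, continuing the script above:

# Count entries that are non-zero up to a small (arbitrary) tolerance
tol = 1e-8
print('non-zeros in spinv(A):', np.count_nonzero(np.abs(B) > tol))
print('non-zeros in pinv(A): ', np.count_nonzero(np.abs(np.linalg.pinv(A)) > tol))

For the sample run above, this reports 16 non-zeros for spinv(A) versus 24 (fully dense) for pinv(A).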

To further sparsify the matrix, we can apply entrywise hard thresholding, thus sacrificing the inverting property and computing an approximate sparse pseudoinverse

If you don't want to keep small entries in the sparse-pinv, you can remove them like this:

Bt = B.copy()
Bt[np.abs(Bt) < 0.1] = 0
print('sspinv_0.1(A) = \n' + str(Bt))
print('A.Bt = \n' + str(np.dot(A, Bt)))

to get

sspinv_0.1(A) = 
[[ 0.         -0.33361925  0.          0.        ]
 [ 0.          0.          0.12741509  0.        ]
 [ 0.          0.         -0.52306324  0.        ]
 [ 0.43848257  0.12114828  0.15678815 -0.19302049]
 [-0.16814546  0.         -0.41089271  0.50785258]
 [ 0.          0.13391736  0.         -0.43858428]]
A.Bt = 
[[ 9.22717491e-01  1.17555372e-02  6.73349443e-18 -1.10993934e-03]
 [ 1.24361576e-01  9.41538212e-01 -3.70118759e-16  1.15028494e-02]
 [-8.76662313e-02 -1.36349311e-02  1.00000000e+00 -7.48302663e-02]
 [-1.54387852e-01 -3.27969169e-02 -2.28503999e-16  9.39161039e-01]]

Hope I answered your question(s) and provided enough references if you want further details. Let me know if you have any questions. I'm no expert, so if you have any doubts about my claims you can always ask the experts on math.stackexchange (without any code, of course), and please let me know what they say.

This is an interesting question. It let me brush up my linear algebra and the little optimization I know, so thank you :)