As I understand it, here's what the paper says about the sparse pseudoinverse:
It says
We aim at minimizing the number of non-zeros in spinv(A)
This means you should minimize the L0 norm (see David Donoho's definition here: the number of non-zero entries), which makes the problem intractable.
spinv(A) = argmin ||B||_0 subject to B.A = I
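(Just to make that concrete, the L0 "norm" is simply a count of non-zeros; the tiny snippet below is my own illustration, not something from the paper.)
import numpy as np
# the L0 "norm" of a vector just counts its non-zero entries
x = np.array([0.5, 0.0, 0.0, -2.0])
print(np.count_nonzero(x))  # prints 2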
So they turn to a convex relaxation of this problem so that it can be solved by linear programming.
This is in general a non-tractable problem, so we use the standard linear relaxation with the ℓ1 norm.
The relaxed problem is then
spinv(A) = argmin ||B||_1 subject to B.A = I (6)
This is sometimes called Basis pursuit and tends to produce sparse solutions (see Convex Optimization by Boyd and Vandenberghe, section 6.2 Least-norm problems).
So, solve this relaxed problem.
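(As an aside: if you happen to have a generic convex solver around, you can also throw (6) at it directly and compare. The sketch below is my own addition, it assumes the cvxpy package is installed, and it uses a toy 6x4 A in the paper's tall-matrix convention.)
import numpy as np
import cvxpy as cp
A = np.random.randn(6, 4)                    # tall A, as in the paper (m x n with m >= n)
B = cp.Variable((4, 6))                      # candidate left inverse
objective = cp.Minimize(cp.sum(cp.abs(B)))   # entrywise l1 norm, i.e. ||B||_1 in (6)
constraints = [B @ A == np.eye(4)]           # B.A = I
cp.Problem(objective, constraints).solve()
print(np.round(B.value, 4))                  # many entries should come out (near) zero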
The linear program (6) is separable and can be solved by computing one
row of B at a time
So, you can solve a series of problems of the form below to obtain the solution.
spinv(A)_i = argmin ||B_i||_1 subject to B_i.A = I_i
where _i denotes the ith row of the matrix.
See here for how to convert this absolute-value problem to a linear program.
In the code below, I slightly alter the problem to
spinv(A)_i = argmin ||B_i||_1 subject to A.B_i = I_i
where _i is the ith column of the matrix, so the overall problem becomes
spinv(A) = argmin ||B||_1 subject to A.B = I
Honestly, I don't know if there's a difference between the two (as far as I can tell they are just transposes of each other). Here I'm using scipy's linprog with the simplex method (at least in older SciPy versions, where that was the default); I don't know the internals of simplex well enough to say whether it uses the SVD.
import numpy as np
from scipy import optimize

# argmin ||B_i||_1 subject to A.B_i = I_i, where _i is the ith column
# let B_i = u_i - v_i where u_i >= 0 and v_i >= 0
# then ||B_i||_1 = [1' 1'][u_i;v_i] which is the objective function
# and A.B_i = I_i becomes
# A.[u_i - v_i] = I_i
# [A -A][u_i;v_i] = I_i which is the equality constraint
# and [u_i;v_i] >= 0 the bounds (linprog's default)
# here A is n x m (as opposed to m x n in the paper)
A = np.random.randn(4, 6)
n, m = A.shape
I = np.eye(n)
# equality-constraint matrix [A -A]
Aeq = np.hstack((A, -A))
# objective: sum of the entries of u_i and v_i
c = np.ones(2*m)
# spinv, built one column at a time
B = np.zeros((m, n))
for i in range(n):
    beq = I[:, i]
    # older SciPy used the simplex method by default; newer versions default to HiGHS
    result = optimize.linprog(c, A_eq=Aeq, b_eq=beq)
    x = result.x[0:m] - result.x[m:2*m]
    B[:, i] = x
print('spinv(A) = \n' + str(B))
print('pinv(A) = \n' + str(np.linalg.pinv(A)))
print('A.B = \n' + str(np.dot(A, B)))
Here's an output. spinv(A) is sparser than pinv(A).
spinv(A) =
[[ 0. -0.33361925 0. 0. ]
[ 0.04987467 0. 0.12741509 0.02897778]
[ 0. 0. -0.52306324 0. ]
[ 0.43848257 0.12114828 0.15678815 -0.19302049]
[-0.16814546 0.02911103 -0.41089271 0.50785258]
[-0.05696924 0.13391736 0. -0.43858428]]
pinv(A) =
[[ 0.05626402 -0.1478497 0.19953692 -0.19719524]
[ 0.04007696 -0.07330993 0.19903311 0.14704798]
[ 0.01177361 -0.05761487 -0.23074996 0.15597663]
[ 0.44471989 0.13849828 0.18733242 -0.20824972]
[-0.1273604 0.15615595 -0.24647117 0.38047901]
[-0.04638221 0.09879972 0.21951122 -0.33244635]]
A.B =
[[ 1.00000000e+00 -1.82225048e-17 6.73349443e-18 -2.39383542e-17]
[-5.20584593e-18 1.00000000e+00 -3.70118759e-16 -1.62063433e-15]
[-8.83342417e-18 -5.80049814e-16 1.00000000e+00 3.56175852e-15]
[ 2.31629738e-17 -1.13459832e-15 -2.28503999e-16 1.00000000e+00]]
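(If you prefer numbers to eyeballing, you can count the entries above a small tolerance. This check is my addition, and the 1e-9 tolerance is arbitrary.)
tol = 1e-9
print('nnz(spinv(A)) =', np.count_nonzero(np.abs(B) > tol))
print('nnz(pinv(A))  =', np.count_nonzero(np.abs(np.linalg.pinv(A)) > tol))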
To further sparsify the matrix, we can apply entrywise hard
thresholding, thus sacrificing the inverting property and computing an
approximate sparse pseudoinverse
If you don't want to keep small entries in the sparse-pinv, you can remove them like this:
# hard-threshold: zero out entries with magnitude below 0.1
Bt = B.copy()
Bt[np.abs(Bt) < 0.1] = 0
print('sspinv_0.1(A) = \n' + str(Bt))
print('A.Bt = \n' + str(np.dot(A, Bt)))
to get
sspinv_0.1(A) =
[[ 0. -0.33361925 0. 0. ]
[ 0. 0. 0.12741509 0. ]
[ 0. 0. -0.52306324 0. ]
[ 0.43848257 0.12114828 0.15678815 -0.19302049]
[-0.16814546 0. -0.41089271 0.50785258]
[ 0. 0.13391736 0. -0.43858428]]
A.Bt =
[[ 9.22717491e-01 1.17555372e-02 6.73349443e-18 -1.10993934e-03]
[ 1.24361576e-01 9.41538212e-01 -3.70118759e-16 1.15028494e-02]
[-8.76662313e-02 -1.36349311e-02 1.00000000e+00 -7.48302663e-02]
[-1.54387852e-01 -3.27969169e-02 -2.28503999e-16 9.39161039e-01]]
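(To see how much of the inverting property the thresholding sacrifices, you can look at the residual. Again, this check is my addition, not from the paper.)
# Frobenius norm of A.Bt - I; zero would mean Bt is still an exact right inverse
print('||A.Bt - I||_F =', np.linalg.norm(np.dot(A, Bt) - np.eye(n)))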
Hope I answered your question(s) and provided enough references in case you want further details. Let me know if you have any questions. I'm no expert, so if you have doubts about any of my claims you can always ask the experts on Math Stack Exchange (without any code, of course), and please let me know what you find out.
This is an interesting question. It let me brush up on my linear algebra and the little optimization I know, so thank you :)