SVC appears to treat kernels that can take sparse matrices differently from those that can't. However, if a user-provided kernel is written to accept sparse matrices and a sparse matrix is passed to fit, SVC still appears to convert the sparse matrix to dense and treat the kernel as dense, because the kernel is not one of the sparse kernels predefined in scikit-learn.
Is there a way to force SVC to recognize the kernel as sparse and not convert the sparse matrix to dense before passing it to the kernel?
Edit 1: minimal working example
For example, if SVC is created with the string "linear" as the kernel, the built-in linear kernel is used: sparse matrices are passed directly to the kernel, and if a sparse matrix is provided when fitting, the support vectors are stored as sparse matrices. If instead the linear_kernel function itself is passed to SVC, the sparse matrices are converted to ndarray before being passed to the kernel, and the support vectors are stored as ndarray.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC
def make_random_sparsemat(m, n=1024, p=.94):
    """Make mxn sparse matrix with 1-p probability of 1."""
    return csr_matrix(np.random.uniform(size=(m, n)) > p, dtype=np.float64)
X = make_random_sparsemat(100)
Y = np.asarray(np.random.uniform(size=(100)) > .5, dtype=np.float64)
model1 = SVC(kernel="linear")
model1.fit(X, Y)
print("Built-in kernel:")
print("Kernel treated as sparse: {}".format(model1._sparse))
print("Type of dual coefficients: {}".format(type(model1.dual_coef_)))
print("Type of support vectors: {}".format(type(model1.support_vectors_)))
model2 = SVC(kernel=linear_kernel)
model2.fit(X, Y)
print("User-provided kernel:")
print("Kernel treated as sparse: {}".format(model2._sparse))
print("Type of dual coefficients: {}".format(type(model2.dual_coef_)))
print("Type of support vectors: {}".format(type(model2.support_vectors_)))
Output:
Built-in kernel:
Kernel treated as sparse: True
Type of dual coefficients: <class 'scipy.sparse.csr.csr_matrix'>
Type of support vectors: <class 'scipy.sparse.csr.csr_matrix'>
User-provided kernel:
Kernel treated as sparse: False
Type of dual coefficients: <type 'numpy.ndarray'>
Type of support vectors: <type 'numpy.ndarray'>
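One way to check what a user-provided kernel actually receives is to wrap it in a function that records the types of its arguments. This is a minimal sketch using the same kind of data as the example above; recording_kernel and seen are hypothetical names introduced here for illustration, and what gets printed may depend on the installed scikit-learn version:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC

# (type name, is sparse) for each argument of every kernel call
seen = []


def recording_kernel(A, B):
    """Hypothetical wrapper: record argument types, then defer to linear_kernel."""
    seen.append((type(A).__name__, issparse(A)))
    seen.append((type(B).__name__, issparse(B)))
    return linear_kernel(A, B)


# Seeded so the run is reproducible; roughly 6% of the entries are nonzero.
rng = np.random.RandomState(0)
X = csr_matrix(rng.uniform(size=(100, 1024)) > .94, dtype=np.float64)
Y = np.asarray(rng.uniform(size=100) > .5, dtype=np.float64)

model = SVC(kernel=recording_kernel)
model.fit(X, Y)

for name, sparse in seen:
    print("Kernel argument: {} (sparse: {})".format(name, sparse))
```

If the recorded arguments turn out to be sparse, the densification would be happening somewhere other than the kernel call itself, which would narrow down where SVC switches to its dense code path.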