
I am trying to run code that uses UMAP for dimensionality reduction, based on the example here: https://umap-learn.readthedocs.io/en/latest/basic_usage.html

I am running it in Spyder (Python 3.7) and get this error: `TypeError: a bytes-like object is required, not 'list'`

This is my code:


```python
import numpy as np
from sklearn.datasets import load_iris, load_digits
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

import umap.umap_ as umap

# import umap

from sklearn.preprocessing import StandardScaler
from matplotlib import patches


#data = pd.read_csv('/Users/chrisweber/Documents/Walsh Lab/UMAP.csv')

data = pd.read_csv('/Users/Elizabeth/Desktop/UMAPtest.csv')


# Uncomment one section or the other


################################### SECTION 1 - UMAP ###################################################################
reducer = umap.UMAP()
endpoints = data[ ['RR', 'na1', 'fa1', 'nt1', 'ft1', 'nt2', 'ft2'] ].values
scaled_endpoints = StandardScaler().fit_transform(endpoints)
embedding = reducer.fit_transform(scaled_endpoints)


plt.scatter(
    embedding[:, 0],
    embedding[:, 1],
    c=[sns.color_palette()[x] for x in data.Identifier.map({0:0, 1:1})])

plt.gca().set_aspect('equal', 'datalim')

plt.title('UMAP Projection by Identifier', fontsize=24)

blue = patches.Patch(color='steelblue', label='0')
orange = patches.Patch(color='orange', label='1')
plt.legend(handles=[blue, orange])
########################################################################################################################


################################### SECTION 2 - Pairplot ###############################################################
# sns.pairplot(data, hue='Identifier')
# plt.subplots_adjust(.05, .05, .95, .95)
# plt.suptitle('Endpoint Analysis by Identifier')
########################################################################################################################


plt.show()
```

This is the message I get when it runs:

```
Traceback (most recent call last):
  File "C:\Users\Elizabeth\Desktop\UMAPs.py", line 31, in <module>
    embedding = reducer.fit_transform(scaled_endpoints)
  File "C:\ProgramData\Anaconda3\lib\site-packages\umap\umap_.py", line 2012, in fit_transform
    self.fit(X, y)
  File "C:\ProgramData\Anaconda3\lib\site-packages\umap\umap_.py", line 1833, in fit
    self._search_graph.transpose()
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\lil.py", line 437, in transpose
    return self.tocsr(copy=copy).transpose(axes=axes, copy=False).tolil(copy=False)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\lil.py", line 462, in tocsr
    _csparsetools.lil_get_lengths(self.rows, indptr[1:])
  File "_csparsetools.pyx", line 109, in scipy.sparse._csparsetools.lil_get_lengths
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
TypeError: a bytes-like object is required, not 'list'
```

I have no idea how to fix this error or what to look for; searching for the error message itself turns up nothing useful.
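For context on what the message means: it is the text Python's buffer protocol produces when compiled code asks for a raw memory view of an object that doesn't provide one, such as a plain `list`. The last frames of the traceback show this happening inside a Cython routine in `scipy.sparse` (`lil_get_lengths`) while UMAP transposes its internal `_search_graph`, which suggests the offending list is created inside the library rather than being the user's input. The same message can be reproduced with the built-in `memoryview()`; this is a minimal, UMAP-independent illustration, and the helper name `buffer_error_message` is hypothetical.

```python
def buffer_error_message(obj):
    """Return the TypeError text raised when obj lacks the buffer protocol, else None.

    Hypothetical helper for illustration: memoryview() needs a bytes-like
    object (bytes, bytearray, NumPy array, ...), exactly like the Cython
    memoryview in the traceback does.
    """
    try:
        memoryview(obj)
    except TypeError as exc:
        return str(exc)
    return None


# A plain list has no buffer, so this prints a "bytes-like object is required" message.
print(buffer_error_message([1, 2, 3]))
# bytes does support the buffer protocol, so this prints None.
print(buffer_error_message(b"\x01\x02\x03"))
```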


Comment (AMC): What do you understand from that error message? Have you done any debugging?

2 Answers


I ran into the same problem; oddly, UMAP stopped working overnight on data (the same dataframe) it had handled fine before. It seems to be a recurring issue with umap-learn [1,2], and I fixed it by installing pynndescent:

pip install pynndescent

or

conda install -c conda-forge pynndescent
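Because this looks like a package-version problem rather than a data problem, it may help to confirm which versions are actually installed in the environment Spyder runs, before and after installing pynndescent. Restarting Spyder's console afterwards is also worthwhile, since an already-imported `umap` module won't pick up the new package. This is a minimal sketch assuming Python 3.8+ for `importlib.metadata` (on Python 3.7 the `importlib_metadata` backport offers the same functions); the helper name `installed_versions` is hypothetical, and the strings are PyPI distribution names, not import names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("umap-learn", "pynndescent", "scipy", "numpy")):
    """Map each distribution name to its installed version string, or None if absent."""
    result = {}
    for dist in dists:
        try:
            result[dist] = version(dist)  # reads package metadata; does not import it
        except PackageNotFoundError:
            result[dist] = None
    return result


if __name__ == "__main__":
    for dist, ver in installed_versions().items():
        print(f"{dist}: {ver if ver else 'NOT INSTALLED'}")
```

Reading metadata instead of importing the packages keeps the check cheap and avoids triggering the same import-time failures being debugged.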

I hope it helps :)

[1] https://github.com/lmcinnes/umap/issues/401
[2] https://github.com/lmcinnes/umap/issues/452


I'm not familiar with the library you're using, so this isn't a proper answer to this specific question, just a general tip on how to approach errors like this.

The error occurs on this line:

embedding = reducer.fit_transform(scaled_endpoints)

and complains that the wrong type of object was provided (a list rather than a bytes-like object). So, just before that line, add:

print(scaled_endpoints)
print(type(scaled_endpoints))

to see exactly what you are passing to reducer.fit_transform(). Then check the documentation for that function: what kind of input does it expect, and how should it be created? How can you make scaled_endpoints match that expectation?
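A slightly more thorough version of the same check summarizes type, dtype, shape and missing values in one go. This is a sketch; the helper name `describe_input` is hypothetical and the random array only stands in for `scaled_endpoints`. In this particular case `StandardScaler().fit_transform` returns a NumPy array, so the check would likely report `ndarray`. That outcome would itself be informative: it points to the list being created inside the library (the traceback ends in `scipy.sparse`) rather than in the input.

```python
import numpy as np


def describe_input(obj):
    """Summarize an object before handing it to an estimator's fit_transform()."""
    arr = np.asarray(obj)
    summary = {
        "type": type(obj).__name__,  # type of the object exactly as passed in
        "dtype": str(arr.dtype),
        "shape": arr.shape,
    }
    if np.issubdtype(arr.dtype, np.number):
        summary["n_nan"] = int(np.isnan(arr).sum())  # NaNs often break estimators too
    return summary


if __name__ == "__main__":
    # Stand-in for scaled_endpoints: 5 samples x 7 features, as in the question.
    demo = np.random.default_rng(0).normal(size=(5, 7))
    print(describe_input(demo))
```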