I'm adding a Python script as part of a Tableau calculated field, and it appears Tableau is passing one row of data at a time to the calculated field instead of the whole lists (for `_arg1` and `_arg2`). I have already set up TabPy and connected to it on localhost, and I can run "hello world!"-type scripts without errors. I'm trying to apply a simple DBSCAN tutorial I found online to my own dataset: I have a 2-D scatter plot in Tableau and I want to cluster the data points using the two axes of the plot. Here's the code for the calculated field I'm using now:
```
SCRIPT_STR(
"from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd
import string

# Map a numeric cluster label to a readable string
def int_to_string(val):
    if val == -2:
        return 'NaN'
    elif val == -1:
        return 'Outlier'
    else:
        return string.ascii_lowercase[val]

eps = 1
min_samples = 10

# One row per value passed in _arg1 / _arg2
ids = range(len(_arg1))
X = pd.DataFrame(np.column_stack([_arg1, _arg2]), index=ids, columns=['x', 'y'])
# DBSCAN cannot handle NaN, so drop incomplete rows before clustering
X = X.dropna(how='any')

X_scale = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=eps, min_samples=min_samples, n_jobs=-1).fit_predict(X_scale)

# Index by the full ids (not X.index) so dropped rows come back as -2 (NaN)
# and the returned list has one entry per input row
result = pd.Series(np.nan, index=ids)
result.loc[X.index] = labels
# fillna leaves floats; cast to int so string indexing works
result = result.fillna(-2).astype(int).apply(int_to_string)
return list(result)",
avg([Var1]), avg([Var2])
)
```
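One way to check the clustering logic independently of Tableau is to wrap the same steps in an ordinary Python function and call it with full lists. This is a sketch that assumes scikit-learn, NumPy and pandas are installed in the environment TabPy uses; `cluster_labels` and the sample data are hypothetical, made up for illustration. If it returns one label per input point here, the script body itself is fine and the remaining question is how many rows Tableau sends per call.

```python
import string

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def int_to_string(val):
    """Map a numeric cluster label to a readable string."""
    if val == -2:
        return 'NaN'
    elif val == -1:
        return 'Outlier'
    return string.ascii_lowercase[val]


def cluster_labels(arg1, arg2, eps=1, min_samples=10):
    """Hypothetical stand-in for the TabPy script body (arg1/arg2 play _arg1/_arg2)."""
    ids = range(len(arg1))
    X = pd.DataFrame(np.column_stack([arg1, arg2]), index=ids, columns=['x', 'y'])
    X = X.dropna(how='any')  # DBSCAN cannot handle NaN
    X_scale = StandardScaler().fit_transform(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scale)
    # Full-length result so rows dropped above come back as 'NaN'
    result = pd.Series(np.nan, index=ids)
    result.loc[X.index] = labels
    return list(result.fillna(-2).astype(int).apply(int_to_string))


# Hypothetical data: two tight blobs of 20 points each, plus one row with a missing x
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2))
points = np.vstack([blob_a, blob_b])
x = list(points[:, 0]) + [np.nan]
y = list(points[:, 1]) + [1.0]

labels = cluster_labels(x, y)
print(len(labels), labels[0], labels[20], labels[-1])
```

With full lists the two blobs should come back as clusters 'a' and 'b' and the incomplete row as 'NaN'; calling the same function with single-element lists mimics what the one-row pickle suggests is happening inside Tableau.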
It's more complicated than the tutorial because my data set has NaN values and I'm trying to handle those with the pandas code.
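The NaN handling has to return exactly one value per input row, because Tableau expects a SCRIPT_ function to return as many results as it passed in. A minimal pandas-only sketch of why the index matters (the values and `fake_labels` are made up, standing in for DBSCAN output): a Series built on `X.index` after `dropna` is shorter than the input, so `fillna` has nothing to fill, whereas one built on the original `ids` keeps a slot for every row.

```python
import numpy as np
import pandas as pd

# Made-up inputs: the second row has a missing x value
arg1 = [0.5, np.nan, 2.0, 3.5]
arg2 = [1.0, 4.0, 2.5, 0.0]

ids = range(len(arg1))
X = pd.DataFrame({'x': arg1, 'y': arg2}, index=ids).dropna(how='any')
fake_labels = [0, 0, -1]  # placeholder for DBSCAN output, one per surviving row

# Built on X.index: only the 3 surviving rows, so the NaN row is silently lost
short = pd.Series(np.nan, index=X.index)
short.loc[X.index] = fake_labels
short = short.fillna(-2)

# Built on the full ids: 4 slots, and the dropped row is filled with -2
full = pd.Series(np.nan, index=ids)
full.loc[X.index] = fake_labels
full = full.fillna(-2).astype(int)

print(len(short), len(full), full.tolist())
```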
The real problem is that the `X` DataFrame seems to be only 1 row in size. I know that's not true for the actual data: in Tableau, there are thousands of data points showing on the scatter plot. I know that it only has 1 row of data because I get the following error from Tableau (I think this error occurs when the one row of data happens to have a null value in it)...

...and because I temporarily added a pickle statement to the script to export the `X` DataFrame to a file, and when I open that pickled object in Python, it shows the DataFrame has a shape of (1, 2): 1 row and 2 columns.
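A (1, 2) DataFrame is consistent with the guess that the error appears when the single row contains a null. A sketch, assuming each script call receives just one point (`one_row_call` is a hypothetical helper and the values are made up): if that point has a null, `dropna` leaves an empty DataFrame and `StandardScaler` raises a `ValueError`; if the point is complete, DBSCAN can never find `min_samples=10` neighbours, so every point would come back as noise (`-1`, i.e. 'Outlier').

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def one_row_call(x, y):
    """Mimic a call where _arg1 and _arg2 each hold a single value."""
    X = pd.DataFrame({'x': [x], 'y': [y]}).dropna(how='any')
    X_scale = StandardScaler().fit_transform(X)
    return DBSCAN(eps=1, min_samples=10).fit_predict(X_scale)


# A complete point cannot have 10 neighbours on its own, so it is labelled noise
complete_labels = one_row_call(1.0, 2.0)
print(complete_labels)

# A point with a null: dropna leaves 0 rows and StandardScaler raises ValueError
null_error = None
try:
    one_row_call(np.nan, 2.0)
except ValueError as exc:
    null_error = exc
print(type(null_error).__name__)
```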
`Var1` and `Var2` aren't aggregated fields or anything, so taking the average should not reduce them to a single value.
Has anyone run into this before? What is wrong with the Tableau Script code that might be causing this issue? Or am I doing something else wrong?