Let's consider the house prices dataset from this example.
I have the entire dataset stored in the housing variable:
>>> housing.shape
(20640, 10)
I have also one-hot encoded one of the columns with OneHotEncoder, which gives housing_cat_1hot, so:
>>> housing_cat_1hot.toarray().shape
(20640, 5)
My goal is to join the two and store everything in a single dataset.
I have tried the Join with index tutorial, but the problem is that the second matrix doesn't have an index.
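To see why the missing index matters, here is a minimal sketch with hypothetical toy data (not the real housing dataset). It assumes housing has a non-default index, which can happen after shuffling or a train/test split. A NumPy array has no row labels at all, and wrapping it in a plain DataFrame only gives it a default 0..n-1 index, which may not match housing's labels:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a DataFrame whose index is not 0..n-1 and a dense
# one-hot array with the same number of rows.
housing = pd.DataFrame({"median_income": [8.3, 7.3, 5.6]}, index=[10, 20, 30])
cat_1hot = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])

print(hasattr(cat_1hot, "index"))  # False: a NumPy array carries no row labels

# Wrapping the array in a DataFrame gives it a default RangeIndex (0, 1, 2) ...
cat_df = pd.DataFrame(cat_1hot, columns=["cat_0", "cat_1"])
print(cat_df.index.tolist())  # [0, 1, 2]

# ... which shares no labels with housing's index [10, 20, 30],
# so the (left) join keeps housing's rows and fills the new columns with NaN.
misaligned = housing.join(cat_df)
print(misaligned["cat_0"].isna().all())  # True
```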
How can I do a JOIN between housing and housing_cat_1hot?
>>> left=housing
>>> right=housing_cat_1hot.toarray()
>>> result = left.join(right)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    result = left.join(right)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pandas/core/frame.py", line 5293, in join
    rsuffix=rsuffix, sort=sort)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pandas/core/frame.py", line 5323, in _join_compat
    can_concat = all(df.index.is_unique for df in frames)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pandas/core/frame.py", line 5323, in <genexpr>
    can_concat = all(df.index.is_unique for df in frames)
AttributeError: 'numpy.ndarray' object has no attribute 'index'
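The traceback shows that join() reads .index from its argument, which a NumPy array does not have. A minimal sketch of one way around it, assuming the rows of the one-hot array are in the same order as the rows of housing (true if the encoder was fit on a column of housing itself). The toy data and the cat_0, cat_1 column names are hypothetical placeholders:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: `housing` is a small DataFrame and `cat_1hot` plays
# the role of housing_cat_1hot.toarray() (same number of rows, same order).
housing = pd.DataFrame(
    {"median_income": [8.3, 7.3, 5.6],
     "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"]},
    index=[10, 20, 30],
)
cat_1hot = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# DataFrame.join() expects a DataFrame or Series, so wrap the array first and
# reuse housing's index so that row i of the array lines up with row i of housing.
cat_df = pd.DataFrame(
    cat_1hot,
    index=housing.index,
    columns=[f"cat_{i}" for i in range(cat_1hot.shape[1])],  # placeholder names
)
result = housing.join(cat_df)

# Equivalent alternative: concatenate the two frames side by side.
result_concat = pd.concat([housing, cat_df], axis=1)
print(result.shape)  # (3, 4)
```

Reusing housing.index (rather than the default 0..n-1 index) is what keeps the rows aligned when housing has been shuffled or split.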
left.join(housing_cat_1hot) is all you need – Bharath
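Note that join() accepts a DataFrame or Series, so a plain join works directly when the one-hot columns are already a DataFrame sharing housing's index. One way to get that, swapping in pd.get_dummies instead of the OneHotEncoder used in the question, is sketched below with hypothetical toy data:

```python
import pandas as pd

# Hypothetical stand-in for `housing` with one categorical column.
housing = pd.DataFrame(
    {"median_income": [8.3, 7.3, 5.6],
     "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY"]},
    index=[10, 20, 30],  # non-default index, to show alignment is preserved
)

# pd.get_dummies returns a DataFrame that keeps housing's index,
# so join() can align rows without any manual re-indexing.
dummies = pd.get_dummies(housing["ocean_proximity"], prefix="ocean", dtype=float)
result = housing.join(dummies)
print(result.columns.tolist())
# ['median_income', 'ocean_proximity', 'ocean_INLAND', 'ocean_NEAR BAY']
```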