I am trying to explore this dataset with pandas 0.20.3 in Python 3.6.2.
%pylab inline
import pandas as pd
df = pd.read_csv('OnlineNewsPopularity.csv')
df['n_tokens_content'][:9]
last line produces error
KeyError Traceback (most recent call last) ~/anaconda3/envs/tf11/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 2441 try: -> 2442 return self._engine.get_loc(key) 2443 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()
KeyError: 'n_tokens_content'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last) in () ----> 1 df['n_tokens_content'][:9]
~/anaconda3/envs/tf11/lib/python3.6/site-packages/pandas/core/frame.py in getitem(self, key) 1962 return self._getitem_multilevel(key) 1963 else: -> 1964 return self._getitem_column(key) 1965 1966 def _getitem_column(self, key):
~/anaconda3/envs/tf11/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key) 1969 # get column 1970
if self.columns.is_unique: -> 1971 return self._get_item_cache(key) 1972 1973 # duplicate columns & possible reduce dimensionality~/anaconda3/envs/tf11/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item) 1643 res = cache.get(item)
1644 if res is None: -> 1645 values = self._data.get(item) 1646 res = self._box_item_values(item, values) 1647
cache[item] = res~/anaconda3/envs/tf11/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath) 3588 3589 if not isnull(item): -> 3590 loc = self.items.get_loc(item) 3591 else: 3592 indexer = np.arange(len(self.items))[isnull(self.items)]
~/anaconda3/envs/tf11/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance) 2442
return self._engine.get_loc(key) 2443 except KeyError: -> 2444 return self._engine.get_loc(self._maybe_cast_indexer(key)) 2445 2446
indexer = self.get_indexer([key], method=method, tolerance=tolerance)pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5280)()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5126)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20523)()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas/_libs/hashtable.c:20477)()
KeyError: 'n_tokens_content'
I think this is caused by some rows in the csv file, as this piece of code work well for other csv.
if yes, how to locate the bad rows efficiently?
n_tokens_content
in the dataframe you created. You'll have to examine the dataframe (e.g., rundf.columns
ordf.head()
) to see what your column names are. – AlexK