I have two columns (serverTs, FTs) in a DataFrame that hold timestamps in Unix time. In my code I need to subtract one from the other. When I did so, I received an error saying that strings cannot be subtracted, so I specified integer dtypes for serverTs and FTs:
file = r'S:\Работа с клиентами\Клиенты\BigTV Rating\fts_check.csv'
col_names = ["Day", "vcId", "FTs", "serverTs", "locHost", "tnsTmsec", "Hits", "Uniqs"]
df_empty = pd.DataFrame()
with open(file) as fl:
    chunk_iter = pd.read_csv(fl, sep='\t', names=col_names, dtype={'serverTs': np.int32, 'FTs': np.int32}, chunksize=100000)
    for chunk in chunk_iter:
        chunk['diff'] = np.array(chunk['serverTs']) - np.array(chunk['FTs'])
        chunk = chunk[chunk['diff'] > 180]
        df_empty = pd.concat([df_empty, chunk])
But the program gives me an error:
TypeError Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

TypeError: Cannot cast array from dtype('O') to dtype('int32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
in ()
      6 #dtype={'serverTs': np.int32, 'FTs': np.int32},
      7 #chunk_iter = chunk_iter.astype({'serverTs': np.int32, 'FTs': np.int32})
----> 8 for chunk in chunk_iter:
      9 #print(chunk[chunk['FTs'] == 'NaN'])
     10 #chunk[['serverTs','FTs']] = chunk[['serverTs','FTs']].astype('int32')

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in next(self)
   1040 def next(self):
   1041 try:
-> 1042 return self.get_chunk()
   1043 except StopIteration:
   1044 self.close()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in get_chunk(self, size)
   1104 raise StopIteration
   1105 size = min(size, self.nrows - self._currow)
-> 1106 return self.read(nrows=size)
   1107
   1108

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1067 raise ValueError('skipfooter not supported for iteration')
   1068
-> 1069 ret = self._engine.read(nrows)
   1070
   1071 if self.options.get('as_recarray'):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1837 def read(self, nrows=None):
   1838 try:
-> 1839 data = self._reader.read(nrows)
   1840 except StopIteration:
   1841 if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

ValueError: invalid literal for int() with base 10: 'FTs'
I'm pulling the data from Hadoop with SQL queries, so I checked the values for any letters, but there are only digits. Moreover, a value of FTs that contained non-numeric characters could not have been stored in the database in the first place. What could be the problem?
Try removing names and let read_csv read the column names. It looks like you are trying to read the string 'FTs' from the file as a number. – jdehesa
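The comment's point can be sketched as follows. This is a minimal illustration, not the asker's actual data: the two-row `sample` string is a made-up stand-in for fts_check.csv, and it assumes the file's first line is a header row, which is what the 'FTs' in the error message suggests. If you want to keep your own `col_names`, passing `header=0` alongside `names` tells `read_csv` to discard the header line instead of parsing it as data. The switch from `np.int32` to `np.int64` is also an assumption on my part, made as a precaution in case the timestamps are in milliseconds.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample standing in for fts_check.csv.
# Assumption: the real file starts with a header line like this one.
sample = (
    "Day\tvcId\tFTs\tserverTs\tlocHost\ttnsTmsec\tHits\tUniqs\n"
    "2018-01-01\t1\t1514764800\t1514765000\thost-a\t10\t5\t3\n"
    "2018-01-01\t2\t1514764800\t1514764900\thost-b\t12\t7\t4\n"
)
col_names = ["Day", "vcId", "FTs", "serverTs", "locHost", "tnsTmsec", "Hits", "Uniqs"]

chunk_iter = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=0,         # line 0 is a header; it is dropped and replaced by col_names
    names=col_names,
    dtype={"serverTs": np.int64, "FTs": np.int64},
    chunksize=100000,
)

kept = []
for chunk in chunk_iter:
    # Both columns are already integers, so plain Series subtraction works.
    chunk["diff"] = chunk["serverTs"] - chunk["FTs"]
    kept.append(chunk[chunk["diff"] > 180])

# Concatenate once at the end rather than growing a DataFrame inside the loop.
result = pd.concat(kept, ignore_index=True)
print(result[["vcId", "FTs", "serverTs", "diff"]])
```

Alternatively, dropping `names` entirely (as the comment suggests) lets `read_csv` take the column names from the file itself, provided the header uses the names you expect.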