
I have two tables of data, each with the same dimensions (245x10). The original file can be found here. I need to compute a t-test between these two tables, but I get an error from NumPy when I run it:

import scipy.stats as st
import numpy as np
import pandas as pd

df = pd.read_csv('GC Cerbellum final.txt', sep='\t')
df1 = df.ix[:, 1:12]
df2 = df.ix[:, 12:]
st.ttest_ind(df1, df2)

/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
   2936
   2937     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 2938                          keepdims=keepdims)

/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
     93     if isinstance(arrmean, mu.ndarray):
     94         arrmean = um.true_divide(
---> 95                 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
     96     else:
     97         arrmean = arrmean.dtype.type(arrmean / rcount)

TypeError: unsupported operand type(s) for /: 'str' and 'int'

I checked, and all the data appear to be integers, so I'm not sure why it fails on strings. It may be that the missing values are somehow filled in with strings, and that is what causes the failure.
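A quick way to confirm that suspicion is to look at the column dtypes right after reading the file. The sketch below uses a small hypothetical in-memory sample (not the real file) in which one missing value is written as " NA" with a leading space. That string is not in pandas' default list of missing-value markers, so the whole column is kept as strings instead of numbers.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample: the missing value in column "a" is
# written as " NA" (leading space), which pandas does not treat as missing.
sample = "id\ta\tb\ng1\t1\t4\ng2\t NA\t5\ng3\t3\t6\n"

df = pd.read_csv(io.StringIO(sample), sep="\t")

# Column "a" ends up with a non-numeric (string/object) dtype,
# while column "b" is parsed as integers.
print(df.dtypes)

# List the cells in "a" that cannot be converted to numbers.
bad = df.loc[pd.to_numeric(df["a"], errors="coerce").isna(), "a"]
print(bad.tolist())
```

Any column that shows up as object (or string) in df.dtypes is one that will make the t-test fail with the TypeError above.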

So my question is: how can I perform a t-test on two tables in Python when some values are missing?


1 Answer


Your file has NAs in it, but they are written as " NA" (with a leading space), which pandas does not recognize as a missing value by default. You can read it like this:

df = pd.read_csv('GC Cerbellum final.txt', sep='\t', na_values=[' NA'])

and pandas will then parse those columns as floats rather than strings, with the missing entries stored as NaN.
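Putting that fix together with the t-test itself: the sketch below reads a small hypothetical sample (the column names x1, x2, y1, y2 and all values are made up) with na_values=[" NA"], then calls scipy.stats.ttest_ind with nan_policy="omit", which drops the NaNs in each column before computing the statistic. The nan_policy parameter only exists in SciPy 0.17 and later, so an older installation may need upgrading.

```python
import io

import pandas as pd
import scipy.stats as st

# Hypothetical sample: an ID column followed by two groups of measurements,
# with some missing values written as " NA".
sample = (
    "id\tx1\tx2\ty1\ty2\n"
    "g1\t1\t2\t5\t6\n"
    "g2\t NA\t3\t7\t NA\n"
    "g3\t2\t4\t6\t8\n"
    "g4\t3\t1\t9\t7\n"
)

# Declaring " NA" as a missing-value marker makes pandas parse these columns
# as floats, with NaN wherever the marker appeared.
df = pd.read_csv(io.StringIO(sample), sep="\t", na_values=[" NA"])

group1 = df.iloc[:, 1:3].to_numpy()  # columns x1, x2
group2 = df.iloc[:, 3:5].to_numpy()  # columns y1, y2

# ttest_ind works along axis=0 by default, i.e. column by column;
# nan_policy="omit" ignores the NaNs in each column.
result = st.ttest_ind(group1, group2, nan_policy="omit")
print(result.statistic, result.pvalue)
```

With the default axis=0 this compares x1 with y1 and x2 with y2. If each row is one unit of interest (for example one gene) and you want one test per row instead, pass axis=1.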

(By the way, your slices also don't seem right: 1:12 selects 11 columns, not the 10 you describe.)
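On the slicing: positional slices in .iloc are end-exclusive, so for an ID column followed by two blocks of 10 columns, 1:11 and 11:21 select exactly 10 columns each. The column layout below is a hypothetical assumption about the file, not something taken from it. Note also that .ix, used in the question, was deprecated and then removed in pandas 1.0, so .iloc is the positional indexer to use today.

```python
import pandas as pd

# Hypothetical layout: one ID column followed by 10 + 10 measurement columns.
cols = ["id"] + [f"a{i}" for i in range(10)] + [f"b{i}" for i in range(10)]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

# .iloc slices are positional and end-exclusive:
# 1:11 -> positions 1..10 (a0..a9), 11:21 -> positions 11..20 (b0..b9).
df1 = df.iloc[:, 1:11]
df2 = df.iloc[:, 11:21]
print(df1.columns.tolist())
print(df2.columns.tolist())
```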