7
votes

I have this dataset:

           Game1    Game2   Game3   Game4     Game5

Player1       2        6        5       2        2

Player2       6        4        1       8        4

Player3       8        3        2       1        5

Player4       4        9        4       7        9

I want to calculate the sum of the 5 games for every player.

This is my code :

import csv
f=open('Games','rb')
f=csv.reader(f,delimiter=';')
lst=list(f)
lst
import numpy as np
myarray = np.asarray(lst)
x=myarray[1,1:] #First player
y=np.sum(x)

I got the error "cannot perform reduce with flexible type". I'm very new to Python and I need your help.

Thank you

4
You should show the actual csv file, the one that contains the ; delimiter. Otherwise you leave us guessing as to how the dataset was written. x is probably an array of strings, since nothing in your code converts strings to numbers. - hpaulj
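To illustrate the comment's point, here is a minimal sketch of the conversion step that the question's code is missing. The list lst is a hypothetical stand-in for what list(csv.reader(...)) would return, assuming the file has a header row and a leading player-name column; the real file was not shown, so treat the layout as an assumption.

```python
import numpy as np

# Hypothetical stand-in for list(csv.reader(f, delimiter=';')):
# the csv module returns every cell as a string.
lst = [
    ["", "Game1", "Game2", "Game3", "Game4", "Game5"],
    ["Player1", "2", "6", "5", "2", "2"],
    ["Player2", "6", "4", "1", "8", "4"],
    ["Player3", "8", "3", "2", "1", "5"],
    ["Player4", "4", "9", "4", "7", "9"],
]

myarray = np.asarray(lst)
kind = myarray.dtype.kind  # 'U' means a unicode string array, not numbers

# Drop the header row and the name column, then convert strings to integers.
scores = myarray[1:, 1:].astype(int)

# One total per player (sum across the columns of each row).
totals = scores.sum(axis=1)
print(totals)
```

Once the values are real integers, the reduction works as expected.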

4 Answers

3
votes

You can still use a structured array as long as you familiarize yourself with the dtypes. Since your data set is extremely small, the following may serve as an example of using numpy in conjunction with list comprehensions when your dtype is uniform but named:

import numpy as np

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype= dt)

nms = a.dtype.names
by_col = [(i, a[i].sum()) for i in nms if a[i].dtype.kind in ('i', 'f')]
by_col
[('Game1', 20), ('Game2', 22), ('Game3', 12), ('Game4', 18), ('Game5', 20)]

by_row = [("player {}".format(i), sum(a[i])) for i in range(a.shape[0])]
by_row
[('player 0', 17), ('player 1', 23), ('player 2', 19), ('player 3', 33)]

In this example, it would be a real pain to get each sum individually for each column name. That is where the "a[i] for i in nms" part is useful, since the list of names was retrieved by nms = a.dtype.names. Because you are summing, you want to restrict the summation to integer and float fields only, hence the a[i].dtype.kind check.
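The dtype.kind check only matters once the array holds non-numeric fields. As a small sketch, assume a hypothetical 'Player' text field is stored next to the scores (it is not part of the array used in this answer):

```python
import numpy as np

# Hypothetical layout: one text field plus two integer fields.
dt = [('Player', 'U8'), ('Game1', '<i4'), ('Game2', '<i4')]
a = np.array([('Player1', 2, 6),
              ('Player2', 6, 4)], dtype=dt)

# Keep only integer ('i') and float ('f') fields; 'Player' has kind 'U'
# and is skipped, so no attempt is made to sum strings.
by_col = [(name, int(a[name].sum()))
          for name in a.dtype.names
          if a[name].dtype.kind in ('i', 'f')]
print(by_col)
```

Without the filter, the comprehension would also try to sum the text field.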

Summing by row is just as easy, but you will notice that I didn't use the same syntax; I used a slightly different one to avoid this error message:

a[0].sum()  # massive failure
....snip out huge error stuff...
TypeError: cannot perform reduce with flexible type
# whereas, this works....
sum(a[0])   # use list/tuple summation

Perhaps 'flexible' data types don't live up to their name. Still, you can work with structured arrays and recarrays if that is how your data comes in. You can become adept at reformatting your data by slicing and altering dtypes to suit your purpose. For example, since your fields all share the same data type and the dataset is small, there are many ways to convert it to a plain (unstructured) array.

b = np.array([list(a[i]) for i in range(a.shape[0])])
b
array([[2, 6, 5, 2, 2],
       [6, 4, 1, 8, 4],
       [8, 3, 2, 1, 5],
       [4, 9, 4, 7, 9]])

b.sum(axis=0)
array([20, 22, 12, 18, 20])

b.sum(axis=1)
array([17, 23, 19, 33])

So you have many options when dealing with structured arrays. Depending on whether you need to work in pure Python, numpy, pandas or a hybrid, you should familiarize yourself with all of them.

ADDENDUM

As a shortcut, I failed to mention taking 'views' of arrays that are structured in nature but whose fields all have the same dtype. In the above case, a simple way to get a plain array for calculations by row or column is as follows. The earlier approach made a copy of the array, which is not necessary; a view shares the same data.

b = a.view(np.int32).reshape(len(a), -1)
b
array([[2, 6, 5, 2, 2],
       [6, 4, 1, 8, 4],
       [8, 3, 2, 1, 5],
       [4, 9, 4, 7, 9]])
b.dtype
dtype('int32')

b.sum(axis=0)
array([20, 22, 12, 18, 20])

b.sum(axis=1)
array([17, 23, 19, 33])
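
A related option, offered as a sketch rather than as part of the original answer: NumPy 1.16 and later include numpy.lib.recfunctions.structured_to_unstructured, which performs the same conversion without spelling out the element type by hand. The array a below simply repeats the one defined earlier.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)

# Turn the structured array into a plain 2-D array (rows x fields).
b = rfn.structured_to_unstructured(a)

col_totals = b.sum(axis=0)  # total per game
row_totals = b.sum(axis=1)  # total per player
print(col_totals)
print(row_totals)
```
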
2
votes

The complication with using numpy is that one has two sources of error (and documentation to read), namely python itself as well as numpy.

I believe your problem here is that you are working with a so-called structured (numpy) array.

Consider the following example:

>>> import numpy as np
>>> a = np.array([(1,2), (4,5)],  dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a.sum()
TypeError: cannot perform reduce with flexible type

Now, I first select the data I want to use:

>>> import numpy as np
>>> a = np.array([(1,2), (4,5)],  dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a["Game 1"].sum()
5.0

Which is what I wanted.
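
Selecting one field gives a per-game total. For the per-player total asked about in the question, one possible approach (a sketch, not the only way) is to add the field arrays together, reusing the same small array:

```python
import numpy as np

a = np.array([(1, 2), (4, 5)], dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])

# Each a[name] is an ordinary float array with one entry per row, so adding
# the fields together element-wise yields one total per row.
row_totals = sum(a[name] for name in a.dtype.names)
print(row_totals)
```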

Maybe you would consider using pandas (a Python library), or switching to R.


Personal opinions

Even though "numpy" certainly is a mighty library I still avoid using it for data-science and other "activities" where the program is designed around "flexible" data-types. Personally I use numpy when I need something to be fast and maintainable (it is easy to write "code for the future"), but I do not have the time to write a C program.

As far as Pandas goes it is convenient for us "Python hackers" because it is "R data structures implemented in Python", whereas "R" is (obviously) an entirely new language. I personally use R as I consider Pandas to be under rapid development, which makes it difficult to write "code with the future in mind".


As suggested in a comment (@jorijnsmit I believe) there is no need to introduce large dependencies, such as pandas, for "simple" cases. The minimalistic example below, which is compatible with both Python 2 and 3, uses "typical" Python tricks to massage the data in the question.

import csv

## Data-file
data = \
'''
       , Game1, Game2,   Game3,   Game4,   Game5
Player1,  2,    6,       5,       2,     2
Player2,  6,      4 ,      1,       8,      4
Player3,  8,     3 ,      2,    1,     5
Player4,  4,  9 ,   4,     7,    9
'''

# Write data to file
with open('data.csv', 'w') as FILE:
    FILE.write(data)

print("Raw data:")
print(data)

# 1) Read the data-file (and strip away spaces); the result is a list of rows:
# Text mode ('r') is used because csv.reader needs str, not bytes, on Python 3.
with open('data.csv', 'r') as FILE:
  raw = [ [ item.strip() for item in line] \
                      for line in list(csv.reader(FILE,delimiter=',')) if line]

print("Data after Read:")
print(raw)

# 2) Convert numerical data to integers ("float" would also work)
for (i, line) in enumerate(raw[1:], 1):
    for (j, item) in enumerate(line[1:], 1):
        raw[i][j] = int(item)

print("Data after conversion:")
print(raw)

# 3) Use the data...
print("Use the data")
for i in range(1, len(raw)):
  print("Sum for Player %d: %d" %(i, sum(raw[i][1:])) )

# Iterate over the game columns (width of the header row), not over the rows
for j in range(1, len(raw[0])):
  print("Total points in Game %d: %d" %(j, sum(list(zip(*raw))[j][1:])) )

The output would be:

Raw data:

       , Game1, Game2,   Game3,   Game4,   Game5
Player1,  2,    6,       5,       2,     2
Player2,  6,      4 ,      1,       8,      4
Player3,  8,     3 ,      2,    1,     5
Player4,  4,  9 ,   4,     7,    9

Data after Read:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', '2', '6', '5', '2', '2'], ['Player2', '6', '4', '1', '8', '4'], ['Player3', '8', '3', '2', '1', '5'], ['Player4', '4', '9', '4', '7', '9']]
Data after conversion:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', 2, 6, 5, 2, 2], ['Player2', 6, 4, 1, 8, 4], ['Player3', 8, 3, 2, 1, 5], ['Player4', 4, 9, 4, 7, 9]]
Use the data
Sum for Player 1: 17
Sum for Player 2: 23
Sum for Player 3: 19
Sum for Player 4: 33
Total points in Game 1: 20
Total points in Game 2: 22
Total points in Game 3: 12
Total points in Game 4: 18
Total points in Game 5: 20
1
votes

Consider using the pandas module:

import pandas as pd

df = pd.read_csv('/path/to.file.csv', sep=';')

Resulting DataFrame:

In [196]: df
Out[196]:
         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9

Sum:

In [197]: df.sum(axis=1)
Out[197]:
Player1    17
Player2    23
Player3    19
Player4    33
dtype: int64

In [198]: df.sum(1).values
Out[198]: array([17, 23, 19, 33], dtype=int64)
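
Since the actual file was never shown, here is a self-contained sketch that reads hypothetical file contents from a string instead of a path. It assumes a ';'-separated file whose first column holds the player names, which is why index_col=0 is passed; adjust that if your file is laid out differently.

```python
import io

import pandas as pd

# Hypothetical file contents; the real file from the question is unknown.
csv_text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

# index_col=0 turns the first column (player names) into the row index,
# so only the numeric game columns take part in the sum.
df = pd.read_csv(io.StringIO(csv_text), sep=';', index_col=0)

totals = df.sum(axis=1)  # one total per player
print(totals)
```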
1
votes

You don't need numpy at all, just do this:

import csv
from collections import OrderedDict

with open('games') as f:
    reader = csv.reader(f, delimiter=';')
    data = list(reader)

sums = OrderedDict()
for row in data[1:]:
    player, games = row[0], row[1:]
    sums[player] = sum(map(int, games))
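
For a version that can be run without the file, the sketch below applies the same loop to hypothetical contents held in a string (the layout, a header row plus a leading name column, is an assumption). On Python 3.7 and later a plain dict keeps insertion order, so it stands in for OrderedDict here.

```python
import csv
import io

# Hypothetical file contents; the real 'games' file is not shown in the question.
csv_text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

reader = csv.reader(io.StringIO(csv_text), delimiter=';')
data = list(reader)

sums = {}
for row in data[1:]:                      # skip the header row
    player, games = row[0], row[1:]
    sums[player] = sum(map(int, games))   # convert each cell before summing

for player, total in sums.items():
    print(player, total)
```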