42
votes

I have a dataframe in python pandas with several columns taken from a CSV file.

For instance, data =:

Day P1S1 P1S2 P1S3 P2S1 P2S2 P2S3
1   1    2    2    3    1    2
2   2    2    3    5    4    2

And what I need is to get the sum of all columns which name starts with P1... something like P1* with a wildcard.

Something like the following which gives an error:

P1Sum = data["P1*"]

Is there any why to do this with pandas?

3
Is there any reason that people downvote this question?catchmeifyoutry
I'm also not getting it. I mean, it can be a really basic thing, but I searched the documentation of Pandas and didn't find any answers.jbssm
Thank you, but that was not trivial. The documentation doesn't say anything on how to use the function, just says there is a function. Anyway I found the answer and I will post it below.jbssm

3 Answers

63
votes

I found the answer.

Using the data, dataframe from the question:

from pandas import *

P1Channels = data.filter(regex="P1")
P1Sum = P1Channels.sum(axis=1)
5
votes

List comprehensions on columns allow more filters in the if condition:

In [1]: df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=['P1S1', 'P1S2', 'P2S1'])

In [2]: df
Out[2]: 
   P1S1  P1S2  P2S1
0     0     1     2
1     3     4     5
2     6     7     8
3     9    10    11
4    12    13    14

In [3]: df.loc[:, [x for x in df.columns if x.startswith('P1')]].sum(axis=1)
Out[3]: 
0     1
1     7
2    13
3    19
4    25
dtype: int64
1
votes

Thanks for the tip jbssm, for anyone else looking for a sum total, I ended up adding .sum() at the end, so:

P1Sum= P1Channels.sum(axis=1).sum()