Get average of columns in pandas based on particular rows

Question

I have a data that stores % change in value for stocks over one day, month, three months and year.

ID     daychange   monthchange   trimonthchange  yearchange
UNITY   0.001666     0.398450        0.411581    0.689139
SSOM  -0.033359     0.040816        1.174840    3.047619
PNSC  -0.004953    -0.053006        0.158677    0.224932
PICT  -0.002027    -0.069696        0.041143    0.310791
PIBTL  -0.014848     0.129362        0.459846    0.287100

I want to get:

averages for each of the 4 time periods based on the ID name. For instance "UNITY" "SSOM" "PNSC" are to be grouped and their average be calculated for each time period (i.e. avg daychange for the group, avg monthchange for the group and so on).
that average data to be put in a separate csv file with a new single row (instead of "UNITY" "SSOM" "PNSC") and the average of each time periods for the group.

I have tried df.mean(axis = 1) but that doesn't do the job.

Any help will be dearly appreciated! THANKS

n1colas.m n1colas.m · Accepted Answer · 2020-09-26T23:51:53

Use loc to access a group of rows by label (ID column) and then calculate the mean for each time period column using axis=0. Create a Series with the group's name and append the previous results to it (this way the group name will be the first column in the dataframe). Place the Series in a list, then when the iteration over the groups is done, convert to a dataframe. To put the resulst in a separate csv file use to_csv.

import pandas as pd

GROUPS = [
    ["UNITY", "SSOM", "PNSC"],
    ["SSOM", "PICT", "PIBTL"],
    ["SSOM", "PNSC", "PIBTL"],
]

df = pd.read_csv("sample.csv", sep="\s+")
df = df.set_index("ID")

data = []
for g in GROUPS:
    group_mean = df.loc[g].mean(axis=0)
    serie = pd.Series({"groupName":"-".join(g)}).append(group_mean)
    data.append(serie)

data = pd.DataFrame(data)
print(data)
data.to_csv("output.csv", index=False)

Output from data

         groupName  daychange  monthchange  trimonthchange  yearchange
0  UNITY-SSOM-PNSC  -0.012215     0.128753        0.581699    1.320563
1  SSOM-PICT-PIBTL  -0.016745     0.033494        0.558610    1.215170
2  SSOM-PNSC-PIBTL  -0.017720     0.039057        0.597788    1.186550

Get average of columns in pandas based on particular rows

2 Answers