1
votes

I have a data that stores % change in value for stocks over one day, month, three months and year.

ID     daychange   monthchange   trimonthchange  yearchange
UNITY   0.001666     0.398450        0.411581    0.689139
SSOM  -0.033359     0.040816        1.174840    3.047619
PNSC  -0.004953    -0.053006        0.158677    0.224932
PICT  -0.002027    -0.069696        0.041143    0.310791
PIBTL  -0.014848     0.129362        0.459846    0.287100

I want to get:

  1. averages for each of the 4 time periods based on the ID name. For instance "UNITY" "SSOM" "PNSC" are to be grouped and their average be calculated for each time period (i.e. avg daychange for the group, avg monthchange for the group and so on).
  2. that average data to be put in a separate csv file with a new single row (instead of "UNITY" "SSOM" "PNSC") and the average of each time periods for the group.

I have tried df.mean(axis = 1) but that doesn't do the job.

Any help will be dearly appreciated! THANKS

2

2 Answers

1
votes

Use loc to access a group of rows by label (ID column) and then calculate the mean for each time period column using axis=0. Create a Series with the group's name and append the previous results to it (this way the group name will be the first column in the dataframe). Place the Series in a list, then when the iteration over the groups is done, convert to a dataframe. To put the resulst in a separate csv file use to_csv.

import pandas as pd

GROUPS = [
    ["UNITY", "SSOM", "PNSC"],
    ["SSOM", "PICT", "PIBTL"],
    ["SSOM", "PNSC", "PIBTL"],
]

df = pd.read_csv("sample.csv", sep="\s+")
df = df.set_index("ID")

data = []
for g in GROUPS:
    group_mean = df.loc[g].mean(axis=0)
    serie = pd.Series({"groupName":"-".join(g)}).append(group_mean)
    data.append(serie)

data = pd.DataFrame(data)
print(data)
data.to_csv("output.csv", index=False)

Output from data

         groupName  daychange  monthchange  trimonthchange  yearchange
0  UNITY-SSOM-PNSC  -0.012215     0.128753        0.581699    1.320563
1  SSOM-PICT-PIBTL  -0.016745     0.033494        0.558610    1.215170
2  SSOM-PNSC-PIBTL  -0.017720     0.039057        0.597788    1.186550
0
votes

You will probably need to chain both the filter and mean function calls. Since you first want to select a set of rows to keep, you need to filter out the ones that you don't. Then, with the new dataset, you want to execute your column averaging.

df.filter(lambda row: row.ID in ["UNITY", "SSOM", "PNSC"]).mean(axis = 1)

Note, I have not tested the above code, and make no guarantees that it will work as-is