How can I compute average monthly value from large daily JSON dataset in pandas dataframe?

Question

I'm relatively new to working with pandas dataframes to read in data and I'm having some trouble working with my dataset. I've been reading many other StackOverflow posts on a similar problem but I've been having trouble applying those solutions to my case possibly because of the structure of my JSON data. My JSON data as arranged into my dataframe df = pd.DataFrame.from_records(data) generally looks like this

             dateTime                          value
0   01/16/20 04:32:42   {'bpm': 70, 'confidence': 0}
1   01/16/20 04:32:57   {'bpm': 70, 'confidence': 0}
2   01/16/20 04:33:12   {'bpm': 70, 'confidence': 1}

and so on for many daily values.

My goal is to read all of this raw daily data and compute a monthly average of "bpm" and plot to a matplot graph. My problem is I'm having trouble using pandas datetime or mean() operations because I don't think pandas accepts my dateTime format as actually in datetime and when I try to use mean() it gives me an pandas.core.base.DataError: No numeric types to aggregate error.

How can I use built in pandas tools to allow me to compute a monthly average by grouping my daily values together based on the month?

for file in os.listdir(data_dir):  # look at every file in the folder
    if file.startswith("heart_rate") and file.endswith(".json"):  # only want heart_rate-date.json files
        with open(os.path.join(data_dir, file)) as f:  # open each file in data_dir
            data = json.load(f)
            df = pd.DataFrame.from_records(data)
            print(df)
            #df.dateTime = pd.to_datetime(df.dateTime)
            #df['Month'] = df['dateTime'].dt.month
            for i, j in enumerate(data):
                if data[i]['value']['confidence'] > 0:
                    daily_avg_bpm += data[i]['value']['bpm']
                    daily_date = data[i]['dateTime'].split()[0]
                    my_date = datetime.datetime.strptime(daily_date, "%m/%d/%y").date()
                    days.append(my_date)
            months.append(daily_date[:2])
            daily_avg_bpm /= len(data)
            dates.append(daily_date)
            avg_bpms.append(round(daily_avg_bpm))
        f.close()
plt.xlabel('Month')
plt.ylabel('Heart Rate')
plt.title("Fitbit Heart Rate")
for i, j in enumerate(dates):
    plt.plot(dates[i], avg_bpms[i])
plt.show()

Hi Prayson, Unfortunately not yet. I was trying to apply your suggestion but I was having errors with the apply() and I've been getting list indices must. Now I'm getting DataError raise DataError("No numeric types to aggregate") pandas.core.base.DataError: No numeric types to aggregate. — bpiekars
It means you do not have numerical values to resample. What do you get by df.dtypes? — Prayson W. Daniel
My dataframe values look like the example shown in my question with datetime and value. The numerical values I want are the ['values']['bpm'] values but df.dtypes gives me an error TypeError: 'Series' object is not callable — bpiekars
As you see, you have Series not DataFrame. So the example data are not the same. Can you try df['value'].apply(pd.Series) ? Does that expand value? — Prayson W. Daniel

Prayson W. Daniel Prayson W. Daniel · Accepted Answer · 2020-12-21T06:59:03

The easiest way it to transform your values to columns and cast dateTime to actual datetime. With that you can use resample to aggregate on the frequency desire:

import pandas as pd

data = pd.DataFrame({'dateTime':[ '01/16/20 04:32:42', '01/16/20 04:32:57', '02/16/20 04:33:12', '03/16/20 04:33:12'],
'value': [{'bpm': 70, 'confidence': 0}, {'bpm': 75, 'confidence': 0},  
          {'bpm': 73, 'confidence': 1}, {'bpm': 78, 'confidence': 1}]})

# expland 
df = data['value'].apply(pd.Series)

# to datetime and set index 
df['dateTime'] = pd.to_datetime(data['dateTime'])
df.set_index('dateTime', inplace=True)

# data resample to Monthy with mean 
example = df.resample('M', kind='period').mean()

# plot
example['bpm'].plot(title="Fitbit Heart Rate", xlabel="Month", ylabel="Heart Rate");

Note:

For a graph to show, make sure you have data with N > 1 of the selected frequency. So if the frequency is "M" for Month, make sure there are at least 2 months in the dataset.

Pandas resample

How can I compute average monthly value from large daily JSON dataset in pandas dataframe?

2 Answers

Note: