filtring data of a csv file with pandas

Question

I have a CSV file which has stock market data by date and symbol. It has various columns like date, Max price, Min price and so on. I must resample these data with 10 days frequency from last four months of 2019 until the first three months of 2020, and then filter the Min.mean() above than 20000 and Max.mean() less than 35000. i wrote this code:

import numpy as np
import pandas as pd
from datetime import datetime
stock_df = pd.read_csv('stock_data.csv')
stock_df.drop(columns = ['Unnamed: 0'],inplace=True)
column_names = {
"تاریخ":"str_date",
"اولین قیمت":"Open",
"نماد":"Symbol",
"نام":"Name",
"قیمت پایانی":"Close",
"قیمت آخرین معامله":"Last Trade",
"تعداد معاملات":"Quantity",
"حجم":"Volume",
"ارزش":"Value",
"کمترین قیمت":"Min",
"بیشترین قیمت":"Max",
"قیمت دیروز":"Yesterday Price",
"تغییر":"Change"}
stock_df.rename(columns=column_names,inplace=True)
stock_df["date"]  =  pd.to_datetime(stock_df["str_date"])

The original attempt was like this:

stock_df.set_index('date',inplace=True)
result = stock_df.loc["2019-09":"2020-03"].resample('10D').Max.mean()<=35000 & stock_df.loc["2019-09":"2020-03"].resample('10D').Min.mean()>=20000

stock_df is the DataFrame that holds CSV file data. The final answer must put in the dataframe result. but this code gets this error:

TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

Rick M Rick M · Accepted Answer · 2021-02-11T03:55:50

The code below sets and sorts your index on date, then selects your date range.

Because you have multiple symbols the groupby needs to be explicit about the resample on date. It then takes the mean of the two columns you mentioned.

Finally, it creates the boolean conditions you asked for, cuts the tmp dataframe to select only those rows, resets the index to make it "flat" again, and puts a copy of the result into result.

tmp = stock_df.set_index(['date', 'Symbol']).sort_values(['date'])
tmp = tmp.loc[pd.IndexSlice['2019-09-01':'2020-03-31', :], :]

tmp = tmp.groupby([pd.Grouper(level='date', freq='10D'), pd.Grouper(level='Symbol')])[['Max','Min']].mean()

cond1 = (tmp.Max <= 35000)
cond2 = (tmp.Min >= 20000)
result = tmp[cond1 & cond2].reset_index().copy()

You should verify that the results are what you expect, perhaps on a smaller dataset where you know the result ahead of time. Hopefully that gives you enough to work with.

filtring data of a csv file with pandas

1 Answers