0
votes

I would like to read only the CSV files created in the last 7 days from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out, though. Here is what I have so far:

Edit: I'm trying to filter by the creation date of the CSV file, not by any column in the CSV.

from datetime import datetime, timedelta
import pandas as pd
import glob

fileday = datetime.now() - timedelta(7)
fileday = datetime.strftime(fileday, '%Y%m%d')

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
Is there a date column in the CSV, or is the date included in the filename? In any case, please update your example to include it. - Umar.H
I'm trying to filter by the creation date of the CSV file, not by a column in the CSV. - João Vitor Berruezo

2 Answers

0
votes

Since you're using pandas, let's use a combination of pathlib and pandas.

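from pathlib import Path
import pandas as pd

p = Path(r'C:\DRO\DCL_rawdata_files')

# materialise the generator so it can be used as a column
all_files = list(p.glob('*.csv'))

df = pd.DataFrame({'files': all_files})

# st_mtime is the last-modified time in seconds since the epoch (UTC);
# on Windows, st_ctime holds the creation time instead.
df['date'] = pd.to_datetime(df['files'].apply(lambda x: x.stat().st_mtime), unit='s', utc=True)

# filter your files: keep those from the last 7 days (compare in UTC).
cutoff = pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=7)
trg_files = df[df['date'] >= cutoff]['files'].tolist()

# read the selected files and concatenate them into one DataFrame
dfs = [pd.read_csv(f) for f in trg_files]
frame = pd.concat(dfs, ignore_index=True)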
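
If you'd rather skip the intermediate DataFrame, the same filter can be written with plain stat calls. This is only a sketch: recent_csv_files and its stat_attr parameter are names I made up for illustration. It assumes "created" can be read from st_ctime, which is the creation time on Windows but the metadata-change time on most Unix systems. Pass "st_mtime" if the last-modified time is what you actually want.

```python
import time
from pathlib import Path


def recent_csv_files(directory, days=7, stat_attr="st_ctime"):
    """Return the *.csv paths in `directory` stamped within the last `days` days.

    stat_attr names the os.stat field to compare (hypothetical parameter):
    "st_ctime" is the creation time on Windows, "st_mtime" the last-modified time.
    """
    cutoff = time.time() - days * 24 * 60 * 60  # seconds since the epoch
    return sorted(
        f for f in Path(directory).glob("*.csv")
        if getattr(f.stat(), stat_attr) >= cutoff
    )
```

The returned paths can then go through pd.read_csv and pd.concat exactly as in the question's loop.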
0
votes

You could do something like this, reusing all_files from the question.

# DataFrame.append was removed in pandas 2.0; build the frame with a single pd.concat instead
df = pd.concat((pd.read_csv(filename) for filename in all_files), ignore_index=True)
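
One thing to watch for when concatenating: if no file matches (for example, nothing was created in the last 7 days), pd.concat raises a ValueError because it has nothing to concatenate. A minimal guarded sketch follows. concat_csvs is a hypothetical helper name, and the in-memory StringIO objects only stand in for real file paths.

```python
import io

import pandas as pd


def concat_csvs(sources):
    """Read each CSV source and stack the results into one DataFrame."""
    frames = [pd.read_csv(src) for src in sources]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Usage with in-memory CSV text standing in for files on disk
combined = concat_csvs([io.StringIO("a,b\n1,2\n"), io.StringIO("a,b\n3,4\n")])
print(combined)
```

Collecting the frames in a list and calling pd.concat once also avoids copying the growing DataFrame on every loop iteration.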