Run Python script over multiple files in a folder

Question

I have multiple CSV files in a folder (file1, file2, file3, file4, file5, ...).

I only know how to import one file, run the conversion, and write the converted file, as shown in the code below. I would like to run the same conversion on all the CSV files at once. Can someone please help?

convert.py:

import pandas as pd
import numpy as np

#read file
df = pd.read_csv("file1.csv")

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('file1_converted.csv', index = False)

I started with the code below, but it produced only one output file (*.csv), from what looks like a random one of the CSV files. I would like a separate output file for each input file.

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('*.csv', index = False)

Look at your indentation. What is happening in the for loop, and what should be happening to produce one output file per input file? — i alarmed alien

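jmunsch · Accepted Answer · 2018-07-20T20:02:18

Indent the code that does the DataFrame transformation so that it is inside the for loop. As written, only the read is inside the loop, so the conversion and the write run once, on whichever file was read last. It should look like this: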

import glob
import os
import pandas as pd

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file: prefix the file name only, not the 'folder/' part of the path
    folder, name = os.path.split(file)
    df.to_csv(os.path.join(folder, 'updated_{}'.format(name)), index = False)
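If it helps, here is a variant of the same loop written with pathlib, as a rough sketch rather than anything from the original answer. The function names `convert_folder` and `output_path_for` are made up for illustration, and the `_converted` suffix is borrowed from the `file1_converted.csv` name in the question. It assumes every CSV has a 'Time taken' column that `pd.to_datetime` can parse. Files that already end in the suffix are skipped, so running it twice should not convert its own output.

```python
from pathlib import Path

import pandas as pd


def output_path_for(csv_path, suffix="_converted"):
    """Return e.g. folder/file1_converted.csv for folder/file1.csv."""
    csv_path = Path(csv_path)
    return csv_path.with_name(csv_path.stem + suffix + csv_path.suffix)


def convert_folder(folder, suffix="_converted"):
    """Convert every CSV in `folder`, writing one output file per input file."""
    written = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Skip outputs from an earlier run so they are not converted again.
        if csv_path.stem.endswith(suffix):
            continue

        df = pd.read_csv(csv_path)

        # Same conversion as in the question: time of day as decimal hours.
        times = pd.to_datetime(df["Time taken"])
        df["Time taken"] = times.dt.hour + times.dt.minute / 60

        out_path = output_path_for(csv_path, suffix)
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

Calling `convert_folder("folder")` would then write `file1_converted.csv`, `file2_converted.csv`, and so on next to the originals, and return the list of paths it wrote.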
