1
votes

I have multiple csv files in a folder (file1, file2, file3, file4, file5,....)

I only know how to import one file, run the command and output the converted file as shown in below code. I would like to run the command in multiple csv files at once. Can someone please help?

convert.py:

import pandas as pd
import numpy as np

#read file
df = pd.read_csv("file1.csv")

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('file1_converted.csv', index = False)

I started with a code as shown below but it gave only one output(*.csv) from one random csv file. I would like separate output for each file.

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('*.csv', index = False)
4
Look at your indentation. What is happening in the for loop, and what should be happening to produce one output file per input file?i alarmed alien
and 4 people posting the exact same solution is just crazyxyzjayne

4 Answers

1
votes

indent the code that does the dataframe transformation and include it in the for loop like this:

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file
    df.to_csv('updated_{}'.format(file), index = False)
0
votes

You just need to indent the file writing code so it is performed inside the loop, otherwise it will only write the last file:

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file
    df.to_csv('*.csv', index = False)
0
votes
total_no_file=10

for i in range(total_no_file):
    file_name="file"+str(i+1)
    df = pd.read_csv(file_name)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    file_name="file"+str(i+1)+"_converted"
    df.to_csv('file1_converted.csv', index = False)
0
votes

So there were two problems with your code. First the indentation was all messed up so the for loop was only reading the different csv files into the same variable. Second you should give a different name to the converted csv files you are writing to disk. Therefore, the following should work for you:

import os
import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    file_name = os.path.splitext(os.path.basename(file))  # Get the file name without extension
    df = pd.read_csv(file)

    #make conversion 
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time 
    taken'].dt.minute / 60

    #output file
    df.to_csv('{}_conv.csv'.format(file_name, index = False)