0
votes

What I have got is a CSV file with following structure:

column1 column2 column3 column4(day) column5(month&year) column6(time)

column1 column2 column3 column4(day) column5(month&year) column6(time)

column1 column2 column3 column4(day) column5(month&year) column6(time)

...

The columns of the file do not have names. Now I want to merge column4 and column5 and generate a new version of the CSV file so that I have got the complete date in one cell.

What I have tried is following Python code:


def correctDatetime():
    with open("inputfile.csv", "r") as source, open("outputfile.csv", "w") as result:
        df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=0)

        for row in source:
            df['Datetime'] = df[df.columns[3:]].apply(lambda x: ' '.join(x.dropna().astype(str)), axis=1)
        result.write(df)

Unfortunately this only generates an empty CSV file. How could I best approach this issue? Any advice would be helpful. Thanks in advance!

2

2 Answers

0
votes
import pandas as pd
#Setup DF
data = [
(1234,    1,     'x', 'x', 34, 'May 2019'),
(1234,    2,     'x', 'x', 34, 'June 2019'),
(1235,    1,     'y', 'y', 34, 'July 2019')]
df = pd.DataFrame(data)

The following code will combine your data based on column position and then drop unnecessary columns and save to some_csv.csv:

df['Datetime']=df.apply(lambda x:'%s %s' % (str(x[df.columns[4]]),str(x[df.columns[5]])),axis=1)
df.drop(columns=[df.columns[4],df.columns[5]], inplace=True)
df.to_csv('some_csv.csv')
0
votes

Add header=None because no header in csv file to read_csv, then extract column 3 and 4 by DataFrame.pop and join together, last write to file by DataFrame.to_csv:

def correctDatetime():
    df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
    df['Datetime'] = df.pop(3) + ' ' + df.pop(4) + ' ' + df.pop(5)
    df.to_csv("outputfile.csv", index=False, header=False)

If need convert to datetimes and format of month with years is MMYYYY and time column is HH:MM:SS add to_datetime:

def correctDatetime():
    df = pd.read_csv('C:/ProgrammingProjects/LicMonTest/inputfile.csv', header=None)
    s = df.pop(3) + '-' + df.pop(4) + ' ' + df.pop(4)
    df['Datetime'] = pd.to_datetime(s, format='%d-%m%Y %H:%M:%S')
    df.to_csv("outputfile.csv", index=False, header=False)