
I have three columns in my dataframe: Tweet Posted Time (UTC), Tweet Content, and Tweet Location. The "Tweet Posted Time (UTC)" column has dtype object and holds dates in the format: 31 Mar 2020 10:49:01

My objective is to reformat the dataframe so that the 'Tweet Posted Time (UTC)' column shows only the day, month, and year (for example 31-03-2020), so that I can plot a time-series graph, but my attempts result in the error below.

ValueError: time data '0 31 Mar 2020 10:49:01\n1 31 Mar 2020 05:48:43\n2 30 Mar 2020 05:38:50\n3 29 Mar 2020 21:19:23\n4 29 Mar 2020 20:28:22\n ... \n2488 02 Jan 2018 13:36:07\n2489 02 Jan 2018 10:33:21\n2490 01 Jan 2018 12:23:47\n2491 01 Jan 2018 06:03:51\n2492 01 Jan 2018 02:09:15\nName: Tweet Posted Time (UTC), Length: 2451, dtype: object' does not match format '%d %b %Y %H:%M:%S'

My code is below. Can you tell me what I am doing wrong, please?

from datetime import datetime
import pandas as pd
import re #regular expression
from textblob import TextBlob
import string
import preprocessor as p


pd.set_option("expand_frame_repr", False)

df1 = pd.read_csv("C:/tweet_data.csv")

dataType = df1.dtypes
print(dataType)

# convert datetime object to string
old_formatDate = str(df1['Tweet Posted Time (UTC)'])

# extract day, month, and year and convert back to datetime object
date_TimeObject = datetime.strptime(old_formatDate, '%d %b %Y %H:%M:%S')
new_formatDate = date_TimeObject.strftime('%d-%m-%Y')
print(new_formatDate)

1 Answer


I researched and solved the problem by extracting the column as a pandas Series, converting it to datetime format, and then applying dt.strftime.

df.columns = ['Tweet_Posted_Time', 'Tweet_Content', 'Tweet_Location']
print(df)

# Extract the date and time column (Tweet_Posted_Time) from the pandas data frame as a pandas Series
df1 = pd.Series(df['Tweet_Posted_Time'])
print(df1)

# Convert the pandas Series to datetime format
df1 = pd.to_datetime(df1)
print(df1)

# convert the date column to new date format
df1 = df1.dt.strftime('%d-%m-%Y')
print(df1)

# Replace the "Tweet_Posted_Time" column in the original data frame with the Series holding the new date format.
# assign() returns a new data frame rather than modifying df in place, so bind the result back to df.
df = df.assign(Tweet_Posted_Time=df1)
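
For anyone hitting the same error: the traceback in the question shows that str() was applied to the whole column, which produces one long string containing the printed Series (index, values, and the "Name: ... dtype: object" footer), so strptime could never match it. Below is a minimal sketch of the column-wise conversion. The three sample rows and the new column names "Posted Date" and "Posted Date Label" are my own placeholders, not part of the original data, and I am assuming an English locale for the month abbreviations matched by %b.

```python
import pandas as pd

# Hypothetical sample rows standing in for tweet_data.csv
df = pd.DataFrame({
    "Tweet Posted Time (UTC)": [
        "31 Mar 2020 10:49:01",
        "31 Mar 2020 05:48:43",
        "01 Jan 2018 02:09:15",
    ],
    "Tweet Content": ["a", "b", "c"],
    "Tweet Location": ["x", "y", "z"],
})

# Parse the whole column at once; the explicit format matches the
# strings in the question (day, abbreviated month, year, time).
posted = pd.to_datetime(df["Tweet Posted Time (UTC)"], format="%d %b %Y %H:%M:%S")

# Keep a real datetime column (time set to midnight) for plotting and grouping.
df["Posted Date"] = posted.dt.normalize()

# Formatted strings such as 31-03-2020, useful for display only.
df["Posted Date Label"] = posted.dt.strftime("%d-%m-%Y")

# Example: number of tweets per day, indexed by date in chronological order.
tweets_per_day = df.groupby("Posted Date").size()
print(tweets_per_day)
```

One thing to keep in mind: dt.strftime returns plain strings, and strings in dd-mm-yyyy form do not sort chronologically. For a time-series plot it is probably safer to plot from the datetime column and use the formatted strings only as labels.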