
My Data Frame

The data frame below consists of "Year", "Month" and "Data" columns:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame(dict(
Year = [2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003],
Month = [1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12],
Data = np.random.randint(21,100,size=36)))

df

I want a Pythonic way to convert it to time series data, so that "Date" and "Data" form a time series instead of a plain data frame.

What I Tried

I have tried:

import pandas as pd

# Build a Date column from Year and Month (day fixed to 1)
timeseries = df.assign(Date=pd.to_datetime(df[['Year', 'Month']].assign(day=1)))
columns = ['Year', 'Month']

timeseries.drop(columns, inplace=True, axis=1)  # I don't need the day, only a year-month time series

but this only adds a column called "Date" to the data frame; the index is still the default integer index.
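A minimal sketch of the missing step, assuming the data frame is rebuilt as in the first snippet (the `Year`/`Month` lists are generated with `np.repeat`/`np.tile` here only for brevity). `assign` adds a column but leaves the 0..35 `RangeIndex` untouched; moving "Date" into the index with `set_index` is what gives a date-indexed series.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (same seed, 3 years x 12 months)
np.random.seed(0)
df = pd.DataFrame({
    "Year": np.repeat([2001, 2002, 2003], 12),
    "Month": np.tile(np.arange(1, 13), 3),
    "Data": np.random.randint(21, 100, size=36),
})

# The attempt from the question: adds a "Date" column but keeps the integer index
timeseries = df.assign(Date=pd.to_datetime(df[["Year", "Month"]].assign(day=1)))
timeseries = timeseries.drop(columns=["Year", "Month"])
print(type(timeseries.index).__name__)  # RangeIndex

# Moving "Date" into the index is the step that makes it a time series
timeseries = timeseries.set_index("Date")
print(type(timeseries.index).__name__)  # DatetimeIndex
print(timeseries.columns.tolist())      # ['Data']
```

With the dates in the index, `timeseries['Data'].plot()` puts dates on the x-axis instead of row numbers.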

What I Want

I want time series data consisting of only a "Date" column (e.g. 2001-1) and a "Data" column, so that I can make a time plot, do time series analysis, and forecast with the data.
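If a label like "2001-1" (a month with no day component) is what is wanted, one option is a monthly `PeriodIndex` instead of timestamps. This is a sketch of that alternative, not the approach from the answer; it assumes the same data as the question and uses `Series.dt.to_period("M")` to turn each first-of-month timestamp into a monthly period. The name `monthly` is illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "Year": np.repeat([2001, 2002, 2003], 12),
    "Month": np.tile(np.arange(1, 13), 3),
    "Data": np.random.randint(21, 100, size=36),
})

# Parse Year/Month into timestamps (day fixed to 1), then convert to monthly periods
dates = pd.to_datetime(df[["Year", "Month"]].assign(day=1))
periods = dates.dt.to_period("M")

# Index the Data column by month; setting a period-typed Series as index gives a PeriodIndex
monthly = df.set_index(periods)["Data"]
print(monthly.index[0])  # 2001-01
```

A period-indexed series plots with monthly labels, and `monthly.to_timestamp()` converts it back to a `DatetimeIndex` if a tool expects timestamps.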

In other words, how should I index this time series so that when I plot it with this code:

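import matplotlib.pyplot as plt

# data1: the date-indexed data frame I want to end up with
plt.figure(figsize=(5.5, 5.5))
data1['Data'].plot(color='b')
plt.title('Monthly Data')
plt.xlabel('Date')
plt.ylabel('Data')
plt.xticks(rotation=30)
plt.show()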

the x-axis is labelled with dates rather than numbers?


Answer


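If I understand correctly, your approach is good: use the parsed dates as the index instead of adding them as a column, and let pandas' plotting handle the x-axis.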

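import pandas as pd
import matplotlib.pyplot as plt
# Use the Year/Month dates (day fixed to 1) as the index and keep only the Data column
ts = df.set_index(pd.to_datetime(df[['Year', 'Month']].assign(day=1)))['Data']
ax = ts.plot(color='b', figsize=(5.5, 5.5), title='Monthly Data')
_ = ax.set_xlabel('Date')
_ = ax.set_ylabel('Data')
plt.show()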
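For the forecasting part, note that the index built above has no frequency attached (`freq` is `None`), even though the dates are regular. Some forecasting tools, such as statsmodels' time series models, warn when the index carries no frequency. A small sketch, assuming the same data, that attaches an explicit month-start (`"MS"`) frequency with `asfreq`:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "Year": np.repeat([2001, 2002, 2003], 12),
    "Month": np.tile(np.arange(1, 13), 3),
    "Data": np.random.randint(21, 100, size=36),
})

# Same construction as the answer: DatetimeIndex from Year/Month, Data column only
ts = df.set_index(pd.to_datetime(df[["Year", "Month"]].assign(day=1)))["Data"]
print(ts.index.freq)           # None: no frequency is attached automatically
print(ts.index.inferred_freq)  # MS (month start)

# Attach an explicit month-start frequency; any missing month would become NaN
ts = ts.asfreq("MS")
print(ts.index.freqstr)        # MS
```

Because `asfreq` reindexes to a complete monthly range, it also makes gaps in the data visible as `NaN` values rather than silently skipping them.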

Output:

[Image: line plot titled "Monthly Data" with dates on the x-axis]