I have the following dataset:

dataset.head(7)
Transaction_date     Product   Product Code  Description    
2019-01-01           A         123           A123
2019-01-02           B         267           B267
2019-01-09           B         267           B267
2019-02-11           C         139           C139
2019-02-11           A         125           C125 
2019-02-12           C         139           C139
2019-02-12           A         123           A123

The dataset stores transactions, each with a transaction date. Not every day has a transaction, so some dates do not appear in the data at all. Ultimately, I want to create a time series plot showing the number of transactions per day.

So far, I have done a simple countplot:

ax = sns.countplot(x=dataset["Transaction_date"],data=dataset)

This plot shows only the dates on which a transaction happened. I would also like to see the dates on which no transaction happened, preferably shown as 0.

I have tried the following, but I get an error message:

groupbydate = dataset.groupby("Transaction_date")
ax = sns.tsplot(x="Transaction_date",y="Product",data=groupbydate.fillna(0))

The error is "cannot label index with a null key". Due to restrictions, I can only use seaborn 0.8.1.

2 Answers

0
votes

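I believe reindex should work for you, once the transactions are counted per day (reindexing the raw rows fails because the dates contain duplicates):

import pandas as pd

# First parse the dates and count the transactions per day
counts = pd.to_datetime(dataset["Transaction_date"]).value_counts().sort_index()

# Then reindex! You can also use counts.index.min() and counts.index.max() for the limits
counts = counts.reindex(pd.date_range("2019-01-01", "2019-02-12"), fill_value=0)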
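As a self-contained sketch of the reindexing idea, the snippet below rebuilds the seven sample rows from the question, counts the rows per day, and fills the missing days with 0. The names dates, all_days and daily_counts are illustrative, and taking the date range from the data's own minimum and maximum is an assumption about what the plot should cover.

```python
import pandas as pd

# Sample rows copied from the question (only the date column matters for counting).
dataset = pd.DataFrame({
    "Transaction_date": ["2019-01-01", "2019-01-02", "2019-01-09", "2019-02-11",
                         "2019-02-11", "2019-02-12", "2019-02-12"],
    "Product": ["A", "B", "B", "C", "A", "C", "A"],
})

# Parse the date strings and count how many rows fall on each day.
dates = pd.to_datetime(dataset["Transaction_date"])
counts = dates.value_counts().sort_index()

# Build every calendar day between the first and last transaction,
# then reindex so that days without a transaction get a count of 0.
all_days = pd.date_range(dates.min(), dates.max(), freq="D")
daily_counts = counts.reindex(all_days, fill_value=0)
```

If matplotlib is installed, daily_counts.plot() should draw the series through pandas' own plotting wrapper, which does not depend on the seaborn version.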
0
votes

You can drop the rows containing NaN values using pandas.DataFrame.dropna, and then plot the chart. For example:

dataset = dataset.dropna(thresh=2)

will keep only the rows that have at least two non-NaN values and drop the rest.

You may also want to fill the NaN values using pandas.DataFrame.fillna.
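To make the difference between the two calls concrete, here is a minimal sketch on a small frame. The rows are invented for illustration and are not taken from the question's data; filling with 0 is likewise just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps, invented for this example.
df = pd.DataFrame({
    "Product": ["A", np.nan, "C"],
    "Product Code": [123.0, np.nan, np.nan],
    "Description": ["A123", np.nan, "C139"],
})

# thresh=2 keeps a row only if it has at least two non-NaN values:
# row 0 has three, row 1 has none, row 2 has two.
kept = df.dropna(thresh=2)

# fillna with a dict fills only the named column and keeps every row.
filled = df.fillna({"Product Code": 0})
```

Both calls return a new DataFrame and leave df unchanged, so the result has to be assigned to a name to be used afterwards.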