1
votes

I have a Series with 305 entries and a DatetimeIndex. The data looks like this:

1992-01-31     1.123077 
1992-02-28    -2.174845 
1992-03-31    -3.884848
1992-04-30     8.682919
1992-05-29     1.312976
1992-06-30     7.851080
1992-07-31    -3.192788
1992-08-31    -7.351976
1992-09-30    -6.782217
1992-10-30   -17.182738
1992-11-30     3.898782
1992-12-31   -26.190414
1993-01-29     2.233359
1993-02-26     6.709006
It continues with monthly data through December 2017.

I want to reshape the data into a DataFrame with years as rows and months as columns, filled in with the corresponding values:

        January February  March     etc >>  December
2017    values  values    values    values  values  
2016    values  values    values    values  values  
2015    values  values    values    values  values  
etc \\// 
1992    values                  

I looked at other posts and tried reshape and asmatrix, but because the series is uneven I keep getting this error:

ValueError: total size of new array must be unchanged.

What I really want is for the missing values to be filled with NaN when the data does not fill the matrix evenly. So if there were no November or December values in 2017, they would be NaN.

Let me know if anyone can help.

3 Answers

2
votes

Source DF:

In [159]: df
Out[159]:
                  val
date
1992-01-31   1.123077
1992-02-28  -2.174845
1992-03-31  -3.884848
1992-04-30   8.682919
1992-05-29   1.312976
1992-06-30   7.851080
1992-07-31  -3.192788
1992-08-31  -7.351976
1992-09-30  -6.782217
1992-10-30 -17.182738
1992-11-30   3.898782
1992-12-31 -26.190414
1993-01-29   2.233359
1993-02-26   6.709006

Solution:

import calendar

In [158]: (df.assign(year=df.index.year, mon=df.index.month)
             .pivot(index='year', columns='mon', values='val')
             .rename(columns=dict(zip(range(13), calendar.month_name))))
Out[158]:
mon    January  February     March     April       May     June      July    August  September    October  November   December
year
1992  1.123077 -2.174845 -3.884848  8.682919  1.312976  7.85108 -3.192788 -7.351976  -6.782217 -17.182738  3.898782 -26.190414
1993  2.233359  6.709006       NaN       NaN       NaN      NaN       NaN       NaN        NaN        NaN       NaN        NaN

UPDATE: or a much nicer and shorter version from @COLDSPEED:

In [164]: pd.pivot(df.index.year, df.index.month, df['val']) \
            .rename(columns=calendar.month_name.__getitem__)
Out[164]:
date   January  February     March     April       May     June      July    August  September    October  November   December
date
1992  1.123077 -2.174845 -3.884848  8.682919  1.312976  7.85108 -3.192788 -7.351976  -6.782217 -17.182738  3.898782 -26.190414
1993  2.233359  6.709006       NaN       NaN       NaN      NaN       NaN       NaN        NaN        NaN       NaN        NaN
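
In recent pandas releases `pd.pivot` expects a DataFrame as its first argument, so the array-based call above may fail there. A minimal sketch of one equivalent, assuming the same `df` with a DatetimeIndex and a `val` column (the sample rows are copied from the question; `years`, `months` and `wide` are just illustrative names):

```python
import calendar
import pandas as pd

# Sample data from the question (first 14 observations).
dates = pd.to_datetime([
    "1992-01-31", "1992-02-28", "1992-03-31", "1992-04-30",
    "1992-05-29", "1992-06-30", "1992-07-31", "1992-08-31",
    "1992-09-30", "1992-10-30", "1992-11-30", "1992-12-31",
    "1993-01-29", "1993-02-26",
])
vals = [1.123077, -2.174845, -3.884848, 8.682919, 1.312976, 7.851080,
        -3.192788, -7.351976, -6.782217, -17.182738, 3.898782, -26.190414,
        2.233359, 6.709006]
df = pd.DataFrame({"val": vals}, index=pd.Index(dates, name="date"))

# Grouping keys taken from the index; renamed so the two levels are distinct.
years = df.index.year.rename("year")
months = df.index.month.rename("month")

# One value per (year, month) pair, then move the month level into columns.
# Pairs that do not occur in the data come out as NaN.
wide = df["val"].groupby([years, months]).first().unstack()

# Replace month numbers 1..12 with month names.
wide.columns = [calendar.month_name[int(m)] for m in wide.columns]
```

With the sample rows this should give one row per year and one column per month, with NaN from March 1993 onwards.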
2
votes
s

1992-01-31     1.123077
1992-02-28    -2.174845
1992-03-31    -3.884848
1992-04-30     8.682919
1992-05-29     1.312976
1992-06-30     7.851080
1992-07-31    -3.192788
1992-08-31    -7.351976
1992-09-30    -6.782217
1992-10-30   -17.182738
1992-11-30     3.898782
1992-12-31   -26.190414
1993-01-29     2.233359
1993-02-26     6.709006
Name: 1, dtype: float64

type(s)
pandas.core.series.Series

If necessary, convert the index to datetime -

s.index = pd.to_datetime(s.index, errors='coerce')

Now, use pd.pivot -

x = pd.Series(s.index.strftime('%Y %B')).str.split()
y, m = x.str[0], x.str[1]

pd.pivot(y, m, s)

         April    August   December  February   January      July     June  \
1992  8.682919 -7.351976 -26.190414 -2.174845  1.123077 -3.192788  7.85108   
1993       NaN       NaN        NaN  6.709006  2.233359       NaN      NaN   

         March       May  November    October  September  
1992 -3.884848  1.312976  3.898782 -17.182738  -6.782217  
1993       NaN       NaN       NaN        NaN        NaN  
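
Note that the month-name columns above come out in alphabetical order. If calendar order is wanted, one option is to reindex the columns afterwards. A small sketch, assuming a wide frame shaped like the output above (the `wide` frame here is a shortened, hypothetical stand-in, not the full result):

```python
import calendar
import pandas as pd

# Hypothetical stand-in for the pivoted result: month-name columns in
# alphabetical order, only four months included for brevity.
wide = pd.DataFrame(
    {"April": [8.682919, None], "February": [-2.174845, 6.709006],
     "January": [1.123077, 2.233359], "March": [-3.884848, None]},
    index=["1992", "1993"],
)

# calendar.month_name[0] is an empty string, so skip it.
month_order = list(calendar.month_name)[1:]

# reindex puts the columns in calendar order; months that are absent
# from the data become all-NaN columns.
ordered = wide.reindex(columns=month_order)
```

`reindex` also guarantees that all twelve month columns exist, even for months with no data at all.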
0
votes

Try something like

# Give your series index a name so that we can reset the index and get a new column
your_series.index = your_series.index.rename('Time')
df = your_series.to_frame('Values').reset_index()

# Create variables for month and year
df['Month'] = df.Time.dt.month
df['Year'] = df.Time.dt.year

# Assuming they are unique, create a pivot table
df.pivot(index='Year', columns='Month', values='Values')

The months will be numeric. If you want the month names instead, you'll have to do

df['Month'] = df.Time.dt.strftime('%B')

If your month/year pairs are not unique, then do something like

df.groupby(['Year','Month']).Values.sum().unstack()
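
To illustrate the difference between the two cases, here is a rough sketch with made-up data containing two observations in the same month. The dates and values are hypothetical and only the column names follow the code above:

```python
import pandas as pd

# Hypothetical data: January 1992 appears twice.
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["1992-01-15", "1992-01-31", "1992-02-28", "1993-01-29"]
    ),
    "Values": [1.0, 2.0, 3.0, 4.0],
})
df["Year"] = df["Time"].dt.year
df["Month"] = df["Time"].dt.month

# pivot refuses duplicate (Year, Month) pairs and raises ValueError.
try:
    df.pivot(index="Year", columns="Month", values="Values")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# Aggregating first collapses the duplicates; unstack then moves the
# months into columns and leaves NaN where a pair has no data.
summed = df.groupby(["Year", "Month"])["Values"].sum().unstack()
```

Whether `sum` is the right aggregation depends on the data; `mean`, `first` or `last` may be more appropriate for something like monthly returns.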