1
votes

Currently I have table in this format :

Geo-id Name                            stat Year   index
111500  Anniston-Oxford-Jacksonville     AL 1991    0
111500  Anniston-Oxford-Jacksonville     AL 1992    91.675
111500  Anniston-Oxford-Jacksonville     AL 1993    93.8025
111500  Anniston-Oxford-Jacksonville     AL 1994    96.63
111500  Anniston-Oxford-Jacksonville     AL 1995    99.455
111500  Anniston-Oxford-Jacksonville     AL 1996    102.4875
111500  Anniston-Oxford-Jacksonville     AL 1997    109.0225
111500  Anniston-Oxford-Jacksonville     AL 1998    114.7075
111500  Anniston-Oxford-Jacksonville     AL 1999    116.005
112220  Auburn-Opelika                  AL  1992    90.695
112220  Auburn-Opelika                  AL  1993    94.2075
112220  Auburn-Opelika                  AL  1994    98.6825
112220  Auburn-Opelika                  AL  1995    103.3675
112220  Auburn-Opelika                  AL  1996    107.2725
112220  Auburn-Opelika                  AL  1997    111.7125

This should be converted to :

Geo-id  Name                            1991    1992    1993    1994 ........... 2017
111500  Anniston-Oxford-Jacksonville    0       91.675  93.8025 96.63
112220  Auburn-Opelika                  0       90.695  94.2075 98.6825 and so on .....

Keeping geoid and name vertical ( but repeating only once since it is pivoted horizontally).

Code I have so far :

   pre_horizontal_df = pd.read_csv('database_raw.csv')
   pre_horizontal_df['period'] = pre_horizontal_df.year.astype(str)
   df1 = pre_horizontal_df.groupby(['geoid', 'name'])['hpi'].mean().unstack()
   print (df1)

But this is not working. Can this horizontal pivot be done in python df / pandas?

1

1 Answers

1
votes

You need add column Year which create new columns by unstack:

df1=pre_horizontal_df.groupby(['Geo-id','Name','Year'])['index'].mean().unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name                                                                     
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550   
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675   

Year                                     1996      1997      1998     1999  
Geo-id Name                                                                 
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005  
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000  

Another solution with pivot_table:

df1 = pre_horizontal_df.pivot_table(index=['Geo-id', 'Name'], 
                                    columns='Year', 
                                    values='index', 
                                    fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name                                                                     
111500 Anniston-Oxford-Jacksonville     0  91.675  93.8025  96.6300   99.4550   
112220 Auburn-Opelika                   0  90.695  94.2075  98.6825  103.3675   

Year                                     1996      1997      1998     1999  
Geo-id Name                                                                 
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005  
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000  

Last if need index convert to columns:

df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
   Geo-id                          Name  1991    1992     1993     1994  \
0  111500  Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   
1  112220                Auburn-Opelika   0.0  90.695  94.2075  98.6825   

       1995      1996      1997      1998     1999  
0   99.4550  102.4875  109.0225  114.7075  116.005  
1  103.3675  107.2725  111.7125    0.0000    0.000  

EDIT:

If not duplicates per columns created new index and new columns, is possible use set_index with unstack:

print (pre_horizontal_df[pre_horizontal_df.duplicated(['Geo-id','Name','Year'], keep=False)])
Empty DataFrame
Columns: [Geo-id, Name, stat, Year, index]
Index: []

df1 = pre_horizontal_df.set_index(['Geo-id', 'Name', 'Year'])['index'].unstack(fill_value=0)
print (df1)
Year                                 1991    1992     1993     1994      1995  \
Geo-id Name                                                                     
111500 Anniston-Oxford-Jacksonville   0.0  91.675  93.8025  96.6300   99.4550   
112220 Auburn-Opelika                 0.0  90.695  94.2075  98.6825  103.3675   

Year                                     1996      1997      1998     1999  
Geo-id Name                                                                 
111500 Anniston-Oxford-Jacksonville  102.4875  109.0225  114.7075  116.005  
112220 Auburn-Opelika                107.2725  111.7125    0.0000    0.000