
After transposing my pandas DataFrame, I can no longer access the columns by name to plot a graph. I want to select two columns, but it keeps saying there are no such column names. I am pretty new to Python, DataFrames and transposing. Could someone please help?

Below is my input file. I want to transpose rows to columns. The transpose itself worked, but I could not select "Canada" and "Cameroon" to plot a graph.

        country     1990    1991    1992    1993    1994    1995
    0   Cambodia    65.4    65.7    66.2    66.7    67.1    68.4
    1   Cameroon    63.9    63.7    64.7    65.6    66.6    67.6
    2   Canada      98.6    99.6    99.6    99.8    99.9    99.9
    3   Cape Verde  77.7    77.0    76.6    89.0    79.0    78.0

    import pandas as pd
    import numpy as np
    import re 
    import math
    import matplotlib.pyplot as plt

    missing_values=["n/a","na","-","-","N/A"]
    df = pd.read_csv('StackoverflowGap.csv', na_values = missing_values)
    # Transpose
    df = df.transpose()
    plt.figure(figsize=(12,8))
    plt.plot(df['Canada','Cameroon'], linewidth = 0.5)
    plt.title("Time Series for Canada")
    plt.show()

It produces a long traceback, ending with:

    KeyError: ('Canada', 'Cameroon')

Answer

There are a few things you might need to do when working with this data.

  1. Read the file with header = None, so that the header line (country, 1990, 1991, ...) is kept as an ordinary first row: df = pd.read_csv('StackoverflowGap.csv', na_values = missing_values, header = None).
  2. After transposing, use that first row as the column names: df.columns = df.iloc[0].
  3. Then drop the first row, because it only repeats the column names: df = df.reindex(df.index.drop(0)).
  4. Finally, when selecting several columns (in the plt.plot() command), pass a list of names, i.e. df[['Canada', 'Cameroon']]. Writing df['Canada','Cameroon'] makes pandas look up a single column labelled with the tuple ('Canada', 'Cameroon'), which is exactly why the error reads KeyError: ('Canada', 'Cameroon').
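To check steps 1 to 4 without the original file, here is a minimal sketch that replays them on an in-memory copy of the four sample rows from the question. The io.StringIO stand-in, CSV_TEXT and the function name countries_as_columns are hypothetical, and the final set_index/astype tidy-up (years as the index, float values) goes beyond the steps listed above.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for StackoverflowGap.csv (sample rows from the question).
CSV_TEXT = """country,1990,1991,1992,1993,1994,1995
Cambodia,65.4,65.7,66.2,66.7,67.1,68.4
Cameroon,63.9,63.7,64.7,65.6,66.6,67.6
Canada,98.6,99.6,99.6,99.8,99.9,99.9
Cape Verde,77.7,77.0,76.6,89.0,79.0,78.0
"""


def countries_as_columns(csv_text: str) -> pd.DataFrame:
    # Step 1: header=None keeps the header line as ordinary row 0.
    df = pd.read_csv(io.StringIO(csv_text), header=None)
    # Step 2: after transposing, row 0 holds "country", "Cambodia", "Cameroon", ...
    df = df.T
    df.columns = df.iloc[0]
    # Step 3: drop row label 0, which only repeated the column names.
    df = df.reindex(df.index.drop(0))
    # Extra tidy-up (not part of the steps): the "country" column now holds
    # the years, so make it the index and turn the object values into floats.
    df = df.set_index("country")
    df.index = df.index.astype(int)
    df.index.name = "Year"
    df.columns.name = None
    return df.astype(float)


years = countries_as_columns(CSV_TEXT)

# Step 4: a list of labels selects several columns at once.
print(years[["Canada", "Cameroon"]])

# Without the inner brackets, pandas looks for ONE column labelled ('Canada', 'Cameroon').
try:
    years["Canada", "Cameroon"]
except KeyError as err:
    print("KeyError:", err)
```

The try/except at the end reproduces the question's error on purpose; only the double-bracket form returns the two columns.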

EDIT: the code, as it works for me, is as follows:

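    import pandas as pd
    import matplotlib.pyplot as plt

    missing_values = ["n/a", "na", "-", "-", "N/A"]
    df = pd.read_csv('StackoverflowGap.csv', na_values = missing_values, header = None)
    df = df.T
    df.columns = df.iloc[0]                      # row 0 holds 'country', 'Cambodia', ...
    df = df.reindex(df.index.drop(0))            # drop the row that repeated the names
    df = df.set_index('country').astype(float)   # the years become the index
    df.index.name = 'Year'
    plt.figure(figsize=(12,8))
    plt.plot(df[['Canada','Cameroon']], linewidth = 0.5)
    plt.legend(['Canada', 'Cameroon'])
    plt.title("Time Series for Canada and Cameroon")
    plt.show()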
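A shorter route, swapped in here rather than taken from the answer above: when the file is read with its header row (as the printed frame in the question suggests), calling set_index('country') before transposing turns the country names straight into column labels, so there is no header row to drop afterwards. This is a sketch under that assumption, again using a hypothetical in-memory copy of the sample rows; CSV_TEXT and countries_as_columns are illustrative names only.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for StackoverflowGap.csv (sample rows from the question).
CSV_TEXT = """country,1990,1991,1992,1993,1994,1995
Cambodia,65.4,65.7,66.2,66.7,67.1,68.4
Cameroon,63.9,63.7,64.7,65.6,66.6,67.6
Canada,98.6,99.6,99.6,99.8,99.9,99.9
Cape Verde,77.7,77.0,76.6,89.0,79.0,78.0
"""


def countries_as_columns(csv_text: str) -> pd.DataFrame:
    # Default header=0: "country", "1990", ... become the column labels.
    df = pd.read_csv(io.StringIO(csv_text))
    # Country names become the row labels, so transposing makes them column labels.
    df = df.set_index("country").T
    # The year labels came from the header line as strings; convert them to int.
    df.index = df.index.astype(int)
    df.index.name = "Year"
    return df


years = countries_as_columns(CSV_TEXT)
print(years[["Canada", "Cameroon"]])
```

Because every transposed column is numeric, the values stay float64 and no astype(float) is needed. Since DataFrame.plot draws one line per column and adds a legend by default, years[['Canada', 'Cameroon']].plot(figsize=(12, 8)) would then plot both series.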