0
votes

I have the following source data (which comes from a csv file):

ABC,2016-6-9 0:00,95,"{'//Purple': [115L], '//Yellow': [403L], '//Blue': [16L], '//White-XYZ': [0L]}"
ABC,2016-6-10 0:00,0,"{'//Purple': [219L], '//Yellow': [381L], '//Blue': [90L], '//White-XYZ': [0L]}"
ABC,2016-6-11 0:00,0,"{'//Purple': [817L], '//Yellow': [21L], '//Blue': [31L], '//White-XYZ': [0L]}"
ABC,2016-6-12 0:00,0,"{'//Purple': [80L], '//Yellow': [2011L], '//Blue': [8888L], '//White-XYZ': [0L]}"
ABC,2016-6-13 0:00,0,"{'//Purple': [32L], '//Yellow': [15L], '//Blue': [4L], '//White-XYZ': [0L]}"
DEF,2016-6-16 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [3L]}"
DEF,2016-6-17 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [0L]}"
DEF,2016-6-18 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [7L]}"
DEF,2016-6-19 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [14L]}"
DEF,2016-6-20 0:00,0,"{'//Purple': [32L], '//Black': [15L], '//Pink': [4L], '//NPO-Green': [21L]}"

I use How to remove curly braces, apostrophes and square brackets from dictionaries in a Pandas dataframe (Python) to transform that data into a Data Frame that I can use to plot certain variables. The Data Frame looks as follows (note: not the same data as what's in the source csv file, but the structure is the same):

   Company   Date         Code       Yellow     Blue        White      Black

0  ABC       2016-6-9     115        403         16         19         472     
1  ABC       2016-6-10    219        381         90         20         2474
2  ABC       2016-6-11    817        21          31         88         54
3  ABC       2016-6-12    80         2011        8888       0          21
4  ABC       2016-6-13    21         15          46         20         56
5  DEF       2016-6-16    64         42          76         4          41
6  DEF       2016-6-17    694        13          84         50         986
7  DEF       2016-6-18    325        485         38         60         174
8  DEF       2016-6-19    418        35          174        251        11
9  DEF       2016-6-20    50         56          59         19         03

I need to create several time series plots of the colors (which I can do very easily, given the way that the data frame is constructed).

But, I also want to be able to make a horizontal bar plot as of a specific date (see https://stanford.edu/~mwaskom/software/seaborn/examples/horizontal_barplot.html for an example).

For instance, using my data, as of June 9, 2016, the bar plot would look as follows (not to scale):

Black:    ********************************

Yellow:   **************************

White:    ***

Blue:     **

The problem I'm having is that the column names (e.g. 'yellow', 'blue', 'white' and 'black') can change, as can the number of columns.

Does anyone know if it is possible to loop through a certain number of columns to the right of the 'Code' column and then use those to create a horizontal bar chart similar to what's above? Or, perhaps take a slice of the data to the right of the 'Code' column?

Or, does the Data Frame itself need to be structured differently such that it can be used to make both a time series plot and a horizontal bar chart?

Thanks!

2

2 Answers

0
votes

in order to loop through a certain number of columns to the right of the 'code' column I would do something of the form

for col in df.columns[3:]:
    plot(col)

However this only works if you can guarantee that your columns will always be in the same order. Alternatively you could make sure that the columns of interest for that particular chart are named in a systematic way.

Hope this helps!

0
votes

IIUC you can do it this way:

Original DF:

In [127]: df
Out[127]:
  Company       Date  Code  Yellow  Blue  White  Black
0     ABC 2016-06-09   115     403    16     19    472
1     ABC 2016-06-10   219     381    90     20   2474
2     ABC 2016-06-11   817      21    31     88     54
3     ABC 2016-06-12    80    2011  8888      0     21
4     ABC 2016-06-13    21      15    46     20     56
5     DEF 2016-06-16    64      42    76      4     41
6     DEF 2016-06-17   694      13    84     50    986
7     DEF 2016-06-18   325     485    38     60    174
8     DEF 2016-06-19   418      35   174    251     11
9     DEF 2016-06-20    50      56    59     19      3

set Date as index:

In [128]: df = df.set_index('Date')

In [129]: df
Out[129]:
           Company  Code  Yellow  Blue  White  Black
Date
2016-06-09     ABC   115     403    16     19    472
2016-06-10     ABC   219     381    90     20   2474
2016-06-11     ABC   817      21    31     88     54
2016-06-12     ABC    80    2011  8888      0     21
2016-06-13     ABC    21      15    46     20     56
2016-06-16     DEF    64      42    76      4     41
2016-06-17     DEF   694      13    84     50    986
2016-06-18     DEF   325     485    38     60    174
2016-06-19     DEF   418      35   174    251     11
2016-06-20     DEF    50      56    59     19      3

In [130]: cols = df.drop(['Company','Code'], 1).columns.tolist()

In [131]: cols
Out[131]: ['Yellow', 'Blue', 'White', 'Black']

In [139]: %paste
import matplotlib
matplotlib.style.use('ggplot')

In [140]: df.ix['2016-06-09', cols].plot.barh(rot=0, color=cols)
Out[140]: <matplotlib.axes._subplots.AxesSubplot at 0x1890a898>

enter image description here

or if you want to plot is sorted:

In [142]: srt = df.ix['2016-06-09', cols].sort_values()

In [143]: srt.plot.barh(color=srt.index)
Out[143]: <matplotlib.axes._subplots.AxesSubplot at 0x1cf16748>

enter image description here