0
votes

I use the following code to read in a Excel file and plot a boxplot using the seaborn package.

import scipy.stats as sps
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
from openpyxl import load_workbook
sns.set()


inpath=r"P:\Data.xlsx"

df=pd.read_excel(io=inpath,header=0,sheetname="65051045")
df1=df[df["Gel.Menge"]!=0]["Gel.Menge"]
print(df1)
fig2=plt.figure(figsize=(15,10))
sns.boxplot(data=df1)
sns.swarmplot(data=df1,color="black",alpha=0.5)
plt.title("65051045")

the excel table looks like:

Gel.Menge   Erf.datum   Freig.
0,000   26.11.2014  26.11.2014
10,000  06.11.2014  07.11.2014
5,000   19.12.2014  08.01.2015
7,000   07.07.2015  17.07.2015
1,000   21.07.2015  22.07.2015
5,000   18.03.2016  22.03.2016
10,000  29.03.2016  31.03.2016
10,000  20.07.2016  21.07.2016
20,000  13.10.2016  17.10.2016
5,000   01.12.2014  01.12.2014
3,000   20.04.2015  20.04.2015

If I run the code I get the following error message:

KeyError Traceback (most recent call last) in () 84 print(df1) 85 fig2=plt.figure(figsize=(15,10)) ---> 86 sns.boxplot(data=df1) 87 sns.swarmplot(data=df1,color="black",alpha=0.5) 88 plt.title("65051045 - Laserschweißen Getriebeabtrieb Rundnaht")

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, fliersize, linewidth, whis, notch, ax, **kwargs)
2173 plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
2174 orient, color, palette, saturation, -> 2175 width, fliersize, linewidth) 2176 2177 if ax is None:

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py in init(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, fliersize, linewidth) 424 width, fliersize, linewidth): 425 --> 426 self.establish_variables(x, y, hue, data, orient, order, hue_order) 427 self.establish_colors(color, palette, saturation) 428

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units) 94 if hasattr(data, "shape"): 95 if len(data.shape) == 1: ---> 96 if np.isscalar(data[0]): 97 plot_data = [data] 98 else:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in getitem(self, key) 599 key = com._apply_if_callable(key, self) 600 try: --> 601 result = self.index.get_value(self, key) 602 603 if not is_scalar(result):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key) 2426 try: 2427
return self._engine.get_value(s, k, -> 2428 tz=getattr(series.dtype, 'tz', None)) 2429 except KeyError as e1: 2430 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_value (pandas_libs\index.c:4363)()

pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_value (pandas_libs\index.c:4046)()

pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas_libs\index.c:5085)()

pandas_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas_libs\hashtable.c:13913)()

pandas_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas_libs\hashtable.c:13857)()

KeyError: 0

Suprisingly the plot(df1) command is working and it plots:

1     10
2      5
3      7
4      1
5      5
6     10
7     10
8     20
9      5
10     3
Name: Gel.Menge, dtype: int64 

What am I doing wrong???

1

1 Answers

0
votes

I assume the issue is that I have defined:

df1=df[df["Gel.Menge"]!=0]["Gel.Menge"]

therewith df1 is no longer a DataFrame and seaborn is going to be confused.

If I change:

df1=df[df["Gel.Menge"]!=0]["Gel.Menge"]

to

df1["Gel.Menge"]=df["Gel.Menge"].where(df["Gel.Menge"]!=0).dropna()

and also explicitly define df1 to ba a DataFrame with:

df1= pd.DataFrame()

the code is working.

The working code looks as follows:

inpath=r"P:\Data.xlsx"
df1=pd.DataFrame()  

df=pd.read_excel(io=inpath,header=0,sheetname="65051045")
df1["Gel.Menge"]=df["Gel.Menge"].where(df["Gel.Menge"]!=0).dropna()
print(df1)
fig2=plt.figure(figsize=(15,10))
sns.boxplot(data=df1)
sns.swarmplot(data=df1,color="black",alpha=0.5)
plt.title("65051045")