I have a single DataFrame (df) with a numeric column, trtbps, in which I'm trying to identify the outliers.

trtbps = [145 130 130 120 120 140 140 120 172 150 140 130 130 110 150 120 120 150 150 140 135 130 140 150 140 160 150 110 140 130 105 120 130 125 125 142 135 150 155 160 140 130 104 130 140 120 140 138 128 138 130 120 130 108 135 134 122 115 118 128 110 108 118 135 140 138 100 130 120 124 120 94 130 140 122 135 125 140 128 105 112 128 102 152 102 115 118 101 110 100 124 132 138 132 112 142 140 108 130 130 148 178 140 120 129 120 160 138 120 110 180 150 140 110 130 120 130 120 105 138 130 138 112 108 94 118 112 152 136 120 160 134 120 110 126 130 120 128 110 128 120 115 120 106 140 156 118 150 120 130 160 112 170 146 138 130 130 122 125 130 120 132 120 138 138 160 120 140 130 140 130 110 120 132 130 110 117 140 120 150 132 150 130 112 150 112 130 124 140 110 130 128 120 145 140 170 150 125 120 110 110 125 150 180 160 128 110 150 120 140 128 120 118 145 125 132 130 130 135 130 150 140 138 200 110 145 120 120 170 125 108 165 160 120 130 140 125 140 125 126 160 174 145 152 132 124 134 160 192 140 140 132 138 100 160 142 128 144 150 120 178 112 123 108 110 112 180 118 122 130 120 134 120 100 110 125 146 124 136 138 136 128 126 152 140 140 134 154 110 128 148 114 170 152 120 140 124 164 140 110 144 130 130]

Using a boxplot, I can identify 6 outliers, as shown below:

Boxplot

However, when I calculate the outliers manually using the IQR, I get 9 outliers, as shown below.

# Calculating the IQR
q1 = df['trtbps'].quantile(0.25)
q3 = df['trtbps'].quantile(0.75)
IQR = q3 - q1

# Calculating the upper and lower boundaries
lower_bridge = q1 - (IQR * 1.5)
upper_bridge = q3 + (IQR * 1.5)
print(lower_bridge)
print(upper_bridge)

# Printing the outliers in the trtbps column based on the upper and lower boundaries
print(df[(df['trtbps'] > upper_bridge) | (df['trtbps'] < lower_bridge)])

Output :

         trtbps  
8         172   
101       178   
110       180   
203       180  
223       200   
241       174   
248       192   
260       178   
266       180   

My question: why does the number of outliers differ between the boxplot and the manual calculation? Shouldn't the count be the same for both?

1 Answer

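The outliers are the same. You just cannot count them in the boxplot, because three of them have the value 180 and two have the value 178. Points with identical values are drawn on top of each other, so each of these two groups appears as a single point in the plot. The nine rows in your output therefore collapse to six distinct values (172, 174, 178, 180, 192, 200), which accounts for the three "missing" points.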
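
To check this without relying on the plot, you can tally the flagged values yourself. The following is only a minimal sketch: it hard-codes the nine index/value pairs from the output printed in the question rather than recomputing them from df, and the variable names (outliers, n_rows, n_markers, duplicates) are made up for illustration.

```python
from collections import Counter

# The nine outlier rows printed in the question's output (index -> trtbps).
# Hard-coded here for illustration; in practice they would come from df.
outliers = {8: 172, 101: 178, 110: 180, 203: 180, 223: 200,
            241: 174, 248: 192, 260: 178, 266: 180}

n_rows = len(outliers)                # rows flagged by the IQR rule
counts = Counter(outliers.values())   # how often each value occurs
n_markers = len(counts)               # distinct values, i.e. distinguishable points in the plot

# Values that occur more than once overlap in the boxplot.
duplicates = {value: count for value, count in counts.items() if count > 1}

print(n_rows, n_markers, duplicates)  # 9 6 {178: 2, 180: 3}
```

With the actual DataFrame, calling value_counts() on the filtered trtbps column should give the same tally.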