Detect and Remove Outliers using Python

Outlier detection is the task of finding observations that differ markedly from the rest of the data, so that they can be examined or removed. Several techniques exist; the one used here is the interquartile range (IQR) technique, where IQR = Q3 - Q1 is the difference between the upper quartile (Q3) and the lower quartile (Q1). Two limits are derived from the quartiles: a lower limit, Q1 - 1.5 × IQR, and an upper limit, Q3 + 1.5 × IQR. Any value that lies below the lower limit or above the upper limit is treated as an outlier.

Common outlier detection techniques include:
1. Interquartile range (IQR)
2. Z-score
3. DBSCAN
and many more. Outlier detection is also used in real-world applications, for example brain tumor detection and cancer detection.
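
As a minimal sketch of the IQR rule, the snippet below computes the lower and upper limits for a small made-up array and picks out the values that fall outside them. The helper name `iqr_bounds`, the `sample` data, and the default multiplier `k=1.5` are assumptions chosen for illustration (1.5 is the same factor used in the notebook code further down); they are not part of the bike dataset.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up data: eight ordinary values and one extreme value
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 100])

lower, upper = iqr_bounds(sample)

# Keep only the values that lie outside the two limits
outliers = sample[(sample < lower) | (sample > upper)]

print(lower, upper)   # -3.0 13.0
print(outliers)       # [100]
```

Here Q1 = 3 and Q3 = 7, so IQR = 4 and the limits are 3 - 6 = -3 and 7 + 6 = 13; only the value 100 lies outside them.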

In [7]:
import pandas as pd
import numpy as np
BIKE = pd.read_csv("Bike.csv")
numeric_col = ['temp','hum','windspeed']
categorical_col = ['season', 'yr', 'mnth', 'holiday', 'weekday', 'workingday', 'weathersit']
In [5]:
BIKE.boxplot(numeric_col)
Out[5]:
<AxesSubplot:>
[Figure: Outliers Box Plot with Python]
In [2]:
pip install matplotlib
Installing collected packages: kiwisolver, six, cycler, python-dateutil, numpy, pyparsing, pillow, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.3.1 matplotlib-3.3.4 numpy-1.19.5 pillow-8.3.1 pyparsing-2.4.7 python-dateutil-2.8.1 six-1.16.0
Note: you may need to restart the kernel to use updated packages.
In [ ]:
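for x in ['windspeed']:
    # Quartiles and interquartile range of the column
    q75, q25 = np.percentile(BIKE.loc[:, x], [75, 25])
    intr_qr = q75 - q25

    # Limits at 1.5 * IQR beyond the quartiles (named so the built-in max/min are not shadowed)
    upper = q75 + (1.5 * intr_qr)
    lower = q25 - (1.5 * intr_qr)

    # Mark values outside the limits as missing
    BIKE.loc[BIKE[x] < lower, x] = np.nan
    BIKE.loc[BIKE[x] > upper, x] = np.nan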
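
The loop above marks outliers as NaN so that they can be dropped in a later step. An alternative, shown here only as a hedged sketch, is to build a boolean mask and filter the rows directly, which leaves the original DataFrame unchanged. The function name `drop_iqr_outliers` and the small `demo` frame are hypothetical stand-ins, since `Bike.csv` is not reproduced here. Note that `Series.between` returns False for missing values, so rows that are already NaN in the checked columns are dropped as well.

```python
import pandas as pd

def drop_iqr_outliers(df, columns, k=1.5):
    """Return a copy of df without rows outside the IQR limits in any given column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # between() is inclusive on both ends by default
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask].copy()

# Hypothetical data: seven typical wind speeds and one extreme reading
demo = pd.DataFrame({
    "windspeed": [0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.90],
    "cnt": [10, 20, 30, 40, 50, 60, 70, 80],
})

cleaned = drop_iqr_outliers(demo, ["windspeed"])
print(len(demo), len(cleaned))
```

Passing several column names filters on all of them at once, and a different `k` tightens or loosens the limits.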
In [9]:
BIKE.isnull().sum()
Out[9]:
temp            0
hum             0
windspeed       3
cnt             0
season_1        0
season_2        0
season_3        0
season_4        0
yr_0            0
yr_1            0
mnth_1          0
mnth_10         0
mnth_11         0
mnth_12         0
mnth_2          0
mnth_3          0
mnth_4          0
mnth_5          0
mnth_6          0
mnth_7          0
mnth_8          0
mnth_9          0
weathersit_1    0
weathersit_2    0
weathersit_3    0
holiday_0       0
holiday_1       0
dtype: int64
In [10]:
BIKE = BIKE.dropna(axis = 0)
In [11]:
BIKE.isnull().sum()
Out[11]:
temp            0
hum             0
windspeed       0
cnt             0
season_1        0
season_2        0
season_3        0
season_4        0
yr_0            0
yr_1            0
mnth_1          0
mnth_10         0
mnth_11         0
mnth_12         0
mnth_2          0
mnth_3          0
mnth_4          0
mnth_5          0
mnth_6          0
mnth_7          0
mnth_8          0
mnth_9          0
weathersit_1    0
weathersit_2    0
weathersit_3    0
holiday_0       0
holiday_1       0
dtype: int64