Python Flower Classification
The goal of the competition is to predict the flower name on the basis of the data provided in the dataset file (iris.csv). This file contains five columns, i.e. SepalLength, SepalWidth, PetalLength, PetalWidth and Name, the last of which has to be predicted using K-Nearest Neighbour (K-NN) classification.
We use K-NN because it can be applied to both classification and regression predictive problems, although in industry it is more widely used for classification. To evaluate any such technique we generally look at three important aspects: how easy the output is to interpret, the calculation time, and the predictive power.
First, we import the necessary libraries (pandas, numpy, math and operator) for this model.
import pandas as pd
import numpy as np
import math
import operator
data = pd.read_csv('iris.csv')
print(data.head(5))
data.shape
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
(150, 5)
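As an optional, illustrative check (not part of the original walkthrough), we can also confirm how the 150 rows are split across the flower names before building the classifier:

# Illustrative only: how many rows belong to each flower name.
print(data['Name'].value_counts())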
Next, we calculate the distance between the test data and each row of the training data. Here we use Euclidean distance as our distance metric, since it is the most popular choice; other metrics such as Chebyshev or cosine distance could also be used.
def euclideanDistance(data1, data2, length):
    distance = 0
    for x in range(length):
        distance += np.square(data1[x] - data2[x])
    return np.sqrt(distance)
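As a quick sanity check (not part of the original walkthrough), this helper can be called directly on two rows of the loaded dataset. The snippet below is only a sketch; converting the rows to plain arrays first is an assumption about how you might call it, not the author's usage.

# Illustrative only: distance between the first two flowers in the dataset,
# using the four numeric measurement columns.
row_a = data.iloc[0, :4].values
row_b = data.iloc[1, :4].values
print(euclideanDistance(row_a, row_b, 4))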
In this function we first calculate the Euclidean distance between the test data and each row of the training data. We then sort the rows on the basis of distance, extract the top k neighbours, and finally determine the most frequent class among those neighbours.
def knn(trainingSet, testInstance, k):
    distances = {}
    length = testInstance.shape[1]
    # Calculate the Euclidean distance between the test instance and each training row
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet.iloc[x], length)
        distances[x] = dist[0]
    # Sort the rows on the basis of distance
    sorted_d = sorted(distances.items(), key=operator.itemgetter(1))
    # Extract the top k neighbours
    neighbors = []
    for x in range(k):
        neighbors.append(sorted_d[x][0])
    # Count the votes for each class among the neighbours
    classVotes = {}
    for x in range(len(neighbors)):
        response = trainingSet.iloc[neighbors[x]].iloc[-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    # Return the most frequent class along with the neighbour indices
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return (sortedVotes[0][0], neighbors)
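For reference only (not part of the original exercise), the same idea can be sketched with scikit-learn's KNeighborsClassifier, assuming scikit-learn is installed. With k = 1 and the default Euclidean metric it should agree with the hand-written version above for a sample point such as [7.2, 3.6, 5.1, 2.5].

# Optional cross-check with scikit-learn (assumed installed); illustrative only.
from sklearn.neighbors import KNeighborsClassifier

X = data.iloc[:, :4].values   # the four measurement columns
y = data['Name'].values       # the class labels

model = KNeighborsClassifier(n_neighbors=1)   # Euclidean distance by default
model.fit(X, y)
print(model.predict([[7.2, 3.6, 5.1, 2.5]]))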
Here we provide the input data for the flower to be predicted; this input consists of the SepalLength, SepalWidth, PetalLength and PetalWidth of the flower.
testSet = [[7.2, 3.6, 5.1, 2.5]]
test = pd.DataFrame(testSet)
We initialise the value of k to 1 for now.
print('\n\nWith 1 Nearest Neighbour \n\n')
k = 1
# Running KNN model
result, neigh = knn(data, test, k)
With 1 Nearest Neighbour
Here the prediction of the flower is made by means of K-NN classification.
print('\nPredicted Class of the datapoint = ', result)
Predicted Class of the datapoint = Iris-virginica
We can also see the nearest neighbour of the data point; since we set k = 1, we get exactly one neighbouring data point here.
print('\nNearest Neighbour of the datapoints = ',neigh)
Nearest Neighbour of the datapoints = [141]
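Following the same pattern, the model can be re-run with a larger neighbourhood so that a majority vote among several neighbours decides the class. The sketch below uses k = 3 as an illustrative value; its output is not shown here since it depends on the data.

print('\n\nWith 3 Nearest Neighbours\n\n')
k = 3
# Running KNN model again with a larger k (illustrative)
result, neigh = knn(data, test, k)
print('\nPredicted Class of the datapoint = ', result)
print('\nNearest Neighbours of the datapoints = ', neigh)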