KNN Algorithm - Finding Nearest Neighbors


Introduction


The K-nearest neighbors (KNN) algorithm is a supervised machine learning algorithm that can be used for both classification and regression problems. In industry, however, it is mainly used for classification. The following two properties define KNN well −


  • Lazy learning algorithm − KNN is a lazy learning algorithm because it has no specialized training phase: it simply stores the training data and uses all of it at classification time.
  • Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it makes no assumptions about the underlying data distribution.
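To make both properties concrete, and to show the regression use mentioned above, here is a minimal sketch in plain Python (3.8+ for `math.dist`). The class name `LazyKNNRegressor` is hypothetical. It assumes Euclidean distance and predicts the plain average of the K nearest neighbors' target values, which is one common choice for KNN regression (distance-weighted averages are another). Note that `fit()` only memorizes the data; all the real work happens in `predict_one()`.

```python
import math


class LazyKNNRegressor:
    """Hypothetical minimal KNN regressor illustrating lazy learning."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: "training" is only storing the data, no model is built
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, query):
        # All computation is deferred to prediction time:
        # Euclidean distance from the query to every stored point
        pairs = [(math.dist(query, row), target)
                 for row, target in zip(self.X, self.y)]
        pairs.sort(key=lambda pair: pair[0])
        nearest = pairs[: self.k]
        # Regression: average the targets of the K nearest neighbors
        return sum(target for _, target in nearest) / len(nearest)


model = LazyKNNRegressor(k=3).fit([[1.0], [2.0], [3.0], [10.0]],
                                  [1.0, 2.0, 3.0, 10.0])
print(model.predict_one([2.0]))  # 2.0, the mean of targets 2.0, 1.0 and 3.0
```

Because nothing is precomputed, adding new training data is trivial, but every prediction pays the full cost of scanning the stored points.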

Working of KNN Algorithm


The K-nearest neighbors (KNN) algorithm uses ‘feature similarity’ to predict the values of new data points: a new data point is assigned a value based on how closely it matches the points in the training set. Its working can be understood through the following steps −

Step 1 − Implementing any algorithm requires a dataset, so the first step of KNN is to load the training data as well as the test data.

Step 2 − Next, choose the value of K, i.e., the number of nearest data points (neighbors) to consider. K can be any positive integer.

Step 3 − For each point in the test data, do the following −


  • 3.1 − Calculate the distance between the test point and each row of the training data using a distance measure such as Euclidean, Manhattan or Hamming distance. Euclidean distance is the most commonly used.
  • 3.2 − Sort the rows in ascending order of their distance values.
  • 3.3 − Choose the top K rows from the sorted array.
  • 3.4 − Assign the test point the most frequent class among these K rows.

Step 4 − End
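The steps above can be sketched from scratch in plain Python. This is a minimal illustration, not a production implementation; the function names and the small two-class dataset are hypothetical. Hamming distance is included for completeness and is mainly meaningful for categorical or binary features.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions where the (typically categorical) values differ
    return sum(x != y for x, y in zip(a, b))


def knn_classify(train_X, train_y, test_point, k=3, distance=euclidean):
    # Step 3.1: distance from the test point to every training row
    pairs = [(distance(test_point, row), label)
             for row, label in zip(train_X, train_y)]
    # Step 3.2: sort by distance in ascending order
    pairs.sort(key=lambda pair: pair[0])
    # Step 3.3: keep the labels of the top K rows
    top_k = [label for _, label in pairs[:k]]
    # Step 3.4: majority vote; on a tie, Counter keeps first-seen order,
    # so the class of the nearer neighbor wins
    return Counter(top_k).most_common(1)[0][0]


# Hypothetical toy data with two well-separated classes
train_X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_X, train_y, [2, 2], k=3))                      # A
print(knn_classify(train_X, train_y, [6, 5], k=3, distance=manhattan))  # B
```

Passing the distance function as a parameter keeps Step 3.1 swappable without touching the sorting and voting logic.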


KNN as Classifier

First, import the necessary Python packages −

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Next, set the path to the web link of the Iris dataset; pandas downloads the file when it is read −

path = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

Next, we need to assign column names to the dataset as follows −

headernames = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

Now, read the dataset into a pandas DataFrame as follows −

dataset = pd.read_csv(path, names = headernames)
dataset.head()
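The code above only loads the data. One way to continue, sketched here under the assumption that only NumPy and pandas are used (scikit-learn's `KNeighborsClassifier` is a ready-made alternative), is to split the DataFrame into the four feature columns and the `Class` label, hold out part of the rows for testing, and classify each held-out row by majority vote among its K nearest training rows. The helper names `train_test_split_df` and `knn_predict`, the 40% test fraction and k=5 are illustrative choices, not values from this tutorial.

```python
import numpy as np
import pandas as pd


def train_test_split_df(df, test_fraction=0.4, seed=0):
    """Hypothetical helper: shuffle rows, split into train/test features and labels."""
    X = df.iloc[:, :-1].to_numpy(dtype=float)  # all columns except the last (features)
    y = df.iloc[:, -1].to_numpy()              # last column ('Class' labels)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    n_test = int(round(len(df) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


def knn_predict(X_train, y_train, X_test, k=5):
    """Hypothetical vectorized KNN classifier using Euclidean distance."""
    # Pairwise distances via broadcasting: shape (n_test, n_train)
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training rows for each test row
    nearest = np.argsort(dists, axis=1)[:, :k]
    predictions = []
    for row in nearest:
        # np.unique returns sorted labels, so ties go to the alphabetically first class
        labels, counts = np.unique(y_train[row], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


# Tiny illustrative check with hypothetical points (not Iris data)
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_demo = np.array(["a", "a", "b", "b"])
print(knn_predict(X_demo, y_demo, np.array([[0.0, 0.5], [5.0, 5.5]]), k=3))
# predicts 'a' for the first query point and 'b' for the second
```

With the `dataset` loaded above, a call such as `X_train, X_test, y_train, y_test = train_test_split_df(dataset)` followed by `y_pred = knn_predict(X_train, y_train, X_test, k=5)` and `np.mean(y_pred == y_test)` would give the hold-out accuracy.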

Pros and Cons of KNN

Pros

  • It is a very simple algorithm to understand and interpret.
  • It is very useful for nonlinear data because the algorithm makes no assumptions about the data.
  • It is a versatile algorithm, as it can be used for classification as well as regression.
  • It can achieve relatively high accuracy, although other supervised learning models often perform better than KNN.

Cons

  • It is computationally expensive because it stores all the training data and must compare each new point against all of it at prediction time.
  • It requires more memory than many other supervised learning algorithms.
  • Prediction is slow when the number of training samples N is large.
  • It is very sensitive to the scale of the data as well as to irrelevant features.
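The sensitivity to scale is easy to demonstrate. In the minimal sketch below, which uses hypothetical data where one feature lies in [0, 1] and the other is in the thousands, the large-valued feature dominates the Euclidean distance and decides the nearest neighbor on its own. Rescaling every feature to [0, 1] with min-max scaling (one common remedy; standardization to zero mean and unit variance is another) changes which training point is nearest. The function names are hypothetical.

```python
import math


def min_max_scale(train, query):
    """Rescale each feature to [0, 1] using the training data's min and max."""
    columns = list(zip(*train))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(point):
        # Constant features are mapped to 0.0 to avoid division by zero
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(point, lows, highs)]

    return [scale(p) for p in train], scale(query)


def nearest_index(train, query):
    """Index of the training point closest to the query (Euclidean distance)."""
    return min(range(len(train)), key=lambda i: math.dist(train[i], query))


# Hypothetical data: feature 0 lies in [0, 1], feature 1 is in the thousands
train = [[0.0, 1000.0], [1.0, 1040.0], [0.5, 2000.0]]
query = [0.0, 1050.0]

print(nearest_index(train, query))                # 1 (raw features)
scaled_train, scaled_query = min_max_scale(train, query)
print(nearest_index(scaled_train, scaled_query))  # 0 (after scaling)
```

The scaling parameters are taken from the training data only and then applied to the query, so the query is measured on the same scale as the stored points.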

Applications of KNN

Banking System

KNN can be used in a banking system to predict whether an individual is fit for loan approval, i.e., whether that individual has characteristics similar to those of defaulters.

Calculating Credit Ratings

KNN algorithms can be used to estimate an individual’s credit rating by comparing them with people who have similar traits.

Politics

With the help of KNN algorithms, we can classify a potential voter into various classes such as “Will Vote”, “Will Not Vote”, “Will Vote for Party ‘Congress’” and “Will Vote for Party ‘BJP’”.


Other areas in which the KNN algorithm can be used include speech recognition, handwriting detection, image recognition and video recognition.


