Comparison of two methods of Image Classification: (KNN) K-Nearest-Neighbor and (SVM) Support Vector Machines
Slvar Abdulazeez, PG1, Computer Vision
Abstract
For a computer or a robot to detect, recognize, and classify an object in an image, a series of processing steps is required to make sense of what the image represents. Although this process is effortless for human beings, it is fairly hard for computers, because images are represented by massive amounts of data that must be processed before any information can be extracted from them.
Two classification methods, K-Nearest Neighbor (KNN) and Support Vector Machines (SVM), are compared in terms of performance and accuracy using a general Bag of Words model.
1 Introduction
Humans recognize objects in everyday life rapidly and effortlessly. Since computer vision is based on and inspired by human vision, a number of research efforts have produced different types of classifiers that are used to recognize and classify images.
These classification methods, developed by various researchers, include the naïve Bayes classifier, support vector machines, k-nearest neighbors, the Gaussian mixture model, decision trees, and radial basis function (RBF) classifiers, and they are used in object recognition algorithms. Nevertheless, object recognition remains a challenge for computers, as the number of object categories ranges between 10,000 and 30,000. Another challenge is object orientation, since it is difficult for a computer to infer how an object looks from different angles, while variant illumination can make the same object look like a different one. Background clutter is also a challenge, in which classifiers cannot tell the difference between an object and its background; further challenges include occlusion, scale, intra-class variation, and deformation. In computer vision, examples of applications that use image classification include assistive driving, computational photography, and surveillance and security.
The Bag of Words model is one of the most frequently used algorithms for training classifiers, and it is used here to compare the two classification methods, KNN and SVM.
A Bag of Words model, which can be used with any classification method, consists of four steps, shown in Fig. 1. In general, this model is used to create histograms of images that can then be used for classification.
2 Image Representation – Bag of Words Model
The Bag of Words model is one of the most important concepts in computer vision and is used to classify the contents of an image.
The Bag of Words algorithm generates a histogram that records how the visual words found in the test image are distributed; the classifiers then classify the image based on the characteristics each classifier obtains. The KNN classifier compares this histogram to the ones generated from the training images, whereas the SVM classifier uses the histogram of a test image together with a model learned from the training set to predict a class for the test image. Representing images as histograms requires extracting features from the images used to construct the histograms.
To describe an image, features must be extracted from the training images, and to produce valid recognition, Scale-Invariant Feature Transform (SIFT) descriptors are used to turn the collection of image patches into 128-dimensional vectors. These patches are then converted to codewords using k-means clustering, a method that divides features into k clusters where each feature belongs to the cluster with the nearest mean [10]. These feature clusters are used to prepare histogram generation. The classification is tested with five different values of k, i.e., numbers of codewords: 50, 100, 250, 500, and 1000, where the number of codewords is the size of the codebook. As a last step before classification, image patches are mapped to codewords, so that each image is represented by a histogram of the codewords.
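This pipeline can be sketched in a few lines of Python. The fragment below is a minimal illustration, assuming OpenCV for SIFT extraction and scikit-learn for k-means; the function names, file handling, and default codebook size are hypothetical, not taken from the paper.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift_descriptors(image_paths):
    """Collect 128-dimensional SIFT descriptors from a list of images."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    return np.vstack(descriptors)

def build_codebook(descriptors, k=500):
    """Cluster all descriptors into k codewords (the codebook size)."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    kmeans.fit(descriptors)
    return kmeans

def image_histogram(image_path, kmeans, k):
    """Map each patch descriptor to its nearest codeword and represent
    the image as a normalized histogram over the k codewords."""
    sift = cv2.SIFT_create()
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:
        return np.zeros(k)
    words = kmeans.predict(desc)
    hist, _ = np.histogram(words, bins=np.arange(k + 1))
    return hist / hist.sum()
```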
Classification then begins by training the classifier: it learns various features for the different categories and thereby forms a dictionary of codewords. The classifier performs image classification by finding similarities between the extracted features and the codeword dictionary produced during the training phase.
3 Classification methods
To classify an object, the class or category that an observation or feature belongs to must be determined, based on a training dataset containing observations whose category or class is known.
Classification works by first plotting the training data into a multidimensional space. Each classifier then plots the testing data into the same space, and the data points of the testing and training sets are compared to decide which class each observation belongs to.
3.1 K-Nearest Neighbor Classification
The KNN algorithm is regarded as one of the simplest classification algorithms and one of the most widely used learning algorithms. It is a non-parametric technique that stores all available cases and classifies new cases based on a similarity measure.
The training process for this algorithm mainly consists of storing the feature vectors and labels of the training images. In the classification process, an unlabeled query point is assigned a label based on those of its k nearest neighbors.
An object is typically classified by a majority vote of its neighbors: the object is assigned to the class most common among its k nearest neighbors, as measured by a distance function. If k = 1, the object is simply assigned to the class of its nearest neighbor.
If there are two classes, k must be an odd integer; nevertheless, ties can still occur with an odd k in multiclass classification. The Euclidean distance, the most common distance function, is used after each image is converted to a fixed-length vector, where x and y in the equation are the histograms:

$$d(x, y) = \lVert x - y \rVert = \sqrt{(x - y) \cdot (x - y)} = \left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}$$
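As a minimal sketch of this scheme, the following Python fragment uses scikit-learn's KNeighborsClassifier with the Euclidean metric; the random arrays stand in for the actual codeword histograms and class labels, which are not reproduced here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the real BoW histograms and labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 50)), rng.integers(0, 4, 100)
X_test = rng.random((10, 50))

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)     # "training" just stores vectors and labels
y_pred = knn.predict(X_test)  # majority vote among the 5 nearest histograms
```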
One of the main advantages of the KNN algorithm is the good performance and accuracy it exhibits even with multi-modal classes, because it bases its decision on a small neighborhood of similar objects. However, KNN has a serious disadvantage: all features are weighted equally when determining similarity. This can lead to classification errors, particularly when only a small subset of the features is actually useful for the classification.
3.2 Support Vector Machine Classification
SVM is a supervised machine learning algorithm. In this algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that best separates the two classes while maximizing the margin between them. The generalization error of the overall classifier is reduced when the separating hyperplane has the largest distance to the nearest training data point of any class. Each test or query point is later mapped into the same space, and the class it belongs to is predicted based on which side of the gap it falls on.

Because the overall performance of an SVM classifier is usually affected by the cost value, a balancing parameter between training errors and margins, different cost values in the range 0.01 to 10000 can be tested on a validation set to determine which produces the best performance. The aim of SVM classification is to produce a model, based on the training data, that accurately predicts the class labels of the test data. Given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane that categorizes new examples. In two-dimensional space, this hyperplane is a line dividing the plane into two parts, with each class falling on one side.
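The cost-parameter search described above can be sketched as a simple grid search. The following Python fragment is an illustration using scikit-learn, with the C grid mirroring the 0.01 to 10000 range from the text and random placeholder data in place of the real histograms.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the real BoW histograms and labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 50)), rng.integers(0, 4, 100)

# Cost values spanning the 0.01-10000 range mentioned in the text.
param_grid = {"C": [0.01, 0.1, 1, 10, 100, 1000, 10000]}
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)  # cost value with the best validation accuracy
```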
One of the main advantages of SVM classification is that SVM is more robust and performs better on datasets containing many different attributes. It also performs well even when relatively few cases are available for training. However, the algorithm has some disadvantages: speed and size limitations during the training and testing phases, and the difficulty of selecting the parameters of its kernel function.
4 Classification Results
The classification was performed using a 5-fold validation set, where each fold contains roughly 2800 training images and roughly 700 testing images. Each experiment used different images for training and testing, in order to avoid overlap between the testing and training images across experiments.
The Bag of Words model was used to compare the performance and accuracy of KNN classification and SVM classification.
The dataset consists of around 3500 images from 4 different classes, divided into testing and training images at a ratio of 1 to 5 to increase the variety of images used in each experiment.
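A minimal sketch of this 5-fold protocol, assuming scikit-learn's StratifiedKFold; the placeholder arrays below are scaled down and purely illustrative, standing in for the real histograms and labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Downscaled placeholder data in place of the ~3500 real histograms.
rng = np.random.default_rng(0)
X, y = rng.random((350, 50)), rng.integers(0, 4, 350)

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True,
                                           random_state=0).split(X, y):
    # Disjoint folds guarantee no test image also appears in training.
    clf = SVC(kernel="linear", C=10000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
print(np.mean(scores))  # average accuracy over the five folds
```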
4.1 KNN Classification results
The results using the KNN classification algorithm show that the best number of codewords was 50: with other codebook sizes, one of the classes could not be classified as well as the remaining three. The accuracy of this method dropped whenever the number of codewords moved away from 50; the accuracy on the other three classes was higher, but not enough to compensate for the accuracy lost on the first class. Five different k values were tested: 1, 5, 10, 15, and 20. The best value observed was 5.
4.2 SVM Classification results
The best results using SVM classification were obtained with 500 codewords, and the best recorded cost value was 10000, with an accuracy of around 92%. Compared to KNN classification, the SVM classification method showed better performance and a better accuracy rate.
The tables below show the average performance of the KNN and SVM methods:
5 Conclusion
After applying the two classification methods and conducting the training and testing phases, it was observed that the SVM classification method gave better results in terms of performance and accuracy than the KNN classification method. This means that the feature extraction for the SVM method succeeded in producing the desired result, unlike the result produced by KNN classification.
The performance and accuracy of classification methods vary with the features and characteristics of the data to be classified. KNN is sensitive to bad features and needs careful tuning, as the choice of k and of the distance metric can be critical to performance. SVM, on the other hand, is less sensitive to overtraining, which shows that it can deal with unbalanced and large amounts of data, producing more accurate results and converging to a better solution at a faster pace than the KNN classification method.