Efficient kNN classification with different numbers of nearest neighbors

Shichao Zhang, Xuelong Li, Ming Zong, Xiaofeng Zhu, Ruili Wang

Research output: Journal Publication › Article › peer-review

981 Citations (Scopus)

Abstract

The k nearest neighbor (kNN) method is a popular classification method in data mining and statistics because of its simple implementation and strong classification performance. However, it is impractical for traditional kNN methods to assign a fixed k value (even one set by experts) to all test samples. Previous solutions assign different k values to different test samples via cross validation, but are usually time-consuming. This paper proposes a kTree method that learns different optimal k values for different test/new samples by introducing a training stage into kNN classification. Specifically, in the training stage, the kTree method first learns optimal k values for all training samples via a new sparse reconstruction model, and then constructs a decision tree (namely, the kTree) from the training samples and their learned optimal k values. In the test stage, the kTree quickly outputs the optimal k value for each test sample, and kNN classification is then conducted using the learned optimal k value and all training samples. As a result, the proposed kTree method has a similar running cost but higher classification accuracy than traditional kNN methods, which assign a fixed k value to all test samples. Moreover, the kTree method has a lower running cost but similar classification accuracy compared with recently proposed kNN methods that assign different k values to different test samples. This paper further proposes an improved version of the kTree method (namely, the k∗Tree method) that speeds up the test stage by additionally storing information about the training samples in the leaf nodes of the kTree, such as the training samples located in each leaf node, their kNNs, and the nearest neighbor of these kNNs. We call the resulting decision tree the k∗Tree; it enables kNN classification to be conducted using only the subset of training samples stored in the leaf nodes, rather than all training samples as in the recently proposed kNN methods. This further reduces the running cost of the test stage. Finally, experimental results on 20 real data sets show that the proposed methods (i.e., kTree and k∗Tree) are much more efficient than the compared methods on classification tasks.
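The pipeline described in the abstract — learn a per-sample optimal k on the training data, fit a model mapping samples to k, then run kNN at test time with the predicted k — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper's sparse reconstruction model is replaced here by a simple leave-one-out heuristic for choosing each training sample's k, and the kTree itself by a nearest-training-sample lookup of that k.

```python
# Illustrative sketch of the kTree idea (not the authors' implementation).
# Stand-ins: a leave-one-out heuristic replaces the sparse reconstruction
# model, and a nearest-training-sample lookup replaces the decision tree.
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Plain kNN majority vote for a single query point x with a given k."""
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[neighbors], return_counts=True)
    return labels[np.argmax(counts)]

def learn_optimal_k(X_train, y_train, k_candidates):
    """Pick, for each training sample, the smallest candidate k that
    classifies it correctly in leave-one-out fashion (a stand-in for the
    paper's sparse reconstruction step)."""
    n = len(X_train)
    best_k = np.full(n, k_candidates[-1], dtype=int)
    for i in range(n):
        mask = np.arange(n) != i
        for k in k_candidates:
            if knn_predict(X_train[mask], y_train[mask], X_train[i], k) == y_train[i]:
                best_k[i] = k
                break
    return best_k

def classify(X_train, y_train, best_k, x):
    """Look up the learned k of the nearest training sample (standing in
    for traversing the kTree), then run kNN with that k."""
    nearest = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return knn_predict(X_train, y_train, x, best_k[nearest])
```

In the paper, the lookup step is a decision tree trained on (sample, optimal-k) pairs, and the k∗Tree variant additionally caches each leaf's training samples and their neighbors so the final vote touches only a subset of the training set.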

Original language: English
Pages (from-to): 1774-1785
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
Volume: 29
Issue number: 5
DOIs
Publication status: Published - May 2018
Externally published: Yes

Keywords

  • Decision tree
  • K nearest neighbor (KNN) classification
  • Sparse coding

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
