Understanding the KNN Algorithm (Python Implementation and Prediction)

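KNN consists of a training phase and a classification phase. In the training phase, the training set is simply stored. In the classification phase, each test image is compared with every image in the training set, and the image with the smallest difference is selected. With K > 1, the K images with the smallest differences are selected instead.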
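The sketch below illustrates the two phases. It assumes each image has already been flattened into a row vector. It measures "difference" with the L1 distance (sum of absolute pixel differences), the same metric used by the implementation later in this post. The helper name `nearest_neighbor_predict` is hypothetical. The helper does no training: it only looks at the stored arrays and returns the label of the closest training row.

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Hypothetical helper: 1-nearest-neighbor prediction for a single sample.

    X_train: (N, D) array of flattened training images (the "stored" training set)
    y_train: (N,) array of labels
    x:       (D,) flattened test image
    """
    # L1 distance from x to every training row (x is broadcast over the rows)
    distances = np.sum(np.abs(X_train - x), axis=1)
    # the label of the training row with the smallest difference wins
    return y_train[np.argmin(distances)]

# toy "images" with 4 pixels each
X_train = np.array([[0, 0, 0, 0],
                    [10, 10, 10, 10],
                    [20, 20, 20, 20]], dtype=float)
y_train = np.array([0, 1, 2])
x = np.array([9, 11, 10, 12], dtype=float)
print(nearest_neighbor_predict(X_train, y_train, x))  # 1
```

The distances from `x` to the three rows are 42, 4 and 38. The second row is therefore the nearest, and its label 1 is returned.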

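If there is plenty of data, split the training set into two parts. A small part becomes the validation set (a stand-in for the test set), and the rest remains the training set. The training set is usually 70% to 90% of the data. The exact split depends on how many hyperparameters need tuning: the more hyperparameters, the larger the validation share should be. The validation set is used to tune hyperparameters.

If there is not much data, tune the hyperparameters with cross-validation instead. Cross-validation is expensive, though. In K-fold cross-validation the model is trained and evaluated K times, so a larger K makes better use of the data but also costs more.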
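Both strategies can be sketched with plain NumPy. The helper names `train_val_split` and `kfold_splits`, the default 20% validation share and the fixed random seed are assumptions for illustration, not part of the original post.

```python
import numpy as np

def train_val_split(X, y, val_ratio=0.2, seed=0):
    """Hypothetical helper: hold out a random val_ratio share of the data as a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle the sample indices once
    n_val = int(len(X) * val_ratio)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def kfold_splits(n, K):
    """Hypothetical helper: yield (train_idx, val_idx) for each of the K folds."""
    folds = np.array_split(np.arange(n), K)
    for i in range(K):
        # fold i is held out; all other folds together form the training set
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_tr, y_tr, X_val, y_val = train_val_split(X, y, val_ratio=0.2)
print(X_tr.shape, X_val.shape)  # (8, 2) (2, 2)
for train_idx, val_idx in kfold_splits(10, K=5):
    print(len(train_idx), len(val_idx))  # 8 2 (printed five times)
```

Every sample lands in the validation fold exactly once. The K rounds of training and evaluation are where the extra cost of cross-validation comes from.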

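Classification decision

Count how many of the K neighbors belong to each class, and assign the test sample to the class with the largest count. In other words, the class of an input instance is decided by the majority class among its K nearest training instances.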

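Common decision rules:

Majority voting: works just like a vote in everyday life, where the majority wins. It is the most commonly used rule.

Weighted voting: used in some situations, much as a judge's vote might carry more weight than an ordinary voter's. Weighted voting is generally used when data points should count unequally, for example when closer neighbors should count more than distant ones.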
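The two rules can be contrasted with the sketch below. Weighting each neighbor by the inverse of its distance is one common choice, assumed here for illustration. The small `eps` guards against division by zero when a neighbor coincides with the query.

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Majority voting: return the most common label among the K neighbors."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, distances, eps=1e-8):
    """Weighted voting (assumed scheme): each neighbor votes with weight 1 / (distance + eps)."""
    scores = defaultdict(float)
    for label, d in zip(labels, distances):
        scores[label] += 1.0 / (d + eps)
    return max(scores, key=scores.get)

# K = 5 neighbors: three "cat" samples far away, two "dog" samples very close
labels = ["cat", "cat", "cat", "dog", "dog"]
distances = [5.0, 6.0, 7.0, 0.5, 0.6]
print(majority_vote(labels))             # cat
print(weighted_vote(labels, distances))  # dog
```

Plain majority voting picks "cat" (3 votes to 2). Under inverse-distance weighting, "dog" scores about 3.67 against about 0.51 for "cat", so the two close neighbors win.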

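Advantages:

The selected neighbors are all objects whose classes are already known (correctly labeled).

The KNN algorithm itself is simple. The classifier does not need to be trained on the training set, because training amounts to storing the data, so training time is essentially zero. The cost of classifying a sample, however, grows linearly with the number of samples in the training set.

For samples whose class regions intersect or overlap heavily, KNN tends to be more suitable than other methods.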

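Disadvantages:

When the class distribution is imbalanced, correct classification is hard. A large class can dominate the K neighbors of a new sample even when the sample actually belongs to a smaller class.

The amount of computation is large, because every prediction requires computing the distance from the test sample to all training samples.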
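The number of distance computations cannot be avoided: it is still num_test × num_train distances. What can change is who does the work. The sketch below computes every test-to-train distance at once, moving it from Python loops into NumPy's optimized matrix product. It uses the Euclidean (L2) distance and the identity ||a − b||² = ||a||² + ||b||² − 2a·b. This is a swapped-in metric: the class in this post uses L1, which has no equally simple matrix-product form. The function name `l2_distance_matrix` is hypothetical.

```python
import numpy as np

def l2_distance_matrix(X_test, X_train):
    """Hypothetical helper: all pairwise Euclidean distances, shape (num_test, num_train)."""
    test_sq = np.sum(X_test ** 2, axis=1)[:, np.newaxis]    # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)[np.newaxis, :]  # (1, num_train)
    cross = X_test @ X_train.T                              # (num_test, num_train)
    # clip tiny negative values caused by floating-point rounding before the square root
    sq = np.maximum(test_sq + train_sq - 2 * cross, 0.0)
    return np.sqrt(sq)

rng = np.random.default_rng(0)
X_train = rng.random((6, 3))
X_test = rng.random((4, 3))
D = l2_distance_matrix(X_test, X_train)

# brute-force check with explicit loops
D_loop = np.array([[np.linalg.norm(a - b) for b in X_train] for a in X_test])
print(D.shape, np.allclose(D, D_loop))  # (4, 6) True
```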

Python implementation:

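import numpy as np

class kNearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        # KNN "training" only memorizes the data: X is (N, D), y is (N,)
        self.Xtr = X
        self.ytr = y

    def predict(self, X, k=1):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 (Manhattan) distance from the i-th test sample to every training sample
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # labels of the k nearest training samples (use the stored labels, not a global y_train)
            closest_y = self.ytr[np.argsort(distances)[:k]]
            # majority vote: count each label; on a tie, the smallest label wins
            u, indices = np.unique(closest_y, return_inverse=True)
            Ypred[i] = u[np.argmax(np.bincount(indices))]
        return Ypred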


load_CIFAR_batch() and load_CIFAR10() load the CIFAR-10 dataset:

import pickle

def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')
        X = datadict['data']
        Y = datadict['labels']
        # each row holds 3072 values (1024 red, 1024 green, 1024 blue):
        # reshape to (N, 3, 32, 32), then move the channel axis last -> (N, 32, 32, 3)
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y
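The reshape/transpose step in load_CIFAR_batch() can be checked without downloading anything. In the Python version of CIFAR-10, each batch's `data` entry is a 10000×3072 array. Each row stores the 1024 red values first, then the 1024 green values, then the 1024 blue values, and each 32×32 plane is stored row by row. The sketch builds a fake batch with only 2 images, an assumption made to keep it small. It uses consecutive integers instead of real uint8 pixels, so each value reveals where it came from.

```python
import numpy as np

# fake batch in the CIFAR-10 row layout (assumption: 2 images instead of 10000)
num_images = 2
data = np.arange(num_images * 3072, dtype=np.int64).reshape(num_images, 3072)

# same transformation as load_CIFAR_batch(): channel-first, then channel-last
X = data.reshape(num_images, 3, 32, 32).transpose(0, 2, 3, 1)
print(X.shape)  # (2, 32, 32, 3)

# pixel (row 0, column 1) of image 0: R, G, B come from offsets 1, 1024 + 1, 2048 + 1
print(X[0, 0, 1].tolist())  # [1, 1025, 2049]
```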


import os

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)  # stack the 5 training batches into one (50000, 32, 32, 3) array
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte


Xtr, Ytr, Xte, Yte = load_CIFAR10('cifar10')
# flatten each 32x32x3 image into a row vector of length 3072
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)


# The dataset is fairly large and runs slowly on a PC, so use only 5000 training and 500 test samples
num_training = 5000
num_test = 500
x_train = Xtr_rows[:num_training, :]
y_train = Ytr[:num_training]
x_test = Xte_rows[:num_test, :]
y_test = Yte[:num_test]


knn = kNearestNeighbor()
knn.train(x_train, y_train)
y_predict = knn.predict(x_test, k=7)
acc = np.mean(y_predict == y_test)
print('accuracy : %f' % (acc))


accuracy : 0.302000


# Which value of k works best? Use cross-validation to find out; here, 5-fold cross-validation
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)
k_to_accuracies = {}
for k_val in k_choices:
    print('k = ' + str(k_val))
    k_to_accuracies[k_val] = []
    for i in range(num_folds):
        # fold i is the validation set; the remaining folds form the training set
        x_train_cycle = np.concatenate([f for j, f in enumerate(x_train_folds) if j != i])
        y_train_cycle = np.concatenate([f for j, f in enumerate(y_train_folds) if j != i])
        x_val_cycle = x_train_folds[i]
        y_val_cycle = y_train_folds[i]
        knn = kNearestNeighbor()
        knn.train(x_train_cycle, y_train_cycle)
        y_val_pred = knn.predict(x_val_cycle, k_val)
        num_correct = np.sum(y_val_cycle == y_val_pred)
        k_to_accuracies[k_val].append(float(num_correct) / float(len(y_val_cycle)))


k = 1

k = 3

k = 5

k = 8

k = 10

k = 12

k = 15

k = 20

k = 50

k = 100


for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (int(k), accuracy))


k = 1, accuracy = 0.098000

k = 1, accuracy = 0.148000

k = 1, accuracy = 0.205000

k = 1, accuracy = 0.233000

k = 1, accuracy = 0.308000

k = 3, accuracy = 0.089000

k = 3, accuracy = 0.142000

k = 3, accuracy = 0.215000

k = 3, accuracy = 0.251000

k = 3, accuracy = 0.296000

k = 5, accuracy = 0.096000

k = 5, accuracy = 0.176000

k = 5, accuracy = 0.240000

k = 5, accuracy = 0.284000

k = 5, accuracy = 0.309000

k = 8, accuracy = 0.100000

k = 8, accuracy = 0.175000

k = 8, accuracy = 0.263000

k = 8, accuracy = 0.289000

k = 8, accuracy = 0.310000

k = 10, accuracy = 0.099000

k = 10, accuracy = 0.174000

k = 10, accuracy = 0.264000

k = 10, accuracy = 0.318000

k = 10, accuracy = 0.313000

k = 12, accuracy = 0.100000

k = 12, accuracy = 0.192000

k = 12, accuracy = 0.261000

k = 12, accuracy = 0.316000

k = 12, accuracy = 0.318000

k = 15, accuracy = 0.087000

k = 15, accuracy = 0.197000

k = 15, accuracy = 0.255000

k = 15, accuracy = 0.322000

k = 15, accuracy = 0.321000

k = 20, accuracy = 0.089000

k = 20, accuracy = 0.225000

k = 20, accuracy = 0.270000

k = 20, accuracy = 0.319000

k = 20, accuracy = 0.306000

k = 50, accuracy = 0.079000

k = 50, accuracy = 0.248000

k = 50, accuracy = 0.278000

k = 50, accuracy = 0.287000

k = 50, accuracy = 0.293000

k = 100, accuracy = 0.075000

k = 100, accuracy = 0.246000

k = 100, accuracy = 0.275000

k = 100, accuracy = 0.284000

k = 100, accuracy = 0.277000
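To choose k, the sketch below averages the five per-fold accuracies copied from the output above and picks the k with the highest mean. Treat the result with caution. These numbers appear to have been produced by the original predict, which looked up neighbor labels in the global y_train instead of self.ytr. During cross-validation the stored training set is a 4000-row subset, so neighbor indices are matched to the wrong labels in every fold except the last (in the first fold, none of them match). That likely explains why accuracy rises fold by fold from about 0.1, which is chance level for 10 classes. Rerunning with labels taken from self.ytr should give more meaningful per-fold values.

```python
import numpy as np

# per-fold validation accuracies copied from the output above
k_to_accuracies = {
    1:   [0.098, 0.148, 0.205, 0.233, 0.308],
    3:   [0.089, 0.142, 0.215, 0.251, 0.296],
    5:   [0.096, 0.176, 0.240, 0.284, 0.309],
    8:   [0.100, 0.175, 0.263, 0.289, 0.310],
    10:  [0.099, 0.174, 0.264, 0.318, 0.313],
    12:  [0.100, 0.192, 0.261, 0.316, 0.318],
    15:  [0.087, 0.197, 0.255, 0.322, 0.321],
    20:  [0.089, 0.225, 0.270, 0.319, 0.306],
    50:  [0.079, 0.248, 0.278, 0.287, 0.293],
    100: [0.075, 0.246, 0.275, 0.284, 0.277],
}

# average over the 5 folds, then pick the k with the highest mean accuracy
mean_acc = {k: float(np.mean(v)) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)
for k in sorted(mean_acc):
    print('k = %3d, mean accuracy = %.4f' % (k, mean_acc[k]))
print('best k:', best_k)  # best k: 20
```

With these numbers, k = 20 has the highest mean (0.2418), narrowly ahead of k = 12 (0.2374). The gap is small compared with the spread across folds.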


Visualizing the cross-validation results

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'


# scatter the individual per-fold accuracies for each k
for k in k_choices:
    accuracies = k_to_accuracies[k]
    plt.scatter([k] * len(accuracies), accuracies)

# trend line with error bars: mean and standard deviation over the folds for each k
accuracies_mean = np.array([np.mean(v) for k, v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k, v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()


[Figure: Cross-validation on k (cross-validation accuracy vs. k)]

Source: http://www.mzph.cn/news/441965.shtml (user-submitted article; the views expressed are the author's own).
