# ML kNN: Introduction, Applications, and Classic Cases of the k-Nearest Neighbors (kNN) Algorithm (Part 2)


## 2. The Three Elements of the k-Nearest Neighbors Algorithm

The model used by the k-nearest neighbors algorithm in fact corresponds to a partition of the feature space. The choice of K, the distance metric, and the classification decision rule are the algorithm's three basic elements:

The choice of K has a major effect on the result. A small K means that only training instances close to the input instance influence the prediction, which keeps the approximation error low but makes overfitting likely. A large K reduces the estimation error, but at the cost of a larger approximation error: training instances far from the input also influence the prediction and can lead it astray. In practice, K is usually set to a fairly small value and selected by cross-validation. As the number of training instances tends to infinity with K = 1, the error rate is no more than twice the Bayes error rate; if K also tends to infinity (while remaining a vanishing fraction of the sample size), the error rate converges to the Bayes error rate.
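The cross-validation mentioned above can be done directly with scikit-learn. A minimal sketch; the Iris dataset and the candidate list of K values are illustrative choices, not from the original text:

```python
# Choosing K by 5-fold cross-validation (Iris is used only as a demo dataset).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try a handful of small, odd K values, as suggested in the text above.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

best_k = search.best_params_["n_neighbors"]
print(best_k, round(search.best_score_, 3))
```

`best_score_` is the mean cross-validated accuracy of the best K, which is what "selecting K by cross-validation" optimizes.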

## Applications of the k-Nearest Neighbors (kNN) Algorithm

### 1. Reading the kNN Code

The docstring of sklearn's `KNeighborsRegressor`:

```python
"""Regression based on k-nearest neighbors.

The target is predicted by local interpolation of the targets
associated with the nearest neighbors in the training set.

Read more in the :ref:`User Guide <regression>`.

Parameters
----------
n_neighbors : int, optional (default = 5)
    Number of neighbors to use by default for :meth:`kneighbors` queries.

weights : str or callable
    Weight function used in prediction.  Possible values:

    - 'uniform' : uniform weights.  All points in each neighborhood
      are weighted equally.
    - 'distance' : weight points by the inverse of their distance.
      In this case, closer neighbors of a query point will have a
      greater influence than neighbors which are further away.
    - [callable] : a user-defined function which accepts an array of
      distances, and returns an array of the same shape containing
      the weights.

    Uniform weights are used by default.

algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
    Algorithm used to compute the nearest neighbors:

    - 'ball_tree' will use :class:`BallTree`
    - 'kd_tree' will use :class:`KDTree`
    - 'brute' will use a brute-force search.
    - 'auto' will attempt to decide the most appropriate algorithm
      based on the values passed to the :meth:`fit` method.

    Note: fitting on sparse input will override the setting of this
    parameter, using brute force.

leaf_size : int, optional (default = 30)
    Leaf size passed to BallTree or KDTree.  This can affect the
    speed of the construction and query, as well as the memory
    required to store the tree.  The optimal value depends on the
    nature of the problem.

p : integer, optional (default = 2)
    Power parameter for the Minkowski metric.  When p = 1, this is
    equivalent to using manhattan_distance (l1), and
    euclidean_distance (l2) for p = 2.  For arbitrary p,
    minkowski_distance (l_p) is used.

metric : string or callable, default 'minkowski'
    The distance metric to use for the tree.  The default metric is
    minkowski, and with p=2 is equivalent to the standard Euclidean
    metric.  See the documentation of the DistanceMetric class for a
    list of available metrics.

metric_params : dict, optional (default = None)
    Additional keyword arguments for the metric function.

n_jobs : int, optional (default = 1)
    The number of parallel jobs to run for neighbors search.
    If -1, then the number of jobs is set to the number of CPU cores.
    Doesn't affect the :meth:`fit` method.

Examples
--------
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)  # doctest: +ELLIPSIS
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[ 0.5]

See also
--------
NearestNeighbors
KNeighborsClassifier

Notes
-----
See :ref:`Nearest Neighbors <neighbors>` in the online documentation
for a discussion of the choice of ``algorithm`` and ``leaf_size``.

.. warning::

   Regarding the Nearest Neighbors algorithms, if it is found that
   two neighbors, neighbor `k+1` and `k`, have identical distances
   but different labels, the results will depend on the ordering of
   the training data.

https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
"""
```
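To see the effect of the `weights` parameter in practice, here is a small sketch with `KNeighborsRegressor`; the query points are illustrative choices:

```python
# Comparing weights='uniform' and weights='distance' on a tiny dataset.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

uniform = KNeighborsRegressor(n_neighbors=2, weights="uniform").fit(X, y)
distance = KNeighborsRegressor(n_neighbors=2, weights="distance").fit(X, y)

# A query at 1.5 is equidistant from x=1 (y=0) and x=2 (y=1),
# so both weightings average the two targets to 0.5.
print(uniform.predict([[1.5]]))   # [0.5]
print(distance.predict([[1.5]]))  # [0.5]

# A query at 1.25 is closer to x=1, so inverse-distance weighting
# pulls the prediction toward y=0 (0.25 here), while the uniform
# mean stays at 0.5.
print(uniform.predict([[1.25]]))
print(distance.predict([[1.25]]))
```

With inverse-distance weights 1/0.25 = 4 and 1/0.75 = 4/3, the weighted mean at 1.25 is (4·0 + (4/3)·1) / (4 + 4/3) = 0.25.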

`class KNeighborsRegressor`, found at `sklearn.neighbors.regression`:

```python
# Source of KNeighborsRegressor (sklearn.neighbors.regression).
# It relies on sklearn internals: NeighborsBase, KNeighborsMixin,
# SupervisedFloatMixin, RegressorMixin, _check_weights, _get_weights,
# check_array, and numpy imported as np.
class KNeighborsRegressor(NeighborsBase, KNeighborsMixin,
                          SupervisedFloatMixin, RegressorMixin):
    def __init__(self, n_neighbors=5, weights='uniform',
                 algorithm='auto', leaf_size=30, p=2,
                 metric='minkowski', metric_params=None,
                 n_jobs=1, **kwargs):
        self._init_params(n_neighbors=n_neighbors,
                          algorithm=algorithm, leaf_size=leaf_size,
                          metric=metric, p=p,
                          metric_params=metric_params,
                          n_jobs=n_jobs, **kwargs)
        self.weights = _check_weights(weights)

    def predict(self, X):
        """Predict the target for the provided data

        Parameters
        ----------
        X : array-like, shape (n_query, n_features), \
                or (n_query, n_indexed) if metric == 'precomputed'
            Test samples.

        Returns
        -------
        y : array of int, shape = [n_samples] or [n_samples, n_outputs]
            Target values
        """
        X = check_array(X, accept_sparse='csr')

        # Distances and indices of the k nearest training points
        neigh_dist, neigh_ind = self.kneighbors(X)
        weights = _get_weights(neigh_dist, self.weights)

        _y = self._y
        if _y.ndim == 1:
            _y = _y.reshape((-1, 1))

        if weights is None:
            # Uniform weights: plain mean of the neighbors' targets
            y_pred = np.mean(_y[neigh_ind], axis=1)
        else:
            # Weighted average of the neighbors' targets
            y_pred = np.empty((X.shape[0], _y.shape[1]),
                              dtype=np.float64)
            denom = np.sum(weights, axis=1)
            for j in range(_y.shape[1]):
                num = np.sum(_y[neigh_ind, j] * weights, axis=1)
                y_pred[:, j] = num / denom

        if self._y.ndim == 1:
            y_pred = y_pred.ravel()

        return y_pred
```
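The prediction logic above can be mirrored in a self-contained NumPy sketch, without sklearn internals; the function name and the small-distance guard are my own illustrative choices:

```python
# A standalone kNN regression predictor mirroring the uniform-mean and
# inverse-distance-weighted branches of KNeighborsRegressor.predict.
import numpy as np

def knn_predict(X_train, y_train, X_query, k=2, weighted=False):
    # Pairwise Euclidean distances between query and training points
    d = np.sqrt(((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1))
    # Indices and distances of the k nearest neighbors of each query
    idx = np.argsort(d, axis=1)[:, :k]
    nd = np.take_along_axis(d, idx, axis=1)
    ny = y_train[idx]
    if not weighted:
        return ny.mean(axis=1)           # uniform: plain mean
    w = 1.0 / np.maximum(nd, 1e-12)      # inverse-distance weights
    return (ny * w).sum(axis=1) / w.sum(axis=1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(knn_predict(X, y, np.array([[1.5]])))                   # [0.5]
print(knn_predict(X, y, np.array([[1.25]]), weighted=True))   # ~[0.25]
```

The `np.maximum(nd, 1e-12)` guard simply avoids division by zero when a query coincides with a training point; sklearn handles that exact-match case differently.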

## Classic Cases of the k-Nearest Neighbors (kNN) Algorithm

### 1. Basic Cases

ML kNN: multi-class prediction on the Iris dataset with the kNN algorithm. https://yunyaniu.blog.csdn.net/article/details/87892011

ML kNN (two variants): price regression on the Boston housing dataset (506 samples, 13 features + 1 target) with two kNN regressors (uniform-average and distance-weighted), comparing their performance. https://yunyaniu.blog.csdn.net/article/details/87913163

CV kNN: visualizing image similarity with ORB features + a kNN matcher, and SIFT features + a FLANN matcher. https://yunyaniu.blog.csdn.net/article/details/103294339
