TensorFlow 机器学习秘籍第二版：1~5（4）-阿里云开发者社区

TensorFlow 机器学习秘籍第二版：1~5（3）https://developer.aliyun.com/article/1426833

准备

可以将相同的最大边际概念应用于拟合线性回归。我们可以考虑最大化包含最多（x，y）点的边距，而不是最大化分隔类的边距。为了说明这一点，我们将使用相同的鸢尾数据集，并表明我们可以使用此概念来拟合萼片长度和花瓣宽度之间的线。

相应的损失函数类似于：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-1MiHrztw-1681566813072)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/fa49eabe-ad86-4314-8e13-ed19e7e20b27.png)]

这里，ε是边距宽度的一半，如果一个点位于该区域，则损失等于 0。

操作步骤

我们按如下方式处理秘籍：

首先，我们加载必要的库，启动图，然后加载鸢尾数据集。之后，我们将数据集拆分为训练集和测试集，以显示两者的损失。使用以下代码：

import matplotlib.pyplot as plt 
import numpy as np 
import tensorflow as tf 
from sklearn import datasets 
sess = tf.Session() 
iris = datasets.load_iris() 
x_vals = np.array([x[3] for x in iris.data]) 
y_vals = np.array([y[0] for y in iris.data]) 
train_indices = np.random.choice(len(x_vals), round(len(x_vals)*0.8), replace=False) 
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices))) 
x_vals_train = x_vals[train_indices] 
x_vals_test = x_vals[test_indices] 
y_vals_train = y_vals[train_indices] 
y_vals_test = y_vals[test_indices]

对于此示例，我们将数据拆分为训练集和测试集。将数据拆分为三个数据集也很常见，其中包括验证集。我们可以使用此验证集来验证我们在训练它们时不会过拟合模型。

让我们声明我们的批量大小，占位符和变量，并创建我们的线性模型，如下所示：

batch_size = 50 
x_data = tf.placeholder(shape=[None, 1], dtype=tf.float32) 
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32) 
A = tf.Variable(tf.random_normal(shape=[1,1])) 
b = tf.Variable(tf.random_normal(shape=[1,1])) 
model_output = tf.add(tf.matmul(x_data, A), b)

现在，我们宣布我们的损失函数。如前文所述，损失函数实现为ε = 0.5。请记住，epsilon 是我们的损失函数的一部分，它允许软边距而不是硬边距：

epsilon = tf.constant([0.5]) 
loss = tf.reduce_mean(tf.maximum(0., tf.subtract(tf.abs(tf.subtract(model_output, y_target)), epsilon)))

我们创建一个优化器并接下来初始化我们的变量，如下所示：

my_opt = tf.train.GradientDescentOptimizer(0.075) 
train_step = my_opt.minimize(loss) 
init = tf.global_variables_initializer() 
sess.run(init)

现在，我们迭代 200 次训练迭代并保存训练和测试损失以便以后绘图：

train_loss = [] 
test_loss = [] 
for i in range(200): 
    rand_index = np.random.choice(len(x_vals_train), size=batch_size) 
    rand_x = np.transpose([x_vals_train[rand_index]]) 
    rand_y = np.transpose([y_vals_train[rand_index]]) 
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y}) 
    temp_train_loss = sess.run(loss, feed_dict={x_data: np.transpose([x_vals_train]), y_target: np.transpose([y_vals_train])}) 
    train_loss.append(temp_train_loss) 
    temp_test_loss = sess.run(loss, feed_dict={x_data: np.transpose([x_vals_test]), y_target: np.transpose([y_vals_test])}) 
    test_loss.append(temp_test_loss) 
    if (i+1)%50==0: 
        print('-----------') 
        print('Generation: ' + str(i)) 
        print('A = ' + str(sess.run(A)) + ' b = ' + str(sess.run(b))) 
        print('Train Loss = ' + str(temp_train_loss)) 
        print('Test Loss = ' + str(temp_test_loss))

这产生以下输出：

Generation: 50 
A = [[ 2.20651722]] b = [[ 2.71290684]] 
Train Loss = 0.609453 
Test Loss = 0.460152 
----------- 
Generation: 100 
A = [[ 1.6440177]] b = [[ 3.75240564]] 
Train Loss = 0.242519 
Test Loss = 0.208901 
----------- 
Generation: 150 
A = [[ 1.27711761]] b = [[ 4.3149066]] 
Train Loss = 0.108192 
Test Loss = 0.119284 
----------- 
Generation: 200 
A = [[ 1.05271816]] b = [[ 4.53690529]] 
Train Loss = 0.0799957 
Test Loss = 0.107551

我们现在可以提取我们找到的系数，并获得最佳拟合线的值。出于绘图目的，我们也将获得边距的值。使用以下代码：

[[slope]] = sess.run(A) 
[[y_intercept]] = sess.run(b) 
[width] = sess.run(epsilon) 
best_fit = [] 
best_fit_upper = [] 
best_fit_lower = [] 
for i in x_vals: 
  best_fit.append(slope*i+y_intercept) 
  best_fit_upper.append(slope*i+y_intercept+width) 
  best_fit_lower.append(slope*i+y_intercept-width)

最后，这里是用拟合线和训练测试损失绘制数据的代码：

plt.plot(x_vals, y_vals, 'o', label='Data Points') 
plt.plot(x_vals, best_fit, 'r-', label='SVM Regression Line', linewidth=3) 
plt.plot(x_vals, best_fit_upper, 'r--', linewidth=2) 
plt.plot(x_vals, best_fit_lower, 'r--', linewidth=2) 
plt.ylim([0, 10]) 
plt.legend(loc='lower right') 
plt.title('Sepal Length vs Petal Width') 
plt.xlabel('Petal Width') 
plt.ylabel('Sepal Length') 
plt.show() 
plt.plot(train_loss, 'k-', label='Train Set Loss') 
plt.plot(test_loss, 'r--', label='Test Set Loss') 
plt.title('L2 Loss per Generation') 
plt.xlabel('Generation') 
plt.ylabel('L2 Loss') 
plt.legend(loc='upper right') 
plt.show()

上述代码的图如下：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-DTa4ONFf-1681566813072)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/9e656a1b-1414-49d5-a4e8-e2e2d1907737.png)]

图 5：鸢尾数据上有 0.5 个边缘的 SVM 回归（萼片长度与花瓣宽度）

以下是训练迭代中的训练和测试损失：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-zZmKiUpM-1681566813072)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/281870d6-9835-4fdd-8db0-8e861aa474d5.png)]

图 6：训练和测试集上每代的 SVM 回归损失

工作原理

直觉上，我们可以将 SVM 回归看作是一个函数，试图尽可能多地在2ε宽度范围内拟合点。该线的拟合对该参数有些敏感。如果我们选择太小的ε，算法将无法适应边距中的许多点。如果我们选择太大的ε，将会有许多行能够适应边距中的所有数据点。我们更喜欢较小的ε，因为距离边缘较近的点比较远的点贡献较少的损失。

在 TensorFlow 中使用核

先前的 SVM 使用线性可分数据。如果我们分离非线性数据，我们可以改变将线性分隔符投影到数据上的方式。这是通过更改 SVM 损失函数中的核来完成的。在本章中，我们将介绍如何更改核并分离非线性可分离数据。

准备

在本文中，我们将激励支持向量机中核的使用。在线性 SVM 部分，我们用特定的损失函数求解了软边界。这种方法的另一种方法是解决所谓的优化问题的对偶。可以证明线性 SVM 问题的对偶性由以下公式给出：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-YzOkTYPz-1681566813073)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/a311abb6-b9c9-4aef-9cac-88438abb5879.png)]

对此，以下适用：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-Ov1LE5R7-1681566813073)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/6b8e7a37-0f7a-40e2-9135-838f6ea38141.png)]

这里，模型中的变量将是b向量。理想情况下，此向量将非常稀疏，仅对我们数据集的相应支持向量采用接近 1 和 -1 的值。我们的数据点向量由x[i]表示，我们的目标（1 或 -1）y[i]表示。

前面等式中的核是点积x[i] · y[j]，它给出了线性核。该核是一个方形矩阵，填充了数据点i, j的点积。

我们可以将更复杂的函数扩展到更高的维度，而不是仅仅在数据点之间进行点积，而在这些维度中，类可以是线性可分的。这似乎是不必要的复杂，但我们可以选择一个具有以下属性的函数k：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-lsIdQNbd-1681566813073)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/eb618f1d-e0d3-49a7-bbec-c8e5ab9049d4.png)]

这里, k被称为核函数。更常见的核是使用高斯核（也称为径向基函数核或 RBF 核）。该核用以下等式描述：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-GwTLCKE9-1681566813073)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/91bc4d05-6bc2-4b0d-85ac-70a964eae981.png)]

为了对这个核进行预测，比如说p[i]，我们只需在核中的相应方程中用预测点替换，如下所示：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-arMgVsrJ-1681566813073)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/284cc0b5-9a95-4a09-bf23-39776df87409.png)]

在本节中，我们将讨论如何实现高斯核。我们还将在适当的位置记下在何处替换实现线性核。我们将使用的数据集将手动创建，以显示高斯核更适合在线性核上使用的位置。

操作步骤

我们按如下方式处理秘籍：

首先，我们加载必要的库并启动图会话，如下所示：

import matplotlib.pyplot as plt 
import numpy as np 
import tensorflow as tf 
from sklearn import datasets 
sess = tf.Session()

现在，我们生成数据。我们将生成的数据将是两个同心数据环；每个戒指都属于不同的阶级。我们必须确保类只有 -1 或 1。然后我们将数据分成每个类的x和y值以用于绘图目的。为此，请使用以下代码：

(x_vals, y_vals) = datasets.make_circles(n_samples=500, factor=.5, noise=.1) 
y_vals = np.array([1 if y==1 else -1 for y in y_vals]) 
class1_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==1] 
class1_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==1] 
class2_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==-1] 
class2_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==-1]

接下来，我们声明批量大小和占位符，并创建我们的模型变量b。对于 SVM，我们倾向于需要更大的批量大小，因为我们需要一个非常稳定的模型，该模型在每次训练生成时都不会波动很大。另请注意，我们为预测点添加了额外的占位符。为了可视化结果，我们将创建一个颜色网格，以查看哪些区域最后属于哪个类。我们这样做如下：

batch_size = 250 
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32) 
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32) 
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32) 
b = tf.Variable(tf.random_normal(shape=[1,batch_size]))

我们现在将创建高斯核。该核可以表示为矩阵运算，如下所示：

gamma = tf.constant(-50.0) 
dist = tf.reduce_sum(tf.square(x_data), 1) 
dist = tf.reshape(dist, [-1,1]) 
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))), tf.transpose(dist)) 
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))

注意add和subtract操作的sq_dists行中广播的使用。另外，请注意线性核可以表示为my_kernel = tf.matmul(x_data, tf.transpose(x_data))。

现在，我们宣布了本秘籍中之前所述的双重问题。最后，我们将使用tf.negative()函数最小化损失函数的负值，而不是最大化。我们使用以下代码完成此任务：

model_output = tf.matmul(b, my_kernel) 
first_term = tf.reduce_sum(b) 
b_vec_cross = tf.matmul(tf.transpose(b), b) 
y_target_cross = tf.matmul(y_target, tf.transpose(y_target)) 
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross, y_target_cross))) 
loss = tf.negative(tf.subtract(first_term, second_term))

我们现在创建预测和准确率函数。首先，我们必须创建一个预测核，类似于步骤 4，但是我们拥有带有预测数据的点的核心，而不是点的核。然后预测是模型输出的符号。这实现如下：

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1]) 
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1]) 
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data, tf.transpose(prediction_grid)))), tf.transpose(rB)) 
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist))) 
prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b), pred_kernel) 
prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output)) 
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction), tf.squeeze(y_target)), tf.float32))

为了实现线性预测核，我们可以编写pred_kernel = tf.matmul(x_data, tf.transpose(prediction_grid))。

现在，我们可以创建一个优化函数并初始化所有变量，如下所示：

my_opt = tf.train.GradientDescentOptimizer(0.001) 
train_step = my_opt.minimize(loss) 
init = tf.global_variables_initializer() 
sess.run(init)

接下来，我们开始训练循环。我们将记录每代的损耗向量和批次精度。当我们运行准确率时，我们必须放入所有三个占位符，但我们输入x数据两次以获得对点的预测，如下所示：

loss_vec = [] 
batch_accuracy = [] 
for i in range(500): 
    rand_index = np.random.choice(len(x_vals), size=batch_size) 
    rand_x = x_vals[rand_index] 
    rand_y = np.transpose([y_vals[rand_index]]) 
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y}) 
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y}) 
    loss_vec.append(temp_loss) 
    acc_temp = sess.run(accuracy, feed_dict={x_data: rand_x, 
                                             y_target: rand_y, 
                                             prediction_grid:rand_x}) 
    batch_accuracy.append(acc_temp) 
    if (i+1)%100==0: 
        print('Step #' + str(i+1)) 
        print('Loss = ' + str(temp_loss))

这产生以下输出：

Step #100 
Loss = -28.0772 
Step #200 
Loss = -3.3628 
Step #300 
Loss = -58.862 
Step #400 
Loss = -75.1121 
Step #500 
Loss = -84.8905

为了查看整个空间的输出类，我们将在系统中创建一个预测点网格，并对所有这些预测点进行预测，如下所示：

x_min, x_max = x_vals[:, 0].min() - 1, x_vals[:, 0].max() + 1 
y_min, y_max = x_vals[:, 1].min() - 1, x_vals[:, 1].max() + 1 
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), 
                     np.arange(y_min, y_max, 0.02)) 
grid_points = np.c_[xx.ravel(), yy.ravel()] 
[grid_predictions] = sess.run(prediction, feed_dict={x_data: x_vals, 
                                                   y_target: np.transpose([y_vals]), 
                                                   prediction_grid: grid_points}) 
grid_predictions = grid_predictions.reshape(xx.shape)

以下是绘制结果，批次准确率和损失的代码：

plt.contourf(xx, yy, grid_predictions, cmap=plt.cm.Paired, alpha=0.8) 
plt.plot(class1_x, class1_y, 'ro', label='Class 1') 
plt.plot(class2_x, class2_y, 'kx', label='Class -1') 
plt.legend(loc='lower right') 
plt.ylim([-1.5, 1.5]) 
plt.xlim([-1.5, 1.5]) 
plt.show() 
plt.plot(batch_accuracy, 'k-', label='Accuracy') 
plt.title('Batch Accuracy') 
plt.xlabel('Generation') 
plt.ylabel('Accuracy') 
plt.legend(loc='lower right') 
plt.show() 
plt.plot(loss_vec, 'k-') 
plt.title('Loss per Generation') 
plt.xlabel('Generation') 
plt.ylabel('Loss') 
plt.show()

为了简洁起见，我们将仅显示结果图，但我们也可以单独运行绘图代码并查看损失和准确率。

以下屏幕截图说明了线性可分离拟合对我们的非线性数据有多糟糕：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-bSuZeo0Y-1681566813074)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/3dc4f147-d736-4b8d-a4af-e004a6f306fc.png)]

图 7：非线性可分离数据上的线性 SVM

以下屏幕截图显示了高斯核可以更好地拟合非线性数据：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-uNeJIL8p-1681566813074)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/942b3cec-7ca0-42d5-bc2b-7fddb0a4d4e4.png)]Figure 8: Non-linear SVM with Gaussian kernel results on non-linear ring data

如果我们使用高斯核来分离我们的非线性环数据，我们会得到更好的拟合。

工作原理

有两个重要的代码需要了解：我们如何实现核，以及我们如何为 SVM 双优化问题实现损失函数。我们已经展示了如何实现线性和高斯核，并且高斯核可以分离非线性数据集。

我们还应该提到另一个参数，即高斯核中的伽马值。此参数控制影响点对分离曲率的影响程度。通常选择小值，但它在很大程度上取决于数据集。理想情况下，使用交叉验证等统计技术选择此参数。

对于新点的预测/评估，我们使用以下命令：sess.run(prediction, feed_dict:{x_data: x_vals, y_data: np.transpose([y_vals])})。此评估必须包括原始数据集（x_vals和y_vals），因为 SVM 是使用支持向量定义的，由哪些点指定在边界上或不是。

如果我们这样选择，我们可以实现更多核。以下是一些更常见的非线性核列表：

多项式齐次核：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-90714whP-1681566813074)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/3233ad1f-7336-40e8-b9de-895eb041bb5b.png)]

多项式非齐次核：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-T1cbNmcz-1681566813074)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/bdb676cb-da55-47f7-a656-2a0406ff4ab3.png)]

双曲正切核：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-zGkuLQ51-1681566813075)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/5d9c1c9e-e1ce-4497-9fc4-5f7be14ce1b3.png)]

实现非线性 SVM

对于此秘籍，我们将应用非线性核来拆分数据集。

准备

在本节中，我们将在实际数据上实现前面的高斯核 SVM。我们将加载鸢尾数据集并为山鸢尾创建分类器（与其它鸢尾相比）。我们将看到各种伽马值对分类的影响。

操作步骤

我们按如下方式处理秘籍：

我们首先加载必要的库，其中包括scikit-learn数据集，以便我们可以加载鸢尾数据。然后，我们将启动图会话。使用以下代码：

import matplotlib.pyplot as plt 
import numpy as np 
import tensorflow as tf 
from sklearn import datasets 
sess = tf.Session()

接下来，我们将加载鸢尾数据，提取萼片长度和花瓣宽度，并分离每个类的x和y值（以便以后绘图），如下所示：

iris = datasets.load_iris() 
x_vals = np.array([[x[0], x[3]] for x in iris.data]) 
y_vals = np.array([1 if y==0 else -1 for y in iris.target]) 
class1_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==1] 
class1_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==1] 
class2_x = [x[0] for i,x in enumerate(x_vals) if y_vals[i]==-1] 
class2_y = [x[1] for i,x in enumerate(x_vals) if y_vals[i]==-1]

现在，我们声明我们的批量大小（首选大批量），占位符和模型变量b，如下所示：

batch_size = 100 
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32) 
y_target = tf.placeholder(shape=[None, 1], dtype=tf.float32) 
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32) 
b = tf.Variable(tf.random_normal(shape=[1,batch_size]))

接下来，我们声明我们的高斯核。这个核依赖于伽马值，我们将在本文后面的各个伽玛值对分类的影响进行说明。使用以下代码：

gamma = tf.constant(-10.0) 
dist = tf.reduce_sum(tf.square(x_data), 1) 
dist = tf.reshape(dist, [-1,1]) 
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))), tf.transpose(dist)) 
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists))) 
# We now compute the loss for the dual optimization problem, as follows: 
model_output = tf.matmul(b, my_kernel) 
first_term = tf.reduce_sum(b) 
b_vec_cross = tf.matmul(tf.transpose(b), b) 
y_target_cross = tf.matmul(y_target, tf.transpose(y_target)) 
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross, y_target_cross))) 
loss = tf.negative(tf.subtract(first_term, second_term))

为了使用 SVM 执行预测，我们必须创建预测核函数。之后，我们还会声明一个准确率计算，它只是使用以下代码正确分类的点的百分比：

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1]) 
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1]) 
pred_sq_dist = tf.add(tf.subtract(rA, tf.mul(2., tf.matmul(x_data, tf.transpose(prediction_grid)))), tf.transpose(rB)) 
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist))) 
prediction_output = tf.matmul(tf.multiply(tf.transpose(y_target),b), pred_kernel) 
prediction = tf.sign(prediction_output-tf.reduce_mean(prediction_output)) 
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.squeeze(prediction), tf.squeeze(y_target)), tf.float32))

接下来，我们声明我们的优化函数并初始化变量，如下所示：

my_opt = tf.train.GradientDescentOptimizer(0.01) 
train_step = my_opt.minimize(loss) 
init = tf.initialize_all_variables() 
sess.run(init)

现在，我们可以开始训练循环了。我们运行循环 300 次迭代并存储损失值和批次精度。为此，我们使用以下实现：

loss_vec = [] 
batch_accuracy = [] 
for i in range(300): 
    rand_index = np.random.choice(len(x_vals), size=batch_size) 
    rand_x = x_vals[rand_index] 
    rand_y = np.transpose([y_vals[rand_index]]) 
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y}) 
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y}) 
    loss_vec.append(temp_loss) 
    acc_temp = sess.run(accuracy, feed_dict={x_data: rand_x, 
                                             y_target: rand_y, 
                                             prediction_grid:rand_x}) 
    batch_accuracy.append(acc_temp)

为了绘制决策边界，我们将创建x，y点的网格并评估我们在所有这些点上创建的预测函数，如下所示：

x_min, x_max = x_vals[:, 0].min() - 1, x_vals[:, 0].max() + 1 
y_min, y_max = x_vals[:, 1].min() - 1, x_vals[:, 1].max() + 1 
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), 
                     np.arange(y_min, y_max, 0.02)) 
grid_points = np.c_[xx.ravel(), yy.ravel()] 
[grid_predictions] = sess.run(prediction, feed_dict={x_data: x_vals, 
                                                   y_target: np.transpose([y_vals]), 
                                                   prediction_grid: grid_points}) 
grid_predictions = grid_predictions.reshape(xx.shape)

为简洁起见，我们只展示如何用决策边界绘制点。有关伽马值的图和效果，请参阅本秘籍的下一部分。使用以下代码：

plt.contourf(xx, yy, grid_predictions, cmap=plt.cm.Paired, alpha=0.8) 
plt.plot(class1_x, class1_y, 'ro', label='I. setosa') 
plt.plot(class2_x, class2_y, 'kx', label='Non-setosa') 
plt.title('Gaussian SVM Results on Iris Data') 
plt.xlabel('Petal Length') 
plt.ylabel('Sepal Width') 
plt.legend(loc='lower right') 
plt.ylim([-0.5, 3.0]) 
plt.xlim([3.5, 8.5]) 
plt.show()

工作原理

以下是对四种不同伽玛值（1，10，25 和 100）的山鸢尾结果的分类。注意伽玛值越高，每个单独点对分类边界的影响越大：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-Vv0jEvNY-1681566813075)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/21a6bf2b-18fd-407c-9854-bca30ac02a84.png)]

图 9：使用具有四个不同伽马值的高斯核 SVM 的山鸢尾的分类结果

实现多类 SVM

我们还可以使用 SVM 对多个类进行分类，而不仅仅是两个类。在本文中，我们将使用多类 SVM 对鸢尾数据集中的三种类型的花进行分类。

准备

通过设计，SVM 算法是二元分类器。但是，有一些策略可以让他们在多个类上工作。两种主要策略称为“一对一”，“一对剩余”。

一对一是一种策略，其中为每个可能的类对创建二分类器。然后，对具有最多投票的类的点进行预测。这可能在计算上很难，因为我们必须为k类创建k!/(k - 2)!2!个分类器。

实现多类分类器的另一种方法是执行一对一策略，我们为k类的每个类创建一个分类器。点的预测类将是创建最大 SVM 边距的类。这是我们将在本节中实现的策略。

在这里，我们将加载鸢尾数据集并使用高斯核执行多类非线性 SVM。鸢尾数据集是理想的，因为有三个类（山鸢尾，弗吉尼亚和杂色鸢尾）。我们将为每个类创建三个高斯核 SVM，并预测存在最高边界的点。

操作步骤

我们按如下方式处理秘籍：

首先，我们加载我们需要的库并启动图，如下所示：

import matplotlib.pyplot as plt 
import numpy as np 
import tensorflow as tf 
from sklearn import datasets 
sess = tf.Session()

接下来，我们将加载鸢尾数据集并拆分每个类的目标。我们将仅使用萼片长度和花瓣宽度来说明，因为我们希望能够绘制输出。我们还将每个类的x和y值分开，以便最后进行绘图。使用以下代码：

iris = datasets.load_iris() 
x_vals = np.array([[x[0], x[3]] for x in iris.data]) 
y_vals1 = np.array([1 if y==0 else -1 for y in iris.target]) 
y_vals2 = np.array([1 if y==1 else -1 for y in iris.target]) 
y_vals3 = np.array([1 if y==2 else -1 for y in iris.target]) 
y_vals = np.array([y_vals1, y_vals2, y_vals3]) 
class1_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==0] 
class1_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==0] 
class2_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==1] 
class2_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==1] 
class3_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==2] 
class3_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==2]

与实现非线性 SVM 秘籍相比，我们在此示例中所做的最大改变是，许多维度将发生变化（我们现在有三个分类器而不是一个）。我们还将利用矩阵广播和重塑技术一次计算所有三个 SVM。由于我们一次性完成这一操作，我们的y_target占位符现在具有[3, None]的大小，我们的模型变量b将被初始化为[3, batch_size]。使用以下代码：

batch_size = 50 
x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32) 
y_target = tf.placeholder(shape=[3, None], dtype=tf.float32) 
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32) 
b = tf.Variable(tf.random_normal(shape=[3,batch_size]))

接下来，我们计算高斯核。由于这仅取决于输入的 x 数据，因此该代码不会改变先前的秘籍。使用以下代码：

gamma = tf.constant(-10.0) 
dist = tf.reduce_sum(tf.square(x_data), 1) 
dist = tf.reshape(dist, [-1,1]) 
sq_dists = tf.add(tf.subtract(dist, tf.multiply(2., tf.matmul(x_data, tf.transpose(x_data)))), tf.transpose(dist)) 
my_kernel = tf.exp(tf.multiply(gamma, tf.abs(sq_dists)))

一个重大变化是我们将进行批量矩阵乘法。我们将最终得到三维矩阵，我们将希望在第三个索引上广播矩阵乘法。我们没有为此设置数据和目标矩阵。为了使x^T · x等操作跨越额外维度，我们创建一个函数来扩展这样的矩阵，将矩阵重新整形为转置，然后在额外维度上调用 TensorFlow 的batch_matmul。使用以下代码：

def reshape_matmul(mat): 
    v1 = tf.expand_dims(mat, 1) 
    v2 = tf.reshape(v1, [3, batch_size, 1]) 
    return tf.batch_matmul(v2, v1)

创建此函数后，我们现在可以计算双重损失函数，如下所示：

model_output = tf.matmul(b, my_kernel) 
first_term = tf.reduce_sum(b) 
b_vec_cross = tf.matmul(tf.transpose(b), b) 
y_target_cross = reshape_matmul(y_target) 
second_term = tf.reduce_sum(tf.multiply(my_kernel, tf.multiply(b_vec_cross, y_target_cross)),[1,2]) 
loss = tf.reduce_sum(tf.negative(tf.subtract(first_term, second_term)))

现在，我们可以创建预测核。请注意，我们必须小心reduce_sum函数并且不要在所有三个 SVM 预测中减少，因此我们必须告诉 TensorFlow 不要用第二个索引参数对所有内容求和。使用以下代码：

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1]) 
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1]) 
pred_sq_dist = tf.add(tf.subtract(rA, tf.multiply(2., tf.matmul(x_data, tf.transpose(prediction_grid)))), tf.transpose(rB)) 
pred_kernel = tf.exp(tf.multiply(gamma, tf.abs(pred_sq_dist)))

当我们完成预测核时，我们可以创建预测。这里的一个重大变化是预测不是输出的sign()。由于我们正在实现一对一策略，因此预测是具有最大输出的分类器。为此，我们使用 TensorFlow 的内置argmax()函数，如下所示：

prediction_output = tf.matmul(tf.mul(y_target,b), pred_kernel) 
prediction = tf.arg_max(prediction_output-tf.expand_dims(tf.reduce_mean(prediction_output,1), 1), 0) 
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, tf.argmax(y_target,0)), tf.float32))

现在我们已经拥有了核，损失和预测函数，我们只需要声明我们的优化函数并初始化我们的变量，如下所示：

my_opt = tf.train.GradientDescentOptimizer(0.01) 
train_step = my_opt.minimize(loss) 
init = tf.global_variables_initializer() 
sess.run(init)

该算法收敛速度相对较快，因此我们不必运行训练循环超过 100 次迭代。我们使用以下代码执行此操作：

loss_vec = [] 
batch_accuracy = [] 
for i in range(100): 
    rand_index = np.random.choice(len(x_vals), size=batch_size) 
    rand_x = x_vals[rand_index] 
    rand_y = y_vals[:,rand_index] 
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y}) 
    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y}) 
    loss_vec.append(temp_loss) 
    acc_temp = sess.run(accuracy, feed_dict={x_data: rand_x, y_target: rand_y, prediction_grid:rand_x}) 
    batch_accuracy.append(acc_temp) 
    if (i+1)%25==0: 
        print('Step #' + str(i+1)) 
        print('Loss = ' + str(temp_loss)) 
Step #25 
Loss = -2.8951 
Step #50 
Loss = -27.9612 
Step #75 
Loss = -26.896 
Step #100 
Loss = -30.2325

我们现在可以创建点的预测网格并对所有点运行预测函数，如下所示：

x_min, x_max = x_vals[:, 0].min() - 1, x_vals[:, 0].max() + 1 
y_min, y_max = x_vals[:, 1].min() - 1, x_vals[:, 1].max() + 1 
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), 
                     np.arange(y_min, y_max, 0.02)) 
grid_points = np.c_[xx.ravel(), yy.ravel()] 
grid_predictions = sess.run(prediction, feed_dict={x_data: rand_x, 
                                                   y_target: rand_y, 
                                                   prediction_grid: grid_points}) 
grid_predictions = grid_predictions.reshape(xx.shape)

以下是绘制结果，批量准确率和损失函数的代码。为简洁起见，我们只显示最终结果：

plt.contourf(xx, yy, grid_predictions, cmap=plt.cm.Paired, alpha=0.8) 
plt.plot(class1_x, class1_y, 'ro', label='I. setosa') 
plt.plot(class2_x, class2_y, 'kx', label='I. versicolor') 
plt.plot(class3_x, class3_y, 'gv', label='I. virginica') 
plt.title('Gaussian SVM Results on Iris Data') 
plt.xlabel('Petal Length') 
plt.ylabel('Sepal Width') 
plt.legend(loc='lower right') 
plt.ylim([-0.5, 3.0]) 
plt.xlim([3.5, 8.5])  
plt.show() 
plt.plot(batch_accuracy, 'k-', label='Accuracy') 
plt.title('Batch Accuracy') 
plt.xlabel('Generation') 
plt.ylabel('Accuracy') 
plt.legend(loc='lower right') 
plt.show() 
plt.plot(loss_vec, 'k-') 
plt.title('Loss per Generation') 
plt.xlabel('Generation') 
plt.ylabel('Loss') 
plt.show()

然后我们得到以下绘图：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-uBuuoduZ-1681566813075)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/42907158-e660-4a92-a2da-897722139ec5.png)]

图 10：在鸢尾数据集上的伽马为 10 的多类（三类）非线性高斯 SVM 的结果

我们观察前面的屏幕截图，其中显示了所有三个鸢尾类，以及为每个类分类的点网格。

工作原理

本文中需要注意的重点是我们如何改变算法以同时优化三个 SVM 模型。我们的模型参数b有一个额外的维度可以考虑所有三个模型。在这里，我们可以看到，由于 TensorFlow 处理额外维度的内置函数，算法扩展到多个类似算法相对容易。

下一章将介绍最近邻方法，这是一种用于预测目的的非常强大的算法。

五、最近邻方法

本章将重点介绍最近邻方法，以及如何在 TensorFlow 中实现它们。我们将首先介绍这些方法，然后我们将说明如何实现各种形式。本章将以地址匹配和图像识别的示例结束。

在本章中，我们将介绍以下内容：

使用最近邻
使用基于文本的距离
计算混合距离函数
使用地址匹配的示例
使用最近邻进行图像识别

请注意，所有最新代码均可在 Github 和 Packt 仓库获得。

介绍

最近邻方法植根于基于距离的概念思想。我们认为我们的训练设定了一个模型，并根据它们与训练集中的点的接近程度对新点进行预测。一种简单的方法是使预测类与最接近的训练数据点类相同。但由于大多数数据集包含一定程度的噪声，因此更常见的方法是采用一组k-最近邻的加权平均值。该方法称为 K 最近邻（KNN）。

给定具有相应目标（y[1], y[2]....y[n]）的训练数据集（x[1],x[2].....x[n]），我们可以通过查看一组最近邻来对点z进行预测。实际的预测方法取决于我们是进行回归（连续y[i]）还是分类（离散y[i]）。

对于离散分类目标，可以通过最大投票方案给出预测，通过到预测点的距离加权：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-9Vm01fEg-1681566813075)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/f6815331-7ba6-4be4-9719-eeeca8f4dd94.png)]

我们这里的预测f(z)是所有类别j的最大加权值，其中从预测点到训练点的加权距离i由φ(d[ij])给出。如果点i在类j.中，l[ij]只是一个指示器函数如果点i在类j中，则指示器函数取值 1，如果不是，则取值 0 另外，k是要考虑的最近点数。

对于连续回归目标，预测由最接近预测的所有k点的加权平均值给出：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-h0tiKWAJ-1681566813076)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/7bd24242-3c82-47ee-b114-7aeb5922e317.png)]

很明显，预测很大程度上取决于距离度量的选择d。

距离度量的常用规范是 L1 和 L2 距离，如下所示：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-H5j92IqL-1681566813076)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/1506c3f9-8094-4485-8d10-dd4dcf414fbe.png)]
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-GXHYfQ1H-1681566813076)(https://gitcode.net/apachecn/apachecn-dl-zh/-/raw/master/docs/tf-ml-cookbook-2e-zh/img/b5116b6c-0f94-40b6-86b6-58611da16ca4.png)]

我们可以选择许多不同规格的距离指标。在本章中，我们将探讨 L1 和 L2 指标，以及编辑和文本距离。

我们还必须选择如何加权距离。对距离进行加权的直接方法是距离本身。远离我们预测的点应该比较近点的影响小。最常见的权重方法是通过距离的归一化逆。我们将在下一个秘籍中实现此方法。

注意，KNN 是一种聚合方法。对于回归，我们正在执行邻居的加权平均。因此，预测将不那么极端，并且与实际目标相比变化较小。这种影响的大小将由算法中邻居的数量k决定。

使用最近邻

我们将通过实现最近邻来预测住房价值来开始本章。这是从最近邻开始的好方法，因为我们将处理数字特征和连续目标。

准备

为了说明如何在 TensorFlow 中使用最近邻进行预测，我们将使用波士顿住房数据集。在这里，我们将预测邻域住房价值中位数作为几个特征的函数。

由于我们考虑训练集训练模型，我们将找到预测点的 KNN，并将计算目标值的加权平均值。

操作步骤

我们按如下方式处理秘籍：

我们将从加载所需的库并启动图会话开始。我们将使用requests模块从 UCI 机器学习库加载必要的波士顿住房数据：

import matplotlib.pyplot as plt 
import numpy as np 
import tensorflow as tf 
import requests 
sess = tf.Session()

接下来，我们将使用requests模块加载数据：

housing_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data' 
housing_header = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV'] 
cols_used = ['CRIM', 'INDUS', 'NOX', 'RM', 'AGE', 'DIS', 'TAX', 'PTRATIO', 'B', 'LSTAT'] 
num_features = len(cols_used) 
# Request data 
housing_file = requests.get(housing_url) 
# Parse Data 
housing_data = [[float(x) for x in y.split(' ') if len(x)>=1] for y in housing_file.text.split('n') if len(y)>=1]

接下来，我们将数据分为依赖和独立的特征。我们将预测最后一个变量MEDV，这是房屋组的中值。我们也不会使用ZN，CHAS和RAD特征，因为它们没有信息或二元性质：

y_vals = np.transpose([np.array([y[13] for y in housing_data])]) 
x_vals = np.array([[x for i,x in enumerate(y) if housing_header[i] in cols_used] for y in housing_data]) 
x_vals = (x_vals - x_vals.min(0)) / x_vals.ptp(0)

现在，我们将x和y值分成训练和测试集。我们将通过随机选择大约 80% 的行来创建训练集，并将剩下的 20% 留给测试集：

train_indices = np.random.choice(len(x_vals), round(len(x_vals)*0.8), replace=False) 
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices))) 
x_vals_train = x_vals[train_indices] 
x_vals_test = x_vals[test_indices] 
y_vals_train = y_vals[train_indices] 
y_vals_test = y_vals[test_indices]

接下来，我们将声明k值和批量大小：

k = 4 
batch_size=len(x_vals_test)

我们接下来会申报占位符。请记住，没有模型变量需要训练，因为模型完全由我们的训练集确定：

x_data_train = tf.placeholder(shape=[None, num_features], dtype=tf.float32)
x_data_test = tf.placeholder(shape=[None, num_features], dtype=tf.float32)
y_target_train = tf.placeholder(shape=[None, 1], dtype=tf.float32)
y_target_test = tf.placeholder(shape=[None, 1], dtype=tf.float32)

接下来，我们将为一批测试点创建距离函数。在这里，我们将说明 L1 距离的使用：

distance = tf.reduce_sum(tf.abs(tf.subtract(x_data_train, tf.expand_dims(x_data_test,1))), reduction_indices=2)

注意，也可以使用 L2 距离函数。我们将距离公式改为distance = tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x_data_train, tf.expand_dims(x_data_test,1))), reduction_indices=1))。

现在，我们将创建我们的预测函数。为此，我们将使用top_k()函数，该函数返回张量中最大值的值和索引。由于我们想要最小距离的指数，我们将找到k - 最大负距离。我们还将声明目标值的预测和均方误差（MSE）：

TensorFlow 机器学习秘籍第二版：1~5（5）https://developer.aliyun.com/article/1426835

TensorFlow 机器学习秘籍第二版：1~5（4）

准备

操作步骤

工作原理

在 TensorFlow 中使用核

准备

操作步骤

工作原理

更多

实现非线性 SVM

准备

操作步骤

工作原理

实现多类 SVM

准备

操作步骤

工作原理

五、最近邻方法

介绍

使用最近邻

准备

操作步骤

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

TensorFlow 机器学习秘籍第二版：1~5（4）

准备

操作步骤

工作原理

在 TensorFlow 中使用核

准备

操作步骤

工作原理

更多

实现非线性 SVM

准备

操作步骤

工作原理

实现多类 SVM

准备

操作步骤

工作原理

五、最近邻方法

介绍

使用最近邻

准备

操作步骤

热门文章

最新文章

相关课程

相关电子书

相关实验场景