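[Axu Machine Learning in Practice] [23] Feature dimensionality reduction in practice: face recognition on reduced features, selecting the best model to predict an unseen image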

Published 2022-12-08 from Jilin

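PCA Feature Dimensionality Reduction in Practice: Face Recognition

Problem description: face recognition

We train a model on face images of a group of people, then take an image from another source and ask the model to recognize who it shows.

Recognizing faces calls for supervised learning. Face images are very high-dimensional, which causes two problems for a supervised learner: the algorithm becomes very slow, and its accuracy can also drop.

So before supervised learning we reduce the feature dimensionality and build the model on the reduced features, to improve both efficiency and accuracy.

1. Load and inspect the data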

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn import datasets

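# Load the face dataset
faces = datasets.fetch_lfw_people(min_faces_per_person=70,
                                  slice_=(slice(0,250,None),slice(0,250,None)),resize=1)
# The function first tries to load the data from the local cache; if nothing is cached, it downloads the dataset and caches it locally

dir(faces)  # list the attributes of the dataset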

['DESCR', 'data', 'images', 'target', 'target_names']

# Shape of each attribute
print(faces.data.shape)
print(faces.images.shape)
print(faces.target.shape)
print(faces.target_names.shape)

(1288, 62500)
(1288, 250, 250)
(1288,)
(7,)

The output shows 1288 samples. Each image is 250 × 250 pixels, so each sample has 250 × 250 = 62500 features, and there are 7 class labels (people).
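
The relationship between the `images` and `data` shapes can be checked on a small stand-in array. The sketch below uses random numbers instead of the LFW images (an assumption made only to keep it self-contained) and shows that flattening each 250 × 250 image row by row gives a 62500-element feature vector.

```python
import numpy as np

# Stand-in for faces.images: 3 random "images" of 250 x 250 pixels (not real LFW data)
rng = np.random.default_rng(0)
images = rng.random((3, 250, 250))

# Flatten each image into one row of features
data = images.reshape(len(images), -1)

n_samples, n_features = data.shape
print(n_samples, n_features)

# Reshaping a row back recovers the original image
restored = data[0].reshape(250, 250)
```

The same `reshape((250, 250))` call is what turns a feature row back into a picture for plotting.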

data = faces.data
target = faces.target
target_names = faces.target_names
imgs = faces.images

# Names of the 7 people
target_names

array(['Ariel Sharon', 'Colin Powell', 'Donald Rumsfeld', 'George W Bush',
       'Gerhard Schroeder', 'Hugo Chavez', 'Tony Blair'], dtype='<U17')

plt.imshow(imgs[100],cmap="gray")

<matplotlib.image.AxesImage at 0x1fbb98e0c18>

2. Split into training and test data

from sklearn.model_selection import train_test_split,cross_val_score

x_train,x_test,y_train,y_test = train_test_split(data,target,test_size=0.05)
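
With `test_size=0.05`, 1288 samples split into 65 test and 1223 training samples, because scikit-learn rounds the test count up. The sketch below reproduces that arithmetic on placeholder data. The `random_state` and `stratify` arguments are additions that are not in the original call, shown as one possible way to make the split repeatable and keep class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels with the same sample and class counts as the article
X = np.zeros((1288, 4))
y = np.arange(1288) % 7

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.05, random_state=0, stratify=y
)
print(X_tr.shape, X_te.shape)
```

A test set of 65 images is small, so scores measured on it can vary noticeably from one random split to the next.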

3. PCA dimensionality reduction

from sklearn.decomposition import PCA

# Keep 300 principal components
pca = PCA(n_components=300)

# Fit PCA on the training set
pca.fit(x_train)

PCA(copy=True, iterated_power='auto', n_components=300, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)

# Project the training features into the PCA feature space
x_train_pca = pca.transform(x_train)

x_train.shape,x_train_pca.shape

((1223, 62500), (1223, 300))

The number of features drops from 62500 to 300, a reduction of roughly two orders of magnitude.
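
The article fixes the number of components at 300. One way to judge such a choice is to look at how much variance the kept components retain. The following is a minimal sketch on synthetic data (the data, the 5 components and the 95% threshold are illustrative assumptions, not values from the article).

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 samples, 50 features, with most variance in 5 latent directions
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

# Fixed number of components, as in the article (300 there, 5 here)
pca_fixed = PCA(n_components=5).fit(X)
retained = pca_fixed.explained_variance_ratio_.sum()
print(f"variance retained by 5 components: {retained:.4f}")

# A float in (0, 1) asks PCA for the fewest components reaching that variance share
pca_auto = PCA(n_components=0.95, svd_solver="full").fit(X)
print("components needed for 95% variance:", pca_auto.n_components_)
```

On the face data the same check would be `pca.explained_variance_ratio_.sum()` after fitting.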

x_test.shape

(65, 62500)

# Reduce the test features as well
x_test_pca = pca.transform(x_test)  # Note: do not fit again here, only transform
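
To make the "fit on the training data, then only transform the test data" rule concrete, here is a minimal NumPy sketch of what a PCA fit and transform amount to, assuming plain SVD-based PCA without whitening. It runs on random data and is an illustration, not scikit-learn's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 20))  # illustrative training features
X_test = rng.normal(size=(5, 20))    # illustrative test features
k = 3                                # number of components to keep

# "fit": learn the mean and the principal directions from the training data only
mean = X_train.mean(axis=0)
U, S, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = Vt[:k]                  # each row is one principal direction

# "transform": center with the training mean and project onto the directions
X_train_k = (X_train - mean) @ components.T
X_test_k = (X_test - mean) @ components.T

explained_ratio = S[:k] ** 2 / (S ** 2).sum()
print(X_train_k.shape, X_test_k.shape, explained_ratio.round(3))
```

The test data reuses `mean` and `components` from the training set, so both sets end up in the same feature space and the test images never influence the projection.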

4. Train supervised learning algorithms on the reduced features and predict

from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

4.1 Cross-validate each classifier to find the best model

# KNN model
knn = KNeighborsClassifier()

cross_val_score(knn,x_train_pca,y_train)

array([0.46585366, 0.47911548, 0.44581281])

# SGD model
sgd = SGDClassifier()

cross_val_score(sgd,x_train_pca,y_train)

array([0.53170732, 0.51351351, 0.49014778])

# Decision tree model
dt = DecisionTreeClassifier()

cross_val_score(dt,x_train_pca,y_train)

array([0.27560976, 0.28501229, 0.29802956])

# Naive Bayes model
g_NB = GaussianNB()

cross_val_score(g_NB,x_train_pca,y_train)

array([0.50731707, 0.47911548, 0.46305419])

# SVC model
svc = SVC(kernel="linear")

cross_val_score(svc,x_train_pca,y_train)

array([0.77073171, 0.77641278, 0.74137931])

Cross-validation shows that SVC with a linear kernel predicts best, so we build the final model with SVC.
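
The model-by-model comparison above can also be written as one loop. The sketch below runs on synthetic data from `make_classification` as a stand-in for `x_train_pca` and `y_train` (an assumption, so the scores will differ from the article's). It passes `cv=3` explicitly to match the three scores per model shown above, since recent scikit-learn versions default to 5 folds.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (x_train_pca, y_train)
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)

models = {
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "svc_linear": SVC(kernel="linear"),
}

# One array of 3 fold accuracies per model
scores = {name: cross_val_score(model, X, y, cv=3) for name, model in models.items()}

# Print models from best to worst mean accuracy
for name, s in sorted(scores.items(), key=lambda kv: kv[1].mean(), reverse=True):
    print(f"{name:14s} mean={s.mean():.3f} folds={s.round(3)}")

best = max(scores, key=lambda name: scores[name].mean())
print("best model:", best)
```

Comparing mean scores rather than single folds makes the ranking less sensitive to one lucky or unlucky split.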

svc.fit(x_train_pca,y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

y_ = svc.predict(x_test_pca)

y_,y_test

(array([6, 6, 3, 3, 2, 1, 1, 6, 3, 3, 1, 3, 3, 3, 1, 4, 3, 4, 3, 6, 0, 1,
        6, 3, 2, 4, 3, 2, 3, 1, 3, 3, 3, 6, 1, 4, 1, 3, 1, 3, 3, 6, 5, 2,
        4, 3, 3, 1, 2, 3, 3, 0, 6, 6, 6, 0, 4, 6, 3, 3, 0, 6, 6, 4, 0],
       dtype=int64),
 array([4, 6, 3, 3, 3, 1, 1, 6, 3, 3, 1, 3, 3, 3, 1, 4, 3, 4, 1, 4, 0, 1,
        6, 3, 2, 5, 3, 3, 3, 1, 3, 3, 3, 6, 1, 4, 1, 3, 1, 3, 3, 6, 5, 0,
        4, 3, 6, 3, 2, 2, 3, 0, 6, 6, 6, 0, 4, 4, 2, 3, 0, 6, 6, 4, 0],
       dtype=int64))

svc.score(x_test_pca,y_test)

0.8153846153846154

from sklearn.metrics import classification_report

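# Evaluate the model
# Note: classification_report expects (y_true, y_pred); the predictions are passed first here,
# so the precision and recall columns below are swapped relative to the usual convention
# and the support column counts predicted labels
print(classification_report(y_,y_test))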

             precision    recall  f1-score   support
          0       0.83      1.00      0.91         5
          1       0.90      0.90      0.90        10
          2       0.50      0.40      0.44         5
          3       0.87      0.83      0.85        24
          4       0.67      0.86      0.75         7
          5       0.50      1.00      0.67         1
          6       0.91      0.77      0.83        13
avg / total       0.82      0.82      0.81        65
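
The accuracy and the per-class numbers can be recomputed by hand from the two label arrays printed earlier. scikit-learn's `classification_report` takes the true labels as its first argument and the predictions as its second. The sketch below follows that convention for class 0, so its precision and recall come out as 1.00 and 0.83, the reverse of the class 0 row in the table, which was produced with the predictions passed first.

```python
import numpy as np

# Predictions (y_) and true labels (y_test) copied from the output above
y_pred = np.array([6, 6, 3, 3, 2, 1, 1, 6, 3, 3, 1, 3, 3, 3, 1, 4, 3, 4, 3, 6, 0, 1,
                   6, 3, 2, 4, 3, 2, 3, 1, 3, 3, 3, 6, 1, 4, 1, 3, 1, 3, 3, 6, 5, 2,
                   4, 3, 3, 1, 2, 3, 3, 0, 6, 6, 6, 0, 4, 6, 3, 3, 0, 6, 6, 4, 0])
y_true = np.array([4, 6, 3, 3, 3, 1, 1, 6, 3, 3, 1, 3, 3, 3, 1, 4, 3, 4, 1, 4, 0, 1,
                   6, 3, 2, 5, 3, 3, 3, 1, 3, 3, 3, 6, 1, 4, 1, 3, 1, 3, 3, 6, 5, 0,
                   4, 3, 6, 3, 2, 2, 3, 0, 6, 6, 6, 0, 4, 4, 2, 3, 0, 6, 6, 4, 0])

# Accuracy: share of test images whose predicted label equals the true label
correct = int((y_pred == y_true).sum())
accuracy = correct / len(y_true)
print(correct, len(y_true), accuracy)

# Class 0 ('Ariel Sharon'), with the true labels as the reference
cls = 0
tp = int(((y_pred == cls) & (y_true == cls)).sum())
precision = tp / int((y_pred == cls).sum())  # of images predicted as class 0, share that are right
recall = tp / int((y_true == cls).sum())     # of real class 0 images, share that were found
print(precision, recall)
```

Here 53 of the 65 test images are classified correctly, which matches the score of about 0.815 reported above.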

4.2 Display the prediction results

Plot the 65 test face images and label each with the true name and the predicted name.

plt.figure(figsize=(5*3,13*4))
for i in range(65):
    axes = plt.subplot(13,5,i+1)
    axes.imshow(x_test[i].reshape((250,250)),cmap="gray")
    axes.axis("off")
    axes.set_title("True:%s\nPre:%s"%(target_names[y_test[i]],target_names[y_[i]]))

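4.3 Download a photo and predict with the model

Download a picture of Bush or someone else from the web, convert it to a grayscale image in the format the model expects, and use the model to predict who it shows.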

bush = plt.imread("./data/bush.jpg")

bush.shape

(625, 500, 3)

plt.imshow(bush)

Convert the image to grayscale

bush1 = np.dot(bush,[0.299,0.587,0.114])

plt.imshow(bush1,cmap="gray")
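
The weights 0.299, 0.587 and 0.114 are the ITU-R BT.601 luma coefficients for red, green and blue. A tiny example shows what the `np.dot` call does to the channel axis. The three pixel values are illustrative, not taken from the photo.

```python
import numpy as np

# Luma weights for R, G, B used in the article (they sum to 1)
weights = np.array([0.299, 0.587, 0.114])

# Tiny 1 x 3 RGB image: pure red, pure green, white
rgb = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)

# np.dot contracts the last (channel) axis, leaving one value per pixel
gray = np.dot(rgb, weights)
print(gray.shape)
print(gray)
```

Because the weights sum to 1, a white pixel stays at 255, while green contributes more brightness than red or blue.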

# Crop rows 50:550 and keep every second pixel so the image becomes 250 * 250, the size the model expects
bush2 = bush1[50:550][::2,::2]

plt.imshow(bush2)
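
The shape arithmetic of the crop and the stride-2 subsampling can be traced on a placeholder array with the same 625 × 500 shape as the photo. This is a sketch with made-up pixel values.

```python
import numpy as np

# Placeholder grayscale image with the same shape as the photo in the article
gray = np.arange(625 * 500, dtype=float).reshape(625, 500)

cropped = gray[50:550]            # keep 500 of the 625 rows -> (500, 500)
small = cropped[::2, ::2]         # every second row and column -> (250, 250)
features = small.reshape(1, -1)   # one sample with 62500 features
print(cropped.shape, small.shape, features.shape)
```

Plain subsampling does no smoothing. A resize function from an image library would be an alternative if the result looks too jagged.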

# Transform the image with the fitted PCA
bush_pca = pca.transform(bush2.reshape((1,-1)))

name = svc.predict(bush_pca)

target_names[name]

array(['George W Bush'], dtype='<U17')

The model predicts George W Bush for this photo, which matches the actual person.
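
A new image has to go through the same fitted PCA before it reaches the classifier. One possible way to keep the two steps together, not used in the article, is scikit-learn's `make_pipeline`. The sketch below uses synthetic data and illustrative sizes (10 components, 60 features), so it is an assumption-laden illustration, not the author's code.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the face features and labels
X, y = make_classification(n_samples=400, n_features=60, n_informative=15,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# PCA followed by a linear SVC; fit() fits both steps on the training data
model = make_pipeline(PCA(n_components=10), SVC(kernel="linear"))
model.fit(X_tr, y_tr)

# predict() applies the fitted PCA and then the classifier in one call
new_sample = X_te[:1]
pred = model.predict(new_sample)
score = model.score(X_te, y_te)
print(pred, score)
```

With this arrangement a raw 62500-feature row could be passed straight to `predict`, without a separate `pca.transform` call.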
