[Python Machine Learning] Experiment 03: Logistic Regression 1 - Alibaba Cloud Developer Community


2023-10-12



A Simple Classification Model: Logistic Regression

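In this exercise, we will implement logistic regression and apply it to a classification task. We will also make the algorithm more robust by adding regularization to training, and test it on a more complex case.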

1.1 Preparing the data

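The data for this experiment contains two variables (the Exam 1 and Exam 2 scores, which serve as the features). A university administrator wants to decide whether to admit each applicant based on their scores on two exams, so we build a classification model that estimates the probability of admission from the two scores.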

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data with pandas and show the first rows
path = 'ex2data1.txt'
data = pd.read_csv(path, header=None, names=['Exam1', 'Exam2', 'Admitted'])
data.head()

	Exam1	Exam2	Admitted
0	34.623660	78.024693	0
1	30.286711	43.894998	0
2	35.847409	72.902198	0
3	60.182599	86.308552	1
4	79.032736	75.344376	1

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Exam1     100 non-null    float64
 1   Exam2     100 non-null    float64
 2   Admitted  100 non-null    int64  
dtypes: float64(2), int64(1)
memory usage: 2.5 KB

# Check the shape of the data
data.shape

(100, 3)

Let's create a scatter plot of the two scores, using color to show whether each sample is positive (admitted) or negative (not admitted).

positive_index=data["Admitted"].isin([1])
negative_index=data["Admitted"].isin([0])

positive_index

0     False
1     False
2     False
3      True
4      True
      ...  
95     True
96     True
97     True
98     True
99     True
Name: Admitted, Length: 100, dtype: bool

plt.scatter(data[positive_index]["Exam1"],data[positive_index]["Exam2"],color="red",marker="+")
plt.scatter(data[negative_index]["Exam1"],data[negative_index]["Exam2"],color="blue",marker="o")
plt.legend(["Admitted","Not admitted"])
plt.xlabel("Exam1")
plt.ylabel("Exam2")
plt.show()

positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig, ax = plt.subplots(figsize=(6,4))
ax.scatter(positive['Exam1'],
           positive['Exam2'],
           s=50,
           c='b',
           marker='o',
           label='Admitted')
ax.scatter(negative['Exam1'],
           negative['Exam2'],
           s=50,
           c='r',
           marker='x',
           label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()

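There appears to be a clear decision boundary between the two classes. Next, we implement logistic regression so that we can train a model to predict the outcome.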

# Prepare the training data: all columns except the last are features, the last is the label
col_num=data.shape[1]
X=data.iloc[:,:col_num-1]
y=data.iloc[:,col_num-1]

# Insert a column of ones so that w0 acts as the intercept (bias) term
X.insert(0,"ones",1)
X.shape

(100, 3)

X=X.values
X.shape

(100, 3)

y=y.values
y.shape

(100,)
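The same design matrix can also be built directly in NumPy, without going through a DataFrame. The sketch below is an illustration rather than the author's code; the helper name `add_intercept` is hypothetical and the scores are made-up toy values.

```python
import numpy as np

def add_intercept(features: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so that w[0] acts as the bias (intercept) term."""
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

# Hypothetical toy scores: 3 applicants, 2 exam scores each
scores = np.array([[34.6, 78.0],
                   [30.3, 43.9],
                   [60.2, 86.3]])
X_demo = add_intercept(scores)
print(X_demo.shape)  # (3, 3)
print(X_demo[:, 0])  # [1. 1. 1.]
```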

1.2 Defining the hypothesis function

The sigmoid function: $g(z) = \frac{1}{1 + e^{-z}}$

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Let's do a quick check to make sure it works.

nums = np.arange(-10, 10, step=1)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(nums, sigmoid(nums), 'r')
plt.show()
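The `sigmoid` above calls `np.exp(-z)` directly, which overflows (with a RuntimeWarning) for very negative inputs such as z = -1000, even though the final result still comes out as 0.0. As an optional alternative, here is a minimal sketch of an overflow-free version for NumPy arrays; the name `stable_sigmoid` is hypothetical. It also checks two properties that are used later: the symmetry g(-z) = 1 - g(z) and the derivative identity g'(z) = g(z)(1 - g(z)).

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid for NumPy arrays that avoids overflow in np.exp for very negative z."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # z >= 0: exp(-z) <= 1, so the textbook formula cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # z < 0: use the equivalent form exp(z) / (1 + exp(z)), where exp(z) < 1
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
s = stable_sigmoid(z)
print(s[2])                          # 0.5
print(np.allclose(s[::-1], 1 - s))   # True, since g(-z) = 1 - g(z)

# Numerical check of the derivative identity g'(z) = g(z) * (1 - g(z))
eps = 1e-6
zz = np.array([-2.0, 0.0, 2.0])
numeric = (stable_sigmoid(zz + eps) - stable_sigmoid(zz - eps)) / (2 * eps)
analytic = stable_sigmoid(zz) * (1 - stable_sigmoid(zz))
print(np.allclose(numeric, analytic))  # True
```

If SciPy is available, `scipy.special.expit` provides a numerically stable sigmoid as well.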

w=np.zeros((X.shape[1],1))

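# Hypothesis h(x) = g(w^T x) = 1/(1+exp(-w^T x)); for all samples at once this is sigmoid(X @ w)
def h(X,w):
    z=X@w              # (m, n+1) @ (n+1, 1) -> (m, 1)
    return sigmoid(z)  # predicted probabilities, shape (m, 1)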

1.3 Defining the cost function

y_hat=sigmoid(X@w)

X.shape,y.shape,np.log(y_hat).shape

((100, 3), (100,), (100, 1))
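Note that the shapes printed above differ: y is (100,) while log(y_hat) is (100, 1). Multiplying them directly would broadcast to a (100, 100) matrix instead of giving an element-wise product, which is why the cost function flattens both operands with `ravel()`. A small demonstration with hypothetical toy arrays:

```python
import numpy as np

y_toy = np.array([0, 1, 1])              # shape (3,), like y
p_toy = np.array([[0.2], [0.7], [0.9]])  # shape (3, 1), like sigmoid(X @ w)

wrong = y_toy * np.log(p_toy)            # broadcasts (3,) with (3, 1) -> (3, 3)
right = y_toy * np.log(p_toy).ravel()    # element-wise product -> (3,)
print(wrong.shape)  # (3, 3)
print(right.shape)  # (3,)
```

Reshaping y to a column with `y.reshape(-1, 1)` is an equally valid fix; the point is that both operands must have the same shape.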

Now we need to write a cost function to evaluate the results.

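Cost function (cross-entropy), where $m$ is the number of samples and $h_w(x) = g(w^{T}x)$:

$$J(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_w(x^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_w(x^{(i)})\right)\right]$$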

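# Cost function (cross-entropy / log loss)
def cost(X,w,y):
    # Shapes: X is (m, n+1), y is (m,), w is (n+1, 1); ravel() flattens everything to (m,) so the products are element-wise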
    y_hat=sigmoid(X@w)
    right=np.multiply(y.ravel(),np.log(y_hat).ravel())+np.multiply((1-y).ravel(),np.log(1-y_hat).ravel())
    cost=-np.sum(right)/X.shape[0]
    return cost

# Set the initial weights to zero
w=np.zeros((X.shape[1],1))
# Check the initial cost
cost(X,w,y)

0.6931471805599453
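For comparison, here is a hedged sketch of a vectorized variant of the same cross-entropy cost. The name `cost_vectorized`, the clipping constant `eps` (added so that `log()` stays finite if a prediction saturates at exactly 0 or 1), and the random toy data are all assumptions, not part of the original exercise. Evaluated at w = 0 it should reproduce ln 2, the value printed above.

```python
import numpy as np

def cost_vectorized(X, w, y, eps=1e-12):
    """Mean cross-entropy; X: (m, n+1), w: (n+1, 1), y: (m,)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w).ravel()))  # predicted probabilities, shape (m,)
    p = np.clip(p, eps, 1 - eps)                 # keep log() finite
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical toy data: bias column plus two random features
rng = np.random.default_rng(0)
X_toy = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y_toy = np.array([0, 1, 1, 0, 1])
print(cost_vectorized(X_toy, np.zeros((3, 1)), y_toy))  # close to ln 2, about 0.6931
```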

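That looks right: with all weights equal to zero, every prediction is sigmoid(0) = 0.5, so each sample contributes -log(0.5) and the initial cost is ln 2 ≈ 0.6931. Next, we need a function that computes the gradient of the cost with respect to the parameters w, given the training data and labels.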

1.4 Defining the gradient descent algorithm

Gradient descent
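The per-component update in the function below follows from differentiating the cross-entropy cost computed by `cost()`, using the sigmoid identity $g'(z) = g(z)\,(1 - g(z))$. A short derivation, writing $h_i = g(w^{\top}x^{(i)})$:

```latex
\begin{aligned}
\frac{\partial h_i}{\partial w_j} &= h_i\,(1-h_i)\,x^{(i)}_j \\
\frac{\partial J}{\partial w_j}
  &= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h_i} - \frac{1-y^{(i)}}{1-h_i}\right] h_i\,(1-h_i)\,x^{(i)}_j
   = \frac{1}{m}\sum_{i=1}^{m}\left(h_i - y^{(i)}\right) x^{(i)}_j \\
w_j &:= w_j - \alpha\,\frac{\partial J}{\partial w_j} \qquad \text{(all } j \text{ updated simultaneously)}
\end{aligned}
```

The bracketed term times $h_i(1-h_i)$ simplifies to $y^{(i)} - h_i$, which is why the gradient contains only the prediction error $h_i - y^{(i)}$.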

def gradient_descent(X,y,iter_num,alpha):
    y=y.reshape((X.shape[0],1))         # labels as a column vector (m, 1)
    w=np.zeros((X.shape[1],1))          # start from all-zero weights
    cost_lst=[]
    for _ in range(iter_num):
        error=h(X,w)-y                  # h(x) - y for every sample, shape (m, 1)
        temp=np.zeros((X.shape[1],1))
        for j in range(X.shape[1]):
            # partial derivative of J(w) with respect to w_j
            right=np.multiply(error.ravel(),X[:,j])
            gradient=1/(X.shape[0])*(np.sum(right))
            temp[j,0]=w[j,0]-alpha*gradient
        w=temp                          # update all w_j simultaneously
        cost_lst.append(cost(X,w,y.ravel()))
    return w,cost_lst

iter_num,alpha=1000000,0.001
w,cost_lst=gradient_descent(X,y,iter_num,alpha)

cost_lst[iter_num-1]

0.22465416189188264

plt.plot(range(iter_num),cost_lst,"b-")  # plain line: drawing 1,000,000 markers would be very slow

[<matplotlib.lines.Line2D at 0x14224c08190>]
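The inner loop over j computes one partial derivative at a time. The same update can be written as a single matrix product, grad = X^T (h - y) / m, which is much faster when running a million iterations. The sketch below is a hedged, vectorized alternative rather than the author's code; `gradient_descent_vectorized` and the toy data are hypothetical.

```python
import numpy as np

def gradient_descent_vectorized(X, y, iter_num, alpha):
    """Batch gradient descent for logistic regression; X: (m, n+1), y: (m,)."""
    m = X.shape[0]
    y = y.reshape(-1, 1)                       # column vector to match X @ w
    w = np.zeros((X.shape[1], 1))              # start from all-zero weights
    cost_lst = []
    for _ in range(iter_num):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))     # predictions h(x), shape (m, 1)
        grad = X.T @ (p - y) / m               # every partial derivative dJ/dw_j at once
        w = w - alpha * grad                   # simultaneous update of all w_j
        p = 1.0 / (1.0 + np.exp(-(X @ w)))     # record the cost after the update
        cost_lst.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return w, cost_lst

# Hypothetical toy data: bias column plus two standard-normal features
rng = np.random.default_rng(1)
X_toy = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y_toy = (X_toy[:, 1] + X_toy[:, 2] > 0).astype(int)
w_toy, costs = gradient_descent_vectorized(X_toy, y_toy, iter_num=500, alpha=0.1)
print(costs[0] > costs[-1])  # True: the cost goes down during training
```

Applied to the exam data with the same `iter_num` and `alpha`, it uses the same update rule starting from w = 0, so its weights should agree with the loop version up to floating-point rounding.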

# Inspect the learned weights (X @ w needs X of shape (m, n) and w of shape (n, 1))
w

array([[-15.39517866],
       [  0.12825989],
       [  0.12247929]])
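With these weights, the model output can be read as an estimated probability of admission. As a hedged illustration, take a hypothetical applicant with 45 on Exam 1 and 85 on Exam 2 (scores chosen only for this example); the weights are copied from the output above.

```python
import numpy as np

# Weights copied from the training output above (intercept, Exam1, Exam2)
w_learned = np.array([[-15.39517866], [0.12825989], [0.12247929]])

x_new = np.array([[1.0, 45.0, 85.0]])  # hypothetical applicant, with the bias term 1
z = (x_new @ w_learned).item()         # w^T x as a Python float
p_admit = 1.0 / (1.0 + np.exp(-z))     # estimated probability of admission
print(round(z, 3), round(p_admit, 3))
```

This gives z ≈ 0.79 and a probability of roughly 0.69, so with a 0.5 threshold the model would predict admission for this applicant.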

1.5 Plotting the decision boundary

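# Plot the decision boundary w0 + w1*x1 + w2*x2 = 0, i.e. x2 = -(w0 + w1*x1) / w2
x_exam1=np.linspace(data["Exam1"].min(),data["Exam1"].max(),100)
x2=(-w[0,0]-w[1,0]*x_exam1)/(w[2,0])
plt.plot(x_exam1,x2,"r-",label="Decision boundary")
plt.scatter(data[positive_index]["Exam1"],data[positive_index]["Exam2"],color="c",marker="^",label="Admitted")
plt.scatter(data[negative_index]["Exam1"],data[negative_index]["Exam2"],color="b",marker="o",label="Not admitted")
plt.xlabel("Exam1")
plt.ylabel("Exam2")
plt.legend()
plt.show()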
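The boundary line consists of the points where $h_w(x) = 0.5$, that is, where $w^{\top}x = 0$. A small sanity check under the same weight layout (intercept first): every point generated from the boundary formula should receive a predicted probability of 0.5, up to rounding. The helper name `boundary_x2` is hypothetical, and the weights are copied from the training output.

```python
import numpy as np

def boundary_x2(w, x1):
    """Exam 2 score on the decision boundary w0 + w1*x1 + w2*x2 = 0."""
    return -(w[0, 0] + w[1, 0] * x1) / w[2, 0]

w_learned = np.array([[-15.39517866], [0.12825989], [0.12247929]])
x1 = np.linspace(30.0, 100.0, 5)
x2 = boundary_x2(w_learned, x1)
points = np.column_stack([np.ones_like(x1), x1, x2])  # add the bias column
probs = 1.0 / (1.0 + np.exp(-(points @ w_learned).ravel()))
print(np.allclose(probs, 0.5))  # True
```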

1.6 Computing the accuracy

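Now we use the learned parameters w to output predictions for the data set X, and use those predictions to score the training accuracy of our classifier.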

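The logistic regression model outputs the probability $h_w(x) = g(w^{T}x)$. We classify a sample as admitted ($\hat{y} = 1$) when $h_w(x) > 0.5$, which is equivalent to $w^{T}x > 0$, and as not admitted otherwise: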

y_pred=(h(X,w)>0.5).ravel()   # predicted labels: True means "admitted"
y_pred

array([False, False, False,  True,  True, False,  True, False,  True,
        True,  True, False,  True,  True, False,  True, False, False,
        True,  True, False,  True, False, False,  True,  True,  True,
        True, False, False,  True,  True, False, False, False, False,
        True,  True, False, False,  True, False,  True,  True, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False,  True,  True,  True,  True,  True, False, False,
       False, False, False,  True, False,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True, False,  True,  True,
        True,  True, False,  True,  True, False,  True,  True, False,
        True,  True, False,  True,  True,  True,  True,  True, False,
        True])

y_true=(data["Admitted"]==1).values   # actual labels as booleans
y_true

array([False, False, False,  True,  True, False,  True,  True,  True,
        True, False, False,  True,  True, False,  True,  True, False,
        True,  True, False,  True, False, False,  True,  True,  True,
       False, False, False,  True,  True, False,  True, False, False,
       False,  True, False, False,  True, False,  True, False, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False,  True, False,  True,  True,  True, False, False,
       False, False, False,  True, False,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True, False, False,  True,
        True,  True,  True,  True,  True, False,  True,  True, False,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True])

# Accuracy = fraction of samples whose predicted label matches the actual label
np.sum(y_true==y_pred)/X.shape[0]

0.89
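The same accuracy can be computed more compactly as the mean of a boolean comparison. A minimal sketch, assuming predictions are thresholded at 0.5 as above; the names `predict` and `accuracy` are hypothetical, and the toy arrays are made up.

```python
import numpy as np

def predict(X, w, threshold=0.5):
    """Return 0/1 class labels from the sigmoid of X @ w."""
    probs = 1.0 / (1.0 + np.exp(-(X @ w).ravel()))
    return (probs > threshold).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Toy example: 4 samples, bias column first, weights chosen by hand
X_toy = np.array([[1.0, -2.0], [1.0, -0.5], [1.0, 0.5], [1.0, 2.0]])
w_toy = np.array([[0.0], [1.0]])
y_toy = np.array([0, 1, 1, 1])
print(accuracy(y_toy, predict(X_toy, w_toy)))  # 0.75
```

Because `np.mean` on a boolean array counts `True` as 1, this is exactly the number of correct predictions divided by the number of samples.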
