[Python Machine Learning] Lab 03: Logistic Regression 1 - Alibaba Cloud Developer Community


2023-10-12


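A Simple Classification Model - Logistic Regression

In this exercise we implement logistic regression and apply it to a classification task. We will also add regularization to the training algorithm to make it more robust, and test it on a more complex case.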

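1.1 Preparing the Data

The data for this lab contains two variables, score 1 and score 2, which serve as the features. A university administrator wants to decide whether applicants are admitted based on their scores on two exams. The goal is therefore to build a classification model that estimates the probability of admission from the two exam scores.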

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load and display the data with pandas
path = 'ex2data1.txt'
data = pd.read_csv(path, header=None, names=['Exam1', 'Exam2', 'Admitted'])
data.head()

	Exam1	Exam2	Admitted
0	34.623660	78.024693	0
1	30.286711	43.894998	0
2	35.847409	72.902198	0
3	60.182599	86.308552	1
4	79.032736	75.344376	1

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Exam1     100 non-null    float64
 1   Exam2     100 non-null    float64
 2   Admitted  100 non-null    int64  
dtypes: float64(2), int64(1)
memory usage: 2.5 KB

# Check the shape of the data
data.shape

(100, 3)

Let's create a scatter plot of the two scores, using color to show whether each sample is positive (admitted) or negative (not admitted).

positive_index=data["Admitted"].isin([1])
negative_index=data["Admitted"].isin([0])

positive_index

0     False
1     False
2     False
3      True
4      True
      ...  
95     True
96     True
97     True
98     True
99     True
Name: Admitted, Length: 100, dtype: bool

plt.scatter(data[positive_index]["Exam1"],data[positive_index]["Exam2"],color="red",marker="+")
plt.scatter(data[negative_index]["Exam1"],data[negative_index]["Exam2"],color="blue",marker="o")
plt.legend(["Admitted","Not admitted"])
plt.xlabel("Exam1")
plt.ylabel("Exam2")
plt.show()

positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig, ax = plt.subplots(figsize=(6,4))
ax.scatter(positive['Exam1'],
           positive['Exam2'],
           s=50,
           c='b',
           marker='o',
           label='Admitted')
ax.scatter(negative['Exam1'],
           negative['Exam2'],
           s=50,
           c='r',
           marker='x',
           label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()

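The two classes appear to be separated by a fairly clear decision boundary. Next we implement logistic regression so that we can train a model to predict the outcome.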

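# Prepare the training data: every column except the last is a feature, the last is the label
col_num=data.shape[1]
X=data.iloc[:,:col_num-1]
y=data.iloc[:,col_num-1]

# Prepend a column of ones so that w[0] acts as the intercept term
X.insert(0,"ones",1)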
X.shape

(100, 3)

X=X.values
X.shape

(100, 3)

y=y.values
y.shape

(100,)

1.2 Defining the Hypothesis Function

Sigmoid function

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

Let's do a quick check to make sure it works.

nums = np.arange(-10, 10, step=1)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(nums, sigmoid(nums), 'r')
plt.show()
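
Beyond the plot, a few numeric properties can be checked directly. The sketch below redefines sigmoid as above and checks that sigmoid(0) is 0.5, that sigmoid(-z) equals 1 - sigmoid(z), and that the values increase with z. The test points in z are arbitrary values chosen for illustration, not part of the lab data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Arbitrary test points (an assumption for illustration, not from the lab data)
z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
s = sigmoid(z)

print(sigmoid(0.0))                      # midpoint of the curve
print(np.allclose(sigmoid(-z), 1 - s))   # symmetry around the midpoint
print(np.all(np.diff(s) > 0))            # increasing on these points
```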

# Initialize the weights to zeros, one per column of X
w=np.zeros((X.shape[1],1))

# Hypothesis function: h(x) = 1 / (1 + exp(-w.T x))
def h(X,w):
    z=X@w
    return sigmoid(z)

1.3 Defining the Cost Function

y_hat=sigmoid(X@w)

X.shape,y.shape,np.log(y_hat).shape

((100, 3), (100,), (100, 1))
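
The shapes above matter: y is one-dimensional with shape (100,), while np.log(y_hat) is a column with shape (100, 1). Multiplying them element-wise without flattening would broadcast to a (100, 100) matrix instead of 100 per-sample terms. A minimal sketch with placeholder arrays of the same shapes (the values are made up for illustration):

```python
import numpy as np

m = 100
y = np.zeros(m)                 # shape (m,), like the label vector
y_hat = np.full((m, 1), 0.5)    # shape (m, 1), like sigmoid(X@w)

# (m,) * (m, 1) broadcasts to (m, m): not what the cost needs
broadcast = y * np.log(y_hat)

# Flattening both operands first gives one term per sample
flattened = y.ravel() * np.log(y_hat).ravel()

print(broadcast.shape)
print(flattened.shape)
```

This is presumably why the cost function in this section calls ravel() on both operands before multiplying.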

Now we need to write the cost function to evaluate the result.

Cost function:
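
The formula itself is not shown in the text at this point. The form below is reconstructed from the cost implementation that follows, where m is the number of samples and h_w(x) is the sigmoid of w^T x; the symbols are chosen for this note.

```latex
J(w) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log h_w\!\left(x^{(i)}\right) + \left(1-y^{(i)}\right)\log\!\left(1-h_w\!\left(x^{(i)}\right)\right) \right]
```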

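# Build the cost function (average cross-entropy loss)
def cost(X,w,y):
    # Expected shapes: X is (m, n+1), y is (m,), w is (n+1, 1)
    y_hat=sigmoid(X@w)
    # Flatten both operands so the product stays one term per sample
    right=np.multiply(y.ravel(),np.log(y_hat).ravel())+np.multiply((1-y).ravel(),np.log(1-y_hat).ravel())
    return -np.sum(right)/X.shape[0]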

# Set the initial weights
w=np.zeros((X.shape[1],1))
# Check the initial cost
cost(X,w,y)

0.6931471805599453
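
The value 0.693 is what one would expect: with all weights at zero, every prediction is sigmoid(0) = 0.5, so each sample contributes -log(0.5) = ln 2 to the average regardless of its label. The check below uses random placeholder data rather than ex2data1.txt, and a compact cost function written for this note.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(X, w, y):
    # Average cross-entropy; y_hat is flattened to match y's shape (m,)
    y_hat = sigmoid(X @ w).ravel()
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Placeholder data (assumed for illustration): 20 samples, intercept + 2 features
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.uniform(30, 100, size=(20, 2))])
y = rng.integers(0, 2, size=20)
w = np.zeros((X.shape[1], 1))

initial = cost(X, w, y)
print(initial, np.log(2))   # the two values agree
```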

That looks right. Next we need a function that computes the gradient given the training data, the labels, and the parameters w.
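
For reference, differentiating the cross-entropy cost with respect to each weight gives the standard logistic regression gradient, and gradient descent repeatedly steps against it with learning rate α. The notation (m samples, h_w for the hypothesis, x_j^{(i)} for feature j of sample i) is chosen for this note.

```latex
\frac{\partial J(w)}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_w\!\left(x^{(i)}\right)-y^{(i)}\right)x_j^{(i)},
\qquad
w_j \leftarrow w_j - \alpha\,\frac{\partial J(w)}{\partial w_j}
```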

1.4 Defining the Gradient Descent Algorithm

Gradient descent

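def grandient(X,y,iter_num,alpha):
    # Batch gradient descent; returns the final weights and the cost after each iteration
    y=y.reshape((X.shape[0],1))
    w=np.zeros((X.shape[1],1))
    cost_lst=[]
    for i in range(iter_num):
        # y_pred holds the residual h(X,w)-y, not the prediction itself
        y_pred=h(X,w)-y
        temp=np.zeros((X.shape[1],1))
        for j in range(X.shape[1]):
            # Partial derivative of the cost with respect to w[j]
            right=np.multiply(y_pred.ravel(),X[:,j])
            gradient=1/(X.shape[0])*(np.sum(right))
            # Write into temp so that every w[j] is updated from the old w
            temp[j,0]=w[j,0]-alpha*gradient
        w=temp
        cost_lst.append(cost(X,w,y.ravel()))
    return w,cost_lst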

iter_num,alpha=1000000,0.001
w,cost_lst=grandient(X,y,iter_num,alpha)

cost_lst[iter_num-1]

0.22465416189188264
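
The inner loop over j computes one partial derivative at a time. The same gradient can be written as a single matrix product, X.T @ (h - y) / m. The sketch below compares the two forms on random placeholder data; loop_gradient and vectorized_gradient are hypothetical helper names, not part of the lab code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loop_gradient(X, w, y):
    # Per-feature gradient, mirroring the loop structure used in the lab code
    err = (sigmoid(X @ w) - y).ravel()
    m = X.shape[0]
    return np.array([[np.sum(err * X[:, j]) / m] for j in range(X.shape[1])])

def vectorized_gradient(X, w, y):
    # The same quantity as one matrix product; y must have shape (m, 1)
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

# Placeholder data (assumed for illustration)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = rng.integers(0, 2, size=(50, 1))
w = rng.normal(size=(3, 1))

g_loop = loop_gradient(X, w, y)
g_vec = vectorized_gradient(X, w, y)
print(np.allclose(g_loop, g_vec))
```

With the vectorized form, one update step would read w = w - alpha * vectorized_gradient(X, w, y), which avoids the Python-level loop over features.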

plt.plot(range(iter_num),cost_lst,"b-o")

[<matplotlib.lines.Line2D at 0x14224c08190>]

Shape note: in the product X@w, X is (m, n) and w is (n, 1). The learned weights w are:

array([[-15.39517866],
       [  0.12825989],
       [  0.12247929]])

1.5 Plotting the Decision Boundary

# Plot the line w0 + w1*Exam1 + w2*Exam2 = 0 on top of the data
x_exma1=np.linspace(data["Exam1"].min(),data["Exam1"].max(),100)
x2=(-w[0,0]-w[1,0]*x_exma1)/(w[2,0])
plt.plot(x_exma1,x2,"r-")
plt.scatter(data[positive_index]["Exam1"],data[positive_index]["Exam2"],color="c",marker="^")
plt.scatter(data[negative_index]["Exam1"],data[negative_index]["Exam2"],color="b",marker="o")
plt.show()
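
The expression for x2 comes from the points where the model outputs exactly 0.5, that is, where the argument of the sigmoid is zero. Writing x_1 for Exam1 and x_2 for Exam2 (symbols chosen for this note):

```latex
w_0 + w_1 x_1 + w_2 x_2 = 0
\quad\Longrightarrow\quad
x_2 = \frac{-w_0 - w_1 x_1}{w_2}
```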

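1.6 Computing the Accuracy

How do we use the learned parameters w to produce predictions for the dataset X, so that we can score the training accuracy of our classifier?

Hypothesis function of the logistic regression model: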
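
The formula is not shown in the text at this point. The form below is reconstructed from the h function defined in section 1.2 and the 0.5 threshold used in the code that follows; the symbol ŷ for the predicted label is chosen for this note.

```latex
h_w(x) = \frac{1}{1+e^{-w^{T}x}},
\qquad
\hat{y} =
\begin{cases}
1, & h_w(x) > 0.5 \\
0, & \text{otherwise}
\end{cases}
```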

# Note: despite its name, y_p_true holds the model's predictions (h(X,w) > 0.5)
y_p_true=(h(X,w)>0.5).ravel()
y_p_true

array([False, False, False,  True,  True, False,  True, False,  True,
        True,  True, False,  True,  True, False,  True, False, False,
        True,  True, False,  True, False, False,  True,  True,  True,
        True, False, False,  True,  True, False, False, False, False,
        True,  True, False, False,  True, False,  True,  True, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False,  True,  True,  True,  True,  True, False, False,
       False, False, False,  True, False,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True, False,  True,  True,
        True,  True, False,  True,  True, False,  True,  True, False,
        True,  True, False,  True,  True,  True,  True,  True, False,
        True])

# Note: despite its name, y_p_pred holds the ground-truth labels
y_p_pred=(data["Admitted"]==1).values
y_p_pred

array([False, False, False,  True,  True, False,  True,  True,  True,
        True, False, False,  True,  True, False,  True,  True, False,
        True,  True, False,  True, False, False,  True,  True,  True,
       False, False, False,  True,  True, False,  True, False, False,
       False,  True, False, False,  True, False,  True, False, False,
       False,  True,  True,  True,  True,  True,  True,  True, False,
       False, False,  True, False,  True,  True,  True, False, False,
       False, False, False,  True, False,  True,  True, False,  True,
        True,  True,  True,  True,  True,  True, False, False,  True,
        True,  True,  True,  True,  True, False,  True,  True, False,
        True,  True, False,  True,  True,  True,  True,  True,  True,
        True])

# Training accuracy: fraction of samples where the prediction matches the label
np.sum(y_p_pred==y_p_true)/X.shape[0]

0.89
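
As a small spot check, the weights printed in section 1.4 can be applied to the five rows shown by data.head() in section 1.1. The predict helper and its threshold parameter are hypothetical names introduced here; the numbers are copied from the outputs earlier in this article.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, w, threshold=0.5):
    # Hypothetical helper: 1 where the predicted probability exceeds the threshold
    return (sigmoid(X @ w) > threshold).astype(int).ravel()

# Learned weights as printed after training
w = np.array([[-15.39517866], [0.12825989], [0.12247929]])

# First five samples from data.head(), with a leading column of ones
X5 = np.array([
    [1, 34.623660, 78.024693],
    [1, 30.286711, 43.894998],
    [1, 35.847409, 72.902198],
    [1, 60.182599, 86.308552],
    [1, 79.032736, 75.344376],
])
y5 = np.array([0, 0, 0, 1, 1])   # Admitted column for the same rows

pred5 = predict(X5, w)
accuracy = np.mean(pred5 == y5)
print(pred5, accuracy)   # [0 0 0 1 1] 1.0
```

All five of these rows are classified correctly; the 0.89 above is the accuracy over all 100 training samples.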
