Introduction
In this exercise, you will implement logistic regression and apply it to two different datasets. Before starting on the programming exercise, we strongly recommend watching the video lectures and completing the review questions for the associated topics.
1 Logistic regression
In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.
Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision.
Your task is to build a classification model that estimates an applicant's probability of admission based on the scores from those two exams.
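In short: you are a university administrator with historical data for previous applicants (two exam scores and the admission decision for each). Using that data as the training set, the goal is a logistic regression classifier that estimates a new applicant's probability of admission from the two exam scores.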
1.1 Visualizing the data
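Whatever algorithm you plan to use, it is a good idea to visualize the data first whenever possible. A plot often makes it clearer which kind of model will work well.
First, import the packages: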
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```
Then read in the data:
```python
path = 'data/ex2data1.txt'
# the file has no header row, so set the column names explicitly
data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
data.head()
```
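Next, draw a scatter plot. Circles (o) mark applicants who were admitted and crosses (x) mark those who were not, i.e. the positive and negative examples.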
```python
positive = data[data['Admitted'].isin([1])]  # admitted (y = 1)
negative = data[data['Admitted'].isin([0])]  # not admitted (y = 0)

fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(x=positive['Exam 1'], y=positive['Exam 2'], s=50, color='b', marker='o', label='Admitted')
ax.scatter(x=negative['Exam 1'], y=negative['Exam 2'], s=50, color='r', marker='x', label='Not Admitted')
plt.legend()                   # show the labels
ax.set_xlabel('Exam 1 Score')  # set x label
ax.set_ylabel('Exam 2 Score')  # set y label
plt.show()
```
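The plot suggests that a decision boundary could roughly separate the two classes. We can therefore build a logistic regression model for this classification problem.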
1.2 Implementation
1.2.1 Warmup exercise: sigmoid function
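Before you start with the actual cost function, recall that the logistic regression hypothesis is defined as:
$$h_\theta(x) = g(\theta^T x)$$
where the function g is the sigmoid function. The sigmoid function is defined as:
$$g(z) = \frac{1}{1 + e^{-z}}$$
Combining the two gives the logistic regression hypothesis:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$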
Having reviewed the sigmoid function used in the logistic regression model, we now define it in code:
```python
def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), applied element-wise to arrays
    return 1 / (1 + np.exp(-z))
```
To get a clearer feel for the sigmoid function, let's plot it over part of its domain:
```python
x1 = np.arange(-10, 10, 0.1)
plt.plot(x1, sigmoid(x1), c='r')
plt.show()
```
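A quick numerical check of three sigmoid properties used later may help:

- g(0) = 0.5, which is the classification threshold.
- g(−z) = 1 − g(z).
- g'(z) = g(z)(1 − g(z)). This is why the logistic-regression gradient ends up with the same form as the linear-regression one.

This is a minimal sketch. The grid of z values and the finite-difference step h are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^{-z})
    return 1 / (1 + np.exp(-z))

z = np.linspace(-10, 10, 201)
g = sigmoid(z)

# g(0) is exactly 0.5: the point where the predicted class flips
print(sigmoid(0.0))  # 0.5

# symmetry: g(-z) = 1 - g(z)
print(np.allclose(sigmoid(-z), 1 - g))  # True

# derivative: g'(z) = g(z) * (1 - g(z)), compared against central differences
h = 1e-5
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(np.allclose(numeric, g * (1 - g)))  # True
```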
1.2.2 Cost function
Now you will implement the cost function and gradient for logistic regression.
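The cost function for logistic regression differs from the one used in linear regression. Plugging the sigmoid hypothesis into the squared-error cost would make $J(\theta)$ non-convex. Logistic regression therefore uses the following cost, which is convex:
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$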
```python
def cost(theta, X, y):
    # theta is a 1-D array, so X @ theta is the vector of theta^T x^(i)
    first = (-y) * np.log(sigmoid(X @ theta))
    second = (1 - y) * np.log(1 - sigmoid(X @ theta))
    return np.mean(first - second)
```
To evaluate the cost, we first need to prepare the training set in data:
```python
# add a column of ones (the intercept term) so theta_0 is handled by the matrix product
if 'Ones' not in data.columns:
    data.insert(0, 'Ones', 1)

# set X (training data) and y (target variable)
X = data.iloc[:, :-1]  # all columns except the last: Ones, Exam 1, Exam 2
y = data.iloc[:, -1]   # last column: Admitted
theta = np.zeros(X.shape[1])

# convert the DataFrame/Series to plain NumPy arrays (not np.matrix)
X = np.array(X.values)
y = np.array(y.values)
```
It is good practice to check the array dimensions:
```python
X.shape, theta.shape, y.shape
# ((100, 3), (3,), (100,))
```
Everything checks out.
Finally, we can compute the cost for the initial parameters:
```python
cost(theta, X, y)
# 0.6931471805599453
```
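The cost is about 0.693, which is ln 2. With θ = 0 every prediction is g(0) = 0.5, and −log(0.5) = ln 2 for both y = 0 and y = 1.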
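The initial cost is ln 2 regardless of the data. The sketch below demonstrates this by evaluating the same cross-entropy cost with θ = 0 on a hypothetical random dataset (not ex2data1.txt). The random data, seed and sizes are assumptions made only so the snippet runs on its own.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    # cross-entropy cost, same form as the cost() defined above
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

# hypothetical stand-in data: 100 rows, a bias column plus two "exam scores"
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.uniform(30, 100, size=(100, 2))])
y = rng.integers(0, 2, size=100)

# with theta = 0 every h is g(0) = 0.5, so each term equals -log(0.5) = ln 2
print(np.isclose(cost(np.zeros(3), X, y), np.log(2)))  # True
```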
1.2.3 Gradient
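The gradient of the cost is a vector of the same length as θ, where the j-th element (for j = 0, 1, ..., n) is defined as follows:
$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$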
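Compared with linear regression, the formula is essentially the same. Only the hypothesis $h_\theta(x)$ has changed to the sigmoid function.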
```python
def gradient(theta, X, y):
    # vector of partial derivatives, same length as theta: (1/m) * X^T (h - y)
    return (X.T @ (sigmoid(X @ theta) - y)) / len(X)
```
```python
gradient(theta, X, y)
# array([ -0.1       , -12.00921659, -11.26284221])
```
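Before handing gradient() to an optimizer, it can be worth checking it against a finite-difference approximation of cost(). The helper numerical_gradient and the synthetic standardized data are hypothetical, chosen so the snippet is self-contained. With the raw exam scores, a random θ could saturate the sigmoid and make the logs blow up.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    # analytic gradient: (1/m) * X^T (h - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

def numerical_gradient(f, theta, eps=1e-6):
    # hypothetical helper: central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# synthetic standardized features with a bias column
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = rng.integers(0, 2, size=50)
theta = rng.normal(scale=0.5, size=3)

analytic = gradient(theta, X, y)
numeric = numerical_gradient(lambda t: cost(t, X, y), theta)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```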
1.2.4 Learning θ parameters
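Now we want to find the parameters $\theta$ that minimize $J(\theta)$.
Gradient descent repeatedly updates each parameter by subtracting the learning rate α times the partial derivative of the cost:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
Taking the derivative gives:
$$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
All values of $\theta_j$ are updated simultaneously.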
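This update rule looks the same as the one used for gradient descent in linear regression. The difference is the hypothesis: here $h_\theta(x) = 1/(1 + e^{-\theta^T x})$ rather than $\theta^T x$. So even though the update rule looks essentially identical, gradient descent for logistic regression is a genuinely different algorithm from gradient descent for linear regression.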
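The update rule can be written as a plain batch gradient-descent loop, which contrasts with the SciPy optimizers used next. This is a sketch under assumptions:

- gradient_descent, alpha=0.1 and iters=2000 are illustrative choices.
- The features are synthetic and standardized. On the raw exam scores (roughly 30 to 100) this step size would likely be too large, and many more iterations would be needed.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

def gradient_descent(X, y, alpha=0.1, iters=2000):
    # hypothetical helper: simultaneous update theta_j := theta_j - alpha * dJ/dtheta_j
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        theta = theta - alpha * gradient(theta, X, y)
        history.append(cost(theta, X, y))
    return theta, history

# synthetic standardized data with a noisy linear class boundary
rng = np.random.default_rng(2)
features = rng.normal(size=(100, 2))
y = (features[:, 0] + features[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(float)
X = np.column_stack([np.ones(100), features])

theta, history = gradient_descent(X, y)
print(history[-1] < history[0])  # True: the cost goes down
```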
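How do we do this in code? In ex2.pdf, the exercise uses an Octave function called fminunc to optimize the parameters, given a function that computes the cost and gradient. Since we are using Python, we can do the same thing with SciPy's optimize namespace.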
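Here we use an advanced optimization algorithm. These usually converge much faster than hand-written gradient descent, and they do not require picking a learning rate.
We only need to pass in three things: the cost function, the initial value of theta, and the gradient. When defining the cost function, theta must be its first parameter. If the cost function returns only the cost, pass the gradient separately with fprime=gradient.
Here we fit the model with either fmin_tnc or minimize. minimize lets you choose among several algorithms through its method argument, including TNC.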
```python
import scipy.optimize as opt

result = opt.fmin_tnc(func=cost, x0=theta, fprime=gradient, args=(X, y))
result
# (array([-25.16131878,   0.20623159,   0.20147149]), 36, 0)
```
The second approach, using minimize, gives the same result:
```python
res = opt.minimize(fun=cost, x0=theta, args=(X, y), method='TNC', jac=gradient)
res      # help(opt.minimize) lists the other available methods
# res.x  # final theta
```
The cost at the optimized parameters drops from about 0.693 to about 0.203:
```python
cost(result[0], X, y)
# 0.20349770158947394
```
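You may prefer a single function that returns both the cost and the gradient. minimize accepts such a function with jac=True, and then no separate gradient function is passed. cost_and_grad and the synthetic data below are hypothetical illustrations, not part of the original exercise.

```python
import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_and_grad(theta, X, y):
    # hypothetical helper: returns (cost, gradient), the pair minimize expects when jac=True
    h = sigmoid(X @ theta)
    J = np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
    grad = X.T @ (h - y) / len(X)
    return J, grad

# synthetic, non-separable data so the optimum is finite
rng = np.random.default_rng(3)
features = rng.normal(size=(100, 2))
y = (features[:, 0] - features[:, 1] + rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones(100), features])

res = opt.minimize(fun=cost_and_grad, x0=np.zeros(3), args=(X, y), method='TNC', jac=True)
print(res.x)    # fitted theta
print(res.fun)  # final cost, below the starting value ln 2
```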
1.2.5 Evaluating logistic regression
After learning the parameters, you can use the model to predict whether a particular student will be admitted. For a student with an Exam 1 score of 45 and an Exam 2 score of 85, you should expect to see an admission probability of 0.776.
Let's test a student who scored 45 on Exam 1 and 85 on Exam 2:
```python
# implement h_theta(x) for a single example x
def hfunc1(theta, X):
    return sigmoid(np.dot(theta.T, X))

hfunc1(result[0], [1, 45, 85])
# 0.7762906256930321
```
That is about 77.6%, as expected. Nice.
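The same number can be checked by hand from the parameters returned by fmin_tnc, with x = (1, 45, 85):

$$\theta^T x = -25.16131878 + 0.20623159 \times 45 + 0.20147149 \times 85 \approx -25.1613 + 9.2804 + 17.1251 = 1.2442$$
$$h_\theta(x) = \frac{1}{1 + e^{-1.2442}} \approx \frac{1}{1 + 0.2882} \approx 0.776$$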
We define the prediction rule:
When $h_\theta(x) \ge 0.5$, predict y = 1.
When $h_\theta(x) < 0.5$, predict y = 0.
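Because g is increasing and g(0) = 0.5, the rule $h_\theta(x) \ge 0.5$ is equivalent to $\theta^T x \ge 0$. The decision boundary is therefore the straight line $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$. The sketch below solves that line for the Exam 2 score, using the θ printed by fmin_tnc above. boundary_exam2 is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# fitted parameters as printed by fmin_tnc above
theta = np.array([-25.16131878, 0.20623159, 0.20147149])

def boundary_exam2(exam1, theta):
    # hypothetical helper: solve theta0 + theta1*x1 + theta2*x2 = 0 for x2
    return -(theta[0] + theta[1] * exam1) / theta[2]

exam1 = np.array([30.0, 60.0, 100.0])
exam2 = boundary_exam2(exam1, theta)
X_line = np.column_stack([np.ones_like(exam1), exam1, exam2])
print(np.allclose(sigmoid(X_line @ theta), 0.5))  # True: h is 0.5 on the boundary
```

To draw the boundary on the earlier scatter plot, one option is ax.plot(x, boundary_exam2(x, theta)) for an array x of Exam 1 scores.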
```python
# prediction function: 1 if h_theta(x) >= 0.5, else 0
def predict(theta, X):
    probability = sigmoid(X @ theta)
    return [1 if x >= 0.5 else 0 for x in probability]

theta_min = result[0]  # fitted parameters as a 1-D array
predictions = predict(theta_min, X)
correct = [1 if a == b else 0 for (a, b) in zip(predictions, y)]
accuracy = 100 * sum(correct) / len(correct)  # percentage of correct predictions
print('accuracy = {0:.0f}%'.format(accuracy))
```
accuracy = 89%
Evaluated on the whole dataset, which is also the training set, the accuracy is about 89%. That is quite good.
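With NumPy arrays, the prediction and accuracy can also be computed without Python loops. predict and accuracy here are illustrative re-implementations. The four rows are a hypothetical toy example (not from ex2data1.txt), scored with the θ found above only to show the mechanics.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, X):
    # threshold h_theta(x) at 0.5, i.e. theta^T x at 0
    return (sigmoid(X @ theta) >= 0.5).astype(int)

def accuracy(theta, X, y):
    # percentage of predictions that match the labels
    return 100 * np.mean(predict(theta, X) == y)

# hypothetical toy rows: bias, Exam 1, Exam 2
X = np.array([[1, 45, 85], [1, 30, 40], [1, 80, 90], [1, 50, 50]], dtype=float)
y = np.array([1, 0, 1, 1])
theta = np.array([-25.16131878, 0.20623159, 0.20147149])

print(predict(theta, X))      # [1 0 1 0]
print(accuracy(theta, X, y))  # 75.0
```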
Scikit-learn can also report the accuracy, along with precision, recall and F1 score for each class:
```python
from sklearn.metrics import classification_report

# classification_report expects (y_true, y_pred), in that order
print(classification_report(y, predictions))
```