# Python Machine Learning (2): Linear Regression

#### 1. Simple Linear Regression
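As a reference point for this section, the least-squares problem behind simple linear regression can be written roughly as below. This is a sketch reconstructed from the `fit` method in this section, not a formula taken from the original post: a, b, x-bar and y-bar correspond to `a_`, `b_`, `x_mean` and `y_mean`, and m (an assumed symbol) is the number of training samples.

```latex
\min_{a,\,b} \; J(a, b) = \sum_{i=1}^{m} \left( y^{(i)} - a\,x^{(i)} - b \right)^2

a = \frac{\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)\left( y^{(i)} - \bar{y} \right)}
         {\sum_{i=1}^{m} \left( x^{(i)} - \bar{x} \right)^2},
\qquad
b = \bar{y} - a\,\bar{x}
```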

y(i) and x(i) are fixed: they are the known training samples, so the only unknowns are the slope a and the intercept b.

    """
    Created by 杨帮杰 on 10/1/18
    Right to use this code in any way you want without
    warranty, support or any guarantee of it working
    E-mail: yangbangjie1998@qq.com
    Association: SCAU (South China Agricultural University)
    """

    import numpy as np


    class SimpleLinearRegression:

        def __init__(self):
            """Initialize the Simple Linear Regression model."""
            self.a_ = None
            self.b_ = None

        def fit(self, x_train, y_train):
            """Train the model on the training set x_train, y_train."""
            assert x_train.ndim == 1, \
                "Simple Linear Regressor can only solve single feature training data."
            assert len(x_train) == len(y_train), \
                "the size of x_train must be equal to the size of y_train"

            x_mean = np.mean(x_train)
            y_mean = np.mean(y_train)

            # Vectorizing the computation speeds up training.
            # The equivalent (slower) loop would be:
            # num = 0.0
            # d = 0.0
            # for x, y in zip(x_train, y_train):
            #     num += (x - x_mean) * (y - y_mean)
            #     d += (x - x_mean) ** 2

            num = (x_train - x_mean).dot(y_train - y_mean)
            d = (x_train - x_mean).dot(x_train - x_mean)

            self.a_ = num / d
            self.b_ = y_mean - self.a_ * x_mean

            return self

        def predict(self, x_predict):
            """Return the vector of predictions for the data set x_predict."""
            assert x_predict.ndim == 1, \
                "Simple Linear Regressor can only solve single feature training data."
            assert self.a_ is not None and self.b_ is not None, \
                "must fit before predict!"

            return np.array([self._predict(x) for x in x_predict])

        def _predict(self, x_single):
            """Return the prediction for a single sample x_single."""
            return self.a_ * x_single + self.b_

        def __repr__(self):
            return "SimpleLinearRegression()"
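To see the closed-form fit in action, here is a minimal standalone sketch. The helper `fit_simple_linear` and the toy data are made up for illustration; the helper repeats the same vectorized formula as the `fit` method and cross-checks the result against NumPy's degree-1 `np.polyfit`.

```python
import numpy as np


def fit_simple_linear(x, y):
    """Return (a, b) minimizing sum((y - a*x - b)**2) for 1-D arrays (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Same vectorized formula as in the fit method: slope first, then intercept
    a = (x - x_mean).dot(y - y_mean) / (x - x_mean).dot(x - x_mean)
    b = y_mean - a * x_mean
    return a, b


# Toy data, invented purely for this example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 3.0, 5.0])

a, b = fit_simple_linear(x, y)
prediction = a * 6.0 + b  # predict for a new point x = 6

# Cross-check against NumPy's own least-squares line fit (returns slope, intercept)
a_ref, b_ref = np.polyfit(x, y, 1)

print(a, b, prediction)
print(a_ref, b_ref)
```

On this toy data the slope comes out at about 0.8 and the intercept at about 0.4, and both methods should agree up to floating-point error.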



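R Square

The value of R Square (R^2) falls into one of the following cases:

• R^2 = 1: the model makes no errors at all; a perfect fit
• R^2 = 0: the model is equivalent to the baseline model (always predicting the mean of y), as if it had never been trained
• R^2 < 0: the model does worse than the baseline; the data may have no linear relationship at all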
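The three cases above can be reproduced with a few lines of NumPy. This is a sketch that assumes the standard definition R^2 = 1 - SS_residual / SS_total; the function name `r_squared` and the sample values are made up for illustration.

```python
import numpy as np


def r_squared(y_true, y_predict):
    """Return R^2 = 1 - SS_residual / SS_total for two 1-D arrays (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    ss_residual = np.sum((y_true - y_predict) ** 2)        # error of the model
    ss_total = np.sum((y_true - np.mean(y_true)) ** 2)     # error of the mean baseline
    return 1.0 - ss_residual / ss_total


y_true = np.array([1.0, 3.0, 2.0, 3.0, 5.0])

# Case 1: predictions equal the targets
perfect = r_squared(y_true, y_true)
# Case 2: always predict the mean of y (the baseline model)
baseline = r_squared(y_true, np.full_like(y_true, y_true.mean()))
# Case 3: deliberately bad predictions (the targets in reverse order)
worse = r_squared(y_true, y_true[::-1])

print(perfect, baseline, worse)
```

The first value is 1, the second is 0, and the third is negative, because the reversed predictions have a larger squared error than the mean baseline.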

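#### 2. Multiple Linear Regression

X0 is not an input feature! It is a constant column of ones, added so that the intercept can be handled by the same matrix product as the other coefficients.

When X is written out as a matrix, each row is one sample and each column (except the first) is one feature.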
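Under the usual conventions, the expanded matrix and the normal-equation solution look roughly like this. The symbols are assumptions for this sketch: m is the number of samples, n the number of features, and X_b is borrowed from the variable name in the code of this section.

```latex
X_b = \begin{pmatrix}
1 & X^{(1)}_1 & X^{(1)}_2 & \cdots & X^{(1)}_n \\
1 & X^{(2)}_1 & X^{(2)}_2 & \cdots & X^{(2)}_n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X^{(m)}_1 & X^{(m)}_2 & \cdots & X^{(m)}_n
\end{pmatrix},
\qquad
\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}

\hat{y} = X_b\,\theta,
\qquad
\theta = \left( X_b^{T} X_b \right)^{-1} X_b^{T}\, y
```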

θ0 plays the role of b in simple linear regression: it is the intercept.

    import numpy as np


    class LinearRegression:

        def __init__(self):
            """Initialize the Linear Regression model."""
            self.coef_ = None
            self.interception_ = None
            self._theta = None

        def fit_normal(self, X_train, y_train):
            """Train the model on X_train, y_train using the normal equation."""
            assert X_train.shape[0] == y_train.shape[0], \
                "the size of X_train must be equal to the size of y_train"

            # Prepend the column of ones (X0) so that theta[0] acts as the intercept
            X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
            self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

            self.interception_ = self._theta[0]
            self.coef_ = self._theta[1:]

            return self

        def predict(self, X_predict):
            """Return the vector of predictions for the data set X_predict."""
            assert self.interception_ is not None and self.coef_ is not None, \
                "must fit before predict!"
            assert X_predict.shape[1] == len(self.coef_), \
                "the feature number of X_predict must be equal to X_train"

            X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])

            return X_b.dot(self._theta)

        def __repr__(self):
            return "LinearRegression()"
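As a quick sanity check on the normal equation, the sketch below fits synthetic data three ways: with the explicit inverse used in `fit_normal`, with `np.linalg.solve`, and with `np.linalg.lstsq`. The data and the true coefficients (4, 3, -2) are invented for this example. The last two are swapped-in alternatives rather than the method used in this post; they avoid forming the inverse explicitly, which tends to be more robust numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this sketch): y = 4 + 3*x1 - 2*x2 + small noise
X = rng.normal(size=(100, 2))
y = 4.0 + X.dot(np.array([3.0, -2.0])) + 0.01 * rng.normal(size=100)

# Prepend the column of ones so that theta[0] is the intercept
X_b = np.hstack([np.ones((len(X), 1)), X])

# 1) Normal equation with an explicit inverse, as in fit_normal
theta_inv = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# 2) Same linear system, solved without forming the inverse
theta_solve = np.linalg.solve(X_b.T.dot(X_b), X_b.T.dot(y))

# 3) Least-squares solver applied directly to X_b
theta_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print(theta_inv)
print(theta_solve)
print(theta_lstsq)
```

All three results should agree closely and land near the coefficients used to generate the data.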



Linear regression in scikit-learn is used as follows:

    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    # Load the Boston housing dataset
    # (load_boston was removed in scikit-learn 1.2, so this line needs an older version)
    boston = datasets.load_boston()

    # Remove some unreasonable data points (targets of 50.0 and above)
    X = boston.data
    y = boston.target

    X = X[y < 50.0]
    y = y[y < 50.0]

    # Split off a test set and fit the model
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    lin_reg = LinearRegression()

    lin_reg.fit(X_train, y_train)

    # Print the results
    print(lin_reg.coef_)
    print(lin_reg.intercept_)
    print(lin_reg.score(X_test, y_test))
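Because `load_boston` is no longer available in scikit-learn 1.2 and later, here is a sketch of the same workflow on a dataset that still ships with the library. Using `load_diabetes` is a substitution made for this example, not the dataset from the original post, and the `random_state` value is arbitrary.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Bundled regression dataset: 442 samples, 10 features (swapped in for Boston)
diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Fix random_state (arbitrary value) so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

print(lin_reg.coef_)                  # one coefficient per feature
print(lin_reg.intercept_)             # the intercept (theta_0)
print(lin_reg.score(X_test, y_test))  # R^2 on the test set
```

For regressors, `score` returns R^2, so the last line reports the same metric described in the R Square part of this post.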



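#### 3. Summary

Advantages:

1. The idea is simple and easy to implement
2. It is the foundation of many nonlinear models
3. It is highly interpretable

Disadvantages:

1. It assumes a linear relationship between the features and the label, which does not always hold in practice
2. Training has a relatively high time complexity: the normal equation inverts an (n+1)×(n+1) matrix, which costs roughly O(n^3) in the number of features n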

References:
Python3 入门机器学习 经典算法与应用 (Introduction to Machine Learning with Python 3: Classic Algorithms and Applications), by liuyubobobo
