Quick Start to Python Machine Learning (21)

2023-02-15 108 Published from Inner Mongolia

10.1.3 Random Forest Regression

Class parameters, attributes, and methods

Class

class sklearn.ensemble.RandomForestRegressor(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, ccp_alpha=0.0, max_samples=None)

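Parameters

Parameter	Type	Description
n_estimators	int, default=100	The number of trees in the forest.
random_state	int, RandomState instance or None, default=None	Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features).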
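The effect of these two parameters can be checked directly. The following is a minimal sketch, not taken from the original article: the data set sizes and the value n_estimators=20 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic data set; the sizes are illustrative assumptions.
X, y = make_regression(n_samples=100, n_features=4, noise=10, random_state=8)

# Same random_state -> same bootstrap samples and feature draws -> same forest.
rf_a = RandomForestRegressor(n_estimators=20, random_state=8).fit(X, y)
rf_b = RandomForestRegressor(n_estimators=20, random_state=8).fit(X, y)
same_predictions = np.allclose(rf_a.predict(X), rf_b.predict(X))

# n_estimators determines how many fitted trees end up in estimators_.
n_trees = len(rf_a.estimators_)

print("identical predictions:", same_predictions)
print("number of trees:", n_trees)
```

Leaving random_state=None would give a different forest on each run, which makes scores hard to compare between experiments.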

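Attributes

Attribute	Description
base_estimator_	DecisionTreeRegressor. The child estimator template used to create the collection of fitted sub-estimators.
estimators_	list of DecisionTreeRegressor. The collection of fitted sub-estimators.
n_features_	int. The number of features when fit is performed.
n_outputs_	int. The number of outputs when fit is performed.
feature_importances_	ndarray of shape (n_features,). The impurity-based feature importances.
oob_score_	float. Score of the training dataset obtained using an out-of-bag estimate. This attribute exists only when oob_score is True.
oob_prediction_	ndarray of shape (n_samples,). Prediction computed with the out-of-bag estimate on the training set. This attribute exists only when oob_score is True.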
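To make the attributes concrete, here is a small sketch that fits a forest with oob_score=True and inspects them. The data set and hyperparameter values are assumptions chosen for illustration. Note that on a regressor the out-of-bag attribute is oob_prediction_, and that recent scikit-learn releases use n_features_in_ in place of n_features_, so the sketch reads n_features_in_.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: 200 samples, 5 features (assumed values).
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=10, random_state=8)

# oob_score=True is required for oob_score_ and oob_prediction_ to exist.
rf = RandomForestRegressor(n_estimators=50, oob_score=True, random_state=8)
rf.fit(X, y)

tree_type = type(rf.estimators_[0]).__name__   # the fitted sub-estimator class
importances = rf.feature_importances_          # one value per feature

print("sub-estimator type:", tree_type)
print("number of trees:", len(rf.estimators_))
print("features seen in fit:", rf.n_features_in_)
print("outputs:", rf.n_outputs_)
print("feature importances:", importances)
print("OOB R^2:", rf.oob_score_)
print("OOB prediction shape:", rf.oob_prediction_.shape)
```

The impurity-based importances are normalized, so they sum to 1 across features.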

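Methods

apply(X)	Apply the trees in the forest to X and return the leaf indices.
decision_path(X)	Return the decision path in the forest.
fit(X, y[, sample_weight])	Build a forest of trees from the training set (X, y).
get_params([deep])	Get the parameters of this estimator.
predict(X)	Predict the regression target for X.
score(X, y[, sample_weight])	Return the coefficient of determination R² of the prediction.
set_params(**params)	Set the parameters of this estimator.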
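The methods listed above can be exercised on a small forest. This is a sketch under assumed data sizes; it only illustrates the shapes and return types, not any result from the article.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative data and forest size (assumed values).
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=8)
rf = RandomForestRegressor(n_estimators=10, random_state=8).fit(X, y)

# apply: index of the leaf each sample lands in, one column per tree.
leaves = rf.apply(X)

# decision_path: sparse node-indicator matrix plus the column offsets
# that separate the nodes of each tree.
indicator, n_nodes_ptr = rf.decision_path(X)

# predict: averaged prediction of all trees; score: R^2 on the given data.
predictions = rf.predict(X[:5])
r2 = rf.score(X, y)

# set_params changes the configuration only; the forest is not refit
# until fit is called again.
rf.set_params(n_estimators=20)
n_estimators_now = rf.get_params()["n_estimators"]

print(leaves.shape, indicator.shape, n_nodes_ptr.shape)
print(predictions.shape, r2, n_estimators_now)
```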

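Random forest regression on make_regression data

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Add noise
# util is the helper class used throughout this series; it is not defined in this section.
def DecisionTreeRegressor_for_make_regression_add_noise():
    myutil = util()
    X, y = make_regression(n_samples=100, n_features=1, n_informative=2, noise=50, random_state=8)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8, test_size=0.3)
    # Note: the model is a single DecisionTreeRegressor fit on all of (X, y),
    # test rows included, which is why both scores in the output are 100%.
    clf = DecisionTreeRegressor().fit(X, y)
    title = "make_regression DecisionTreeRegressor() regression line (with noise)"
    myutil.print_scores(clf, X_train, y_train, X_test, y_test, title)
    myutil.draw_line(X[:, 0], y, clf, title)
    myutil.plot_learning_curve(DecisionTreeRegressor(), X, y, title)
    myutil.show_pic(title)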

Output

make_regression DecisionTreeRegressor() regression line (with noise):
100.00%
make_regression DecisionTreeRegressor() regression line (with noise):
100.00%
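The listing above uses a single DecisionTreeRegressor and fits it on the full data set, so the 100% test score is not a held-out estimate. As a point of comparison, the sketch below swaps in RandomForestRegressor and fits on the training split only. It is not the author's code: n_estimators=100 and n_informative=1 are assumptions, and the series' util helper is replaced by plain print calls. No scores are quoted here because they were not part of the original article.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Same noisy one-feature data as in the listing above
# (n_informative is set to 1 to match n_features).
X, y = make_regression(n_samples=100, n_features=1, n_informative=1,
                       noise=50, random_state=8)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=8, test_size=0.3)

# Fit on the training split only, so the test score is a held-out estimate.
rf = RandomForestRegressor(n_estimators=100, random_state=8)
rf.fit(X_train, y_train)

train_r2 = rf.score(X_train, y_train)
test_r2 = rf.score(X_test, y_test)
print(f"training set R^2: {train_r2:.2%}")
print(f"test set R^2:     {test_r2:.2%}")
```

Averaging many trees trained on bootstrap samples usually smooths the step-like fit of a single unrestricted tree, but with noise=50 a gap between the two scores should still be expected.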

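Random forest regression on Boston housing data

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Analyze the Boston housing data
# Note: load_boston was removed in scikit-learn 1.2; this listing needs an older release.
# util is the helper class used throughout this series; it is not defined in this section.
def DecisionTreeRegressor_for_boston():
    myutil = util()
    boston = datasets.load_boston()
    X, y = boston.data, boston.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
    # Compare trees of increasing depth
    for max_depth in [1, 3, 5, 7]:
        clf = DecisionTreeRegressor(max_depth=max_depth)
        clf.fit(X_train, y_train)
        title = u"Boston data test set (max_depth=" + str(max_depth) + ")"
        myutil.print_scores(clf, X_train, y_train, X_test, y_test, title)
        myutil.plot_learning_curve(DecisionTreeRegressor(max_depth=max_depth), X, y, title)
        myutil.show_pic(title)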

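Output

Boston data training set (max_depth=1):
45.95%
Boston data test set (max_depth=1):
35.44%
Boston data training set (max_depth=3):
83.84%
Boston data test set (max_depth=3):
62.87%
Boston data training set (max_depth=5):
93.82%
Boston data test set (max_depth=5):
70.37%
Boston data training set (max_depth=7):
97.31%
Boston data test set (max_depth=7):
77.55%

Overfitting is still present: at every depth the training score is well above the test score.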

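Random forest regression on diabetes data

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Analyze the diabetes data
# util is the helper class used throughout this series; it is not defined in this section.
def DecisionTreeRegressor_for_diabetes():
    myutil = util()
    diabetes = datasets.load_diabetes()
    X, y = diabetes.data, diabetes.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)
    # Compare trees of increasing depth
    for max_depth in [1, 3, 5, 7]:
        clf = DecisionTreeRegressor(max_depth=max_depth)
        clf.fit(X_train, y_train)
        title = u"Diabetes data test set (max_depth=" + str(max_depth) + ")"
        myutil.print_scores(clf, X_train, y_train, X_test, y_test, title)
        myutil.plot_learning_curve(DecisionTreeRegressor(max_depth=max_depth), X, y, title)
        myutil.show_pic(title)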

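Output

Diabetes data training set (max_depth=1):
30.44%
Diabetes data test set (max_depth=1):
15.21%
Diabetes data training set (max_depth=3):
55.64%
Diabetes data test set (max_depth=3):
28.37%
Diabetes data training set (max_depth=5):
71.81%
Diabetes data test set (max_depth=5):
19.97%
Diabetes data training set (max_depth=7):
84.30%
Diabetes data test set (max_depth=7):
-1.30%

Severe overfitting is still present: the training score keeps rising with depth while the test score falls to -1.30% at max_depth=7.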
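The diabetes listing above trains single decision trees. For readers who want to run the same depth sweep with an actual random forest, the following is a minimal sketch. It is an assumption-laden variant rather than the author's code: it uses RandomForestRegressor with n_estimators=100, stores the scores in a hypothetical dict named scores, and prints them instead of calling the series' util helper. The resulting numbers are not reported here because they do not appear in the original article.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Same data set and split seed as the listing above.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=8)

scores = {}  # max_depth -> (training R^2, test R^2)
for max_depth in [1, 3, 5, 7]:
    rf = RandomForestRegressor(n_estimators=100, max_depth=max_depth,
                               random_state=8)
    rf.fit(X_train, y_train)
    scores[max_depth] = (rf.score(X_train, y_train),
                         rf.score(X_test, y_test))
    train_r2, test_r2 = scores[max_depth]
    print(f"max_depth={max_depth}: train {train_r2:.2%}, test {test_r2:.2%}")
```

Comparing the gap between the two columns across depths is a quick way to see whether the ensemble reduces the overfitting observed with a single tree.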
