Quick Start to Python Machine Learning (37)


14.4 Pipeline Models


14.4.1 Pipeline Basics

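import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Generate a two-class toy data set and split it
X, y = make_blobs(n_samples=200, centers=2, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=38)
# Fit the scaler on the training set only, then transform both sets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
print("Training set shape:", X_train_scaled.shape)
print("Test set shape:", X_test_scaled.shape)
# Original training set
plt.scatter(X_train[:, 0], X_train[:, 1])
# Preprocessed training set
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], marker='^', edgecolor='k')
plt.title("Training set vs. scaled training set")
plt.show()
# Grid search over the MLP hyperparameters on the scaled training data
params = {'hidden_layer_sizes': [(50,), (100,), (100, 100)], "alpha": [0.0001, 0.001, 0.01]}
grid = GridSearchCV(MLPClassifier(max_iter=1600, random_state=38), param_grid=params, cv=3)
grid.fit(X_train_scaled, y_train)
print("Best model score:\n{:.2%}".format(grid.best_score_))
print("Parameters at the best score:\n{}".format(grid.best_params_))
# Print the model's score on the test set
print("Test set score:\n{:.2%}".format(grid.score(X_test_scaled, y_test)))


Output

Training set shape: (150, 2)
Test set shape: (50, 2)
Best model score:
90.00%
Parameters at the best score:
{'alpha': 0.0001, 'hidden_layer_sizes': (50,)}
Test set score:
82.00%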
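
One caveat about the listing above: the scaler was fit on all of X_train before GridSearchCV split it into folds, so every validation fold had already contributed to the scaling statistics. A minimal sketch of the effect, using synthetic data and an illustrative 3-fold split (the seeds and variable names are assumptions, not taken from the original):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

# Synthetic data, similar in spirit to the example above (seed chosen arbitrarily)
X, y = make_blobs(n_samples=150, centers=2, cluster_std=5, random_state=0)

# Mean learned when the scaler sees every row, validation folds included
global_mean = StandardScaler().fit(X).mean_

# Mean learned when the scaler sees only the training part of each fold
fold_means = []
for train_idx, valid_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
    fold_means.append(StandardScaler().fit(X[train_idx]).mean_)
fold_means = np.array(fold_means)

print("mean from all rows:", global_mean)
print("means from training folds:\n", fold_means)
```

The per-fold means differ from the global one, which is the information a pre-fitted scaler carries into the validation folds. Wrapping the scaler and the model in a Pipeline, as in section 14.4.2, lets cross-validation refit the scaler inside each fold.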



14.4.2 Pipeline

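from sklearn.pipeline import Pipeline

# Chain scaling and the MLP so the scaler is fit only on the data passed to fit()
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(max_iter=1600, random_state=38))])
pipeline.fit(X_train, y_train)
print("Test set score with the pipeline:\n{:.2%}".format(pipeline.score(X_test, y_test)))
# Pipeline parameters are addressed as <step name>__<parameter name>
params = {'mlp__hidden_layer_sizes': [(50,), (100,), (100, 100)], "mlp__alpha": [0.0001, 0.001, 0.01]}
grid = GridSearchCV(pipeline, param_grid=params, cv=3)
grid.fit(X_train, y_train)
print("Best cross-validation score:\n{:.2%}".format(grid.best_score_))
print("Best model parameters:\n{}".format(grid.best_params_))
print("Test set score:\n{:.2%}".format(grid.score(X_test, y_test)))


Output

Test set score with the pipeline:
86.00%
Best cross-validation score:
90.00%
Best model parameters:
{'mlp__alpha': 0.0001, 'mlp__hidden_layer_sizes': (50,)}
Test set score:
82.00%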
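
The mlp__ prefix in the grid follows scikit-learn's <step name>__<parameter> convention for nested parameters. A short sketch of how to list the addressable names and reach a fitted step (the small data set and the max_iter value here are illustrative assumptions):

```python
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPClassifier(max_iter=500, random_state=38))])

# Every tunable parameter of a step appears under "<step>__<param>"
mlp_param_names = sorted(k for k in pipeline.get_params() if k.startswith('mlp__'))
print(mlp_param_names[:3])

# set_params accepts the same names that GridSearchCV uses
pipeline.set_params(mlp__alpha=0.01, mlp__hidden_layer_sizes=(50,))
pipeline.fit(X, y)

# named_steps gives access to the fitted step objects
print(pipeline.named_steps['scaler'].mean_)
print(pipeline.named_steps['mlp'].alpha)
```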




14.4.3 Case Study

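from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# X and y are assumed to be loaded already (features and a regression target)
# Pipeline() and make_pipeline() are equivalent; make_pipeline() names the steps automatically
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('mlp', MLPRegressor(max_iter=1600, hidden_layer_sizes=[1, 1], random_state=6))])
pipe = make_pipeline(StandardScaler(),
                     MLPRegressor(max_iter=1600, hidden_layer_sizes=[1, 1], random_state=6))
scores = cross_val_score(pipe, X, y, cv=20)
print("Mean score with pipe: {:.2%}".format(float(scores.mean())))

Output


Mean score with pipe: -80419.24%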
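
The practical difference between the two constructors is the step names: make_pipeline() derives them from the lowercased class names. A small sketch comparing the two (the variable names are illustrative):

```python
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

named = Pipeline([('scaler', StandardScaler()), ('mlp', MLPRegressor())])
auto = make_pipeline(StandardScaler(), MLPRegressor())

named_steps = [name for name, _ in named.steps]
auto_steps = [name for name, _ in auto.steps]
print(named_steps)  # ['scaler', 'mlp']
print(auto_steps)   # ['standardscaler', 'mlpregressor']
```

With make_pipeline(), a grid-search key would therefore be written as mlpregressor__alpha rather than mlp__alpha.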


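from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Add a feature-selection step between scaling and the regressor
pipe = make_pipeline(StandardScaler(),
                     SelectFromModel(RandomForestRegressor(random_state=6)),
                     MLPRegressor(max_iter=1600, hidden_layer_sizes=[1, 1], random_state=6))
scores = cross_val_score(pipe, X, y, cv=20)
print("Mean score with pipe plus SelectFromModel: {:.2%}".format(float(scores.mean())))

Output


Mean score with pipe plus SelectFromModel: -56190.48%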
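
The large negative percentages are possible because a regressor's default score is R², defined as 1 - SS_res/SS_tot, which drops below zero whenever the model predicts worse than the mean of y. A tiny worked example with made-up numbers:

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]
y_mean_pred = [2.0, 2.0, 2.0]    # always predict the mean of y_true
y_bad_pred = [10.0, 10.0, 10.0]  # far from every target

# SS_tot = 2; the mean prediction has SS_res = 2, the bad one has SS_res = 194
r2_mean = r2_score(y_true, y_mean_pred)
r2_bad = r2_score(y_true, y_bad_pred)
print("{:.2%}".format(r2_mean))  # 0.00%
print("{:.2%}".format(r2_bad))   # -9600.00%
```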

               

from sklearn.model_selection import GridSearchCV

# Treat the estimator itself, and whether to scale, as searchable parameters
params = [{'reg': [MLPRegressor(max_iter=1600, hidden_layer_sizes=[1, 1], random_state=6)],
           'scaler': [StandardScaler(), None]},
          {'reg': [RandomForestRegressor(random_state=6)],
           'scaler': [None]}]
pipe = Pipeline([('scaler', StandardScaler()), ('reg', MLPRegressor())])
grid = GridSearchCV(pipe, params, cv=6)
grid.fit(X, y)
print("After GridSearchCV, the best model is: {}".format(grid.best_params_))
print("After GridSearchCV, the best score is: {:.2%}".format(grid.best_score_))


Output

After GridSearchCV, the best model is: {'reg': RandomForestRegressor(random_state=6), 'scaler': None}
After GridSearchCV, the best score is: -12.45%

# Extend the grid with hyperparameters for each candidate estimator
params = [{'reg': [MLPRegressor(max_iter=1600, random_state=6)],
           'scaler': [StandardScaler(), None],
           'reg__hidden_layer_sizes': [(1,), (50,), (100,), (1, 1), (50, 50), (100, 100)]},
          {'reg': [RandomForestRegressor(random_state=6)],
           'scaler': [None],
           'reg__n_estimators': [10, 50, 100]}]
pipe = Pipeline([('scaler', StandardScaler()), ('reg', MLPRegressor())])
grid = GridSearchCV(pipe, params, cv=6)
grid.fit(X, y)
print("With added parameters, the best model is: {}".format(grid.best_params_))
print("With added parameters, the best score is: {:.2%}".format(grid.best_score_))


Output

With added parameters, the best model is: {'reg': RandomForestRegressor(random_state=6), 'reg__n_estimators': 100, 'scaler': None}
With added parameters, the best score is: -12.45%
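
best_params_ reports only the winner. To see how every estimator and scaler combination ranked, the cv_results_ attribute can be loaded into a table. A minimal sketch on synthetic data, with Ridge swapped in for the MLP purely to keep the run fast (the data set and grid values are assumptions):

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=6)

# Same structure as the grids above: one dict per candidate estimator
params = [{'reg': [Ridge()], 'scaler': [StandardScaler(), None],
           'reg__alpha': [0.1, 1.0]},
          {'reg': [RandomForestRegressor(random_state=6)], 'scaler': [None],
           'reg__n_estimators': [10, 50]}]
pipe = Pipeline([('scaler', StandardScaler()), ('reg', Ridge())])
grid = GridSearchCV(pipe, params, cv=3).fit(X, y)

# One row per candidate, ordered by rank
results = pd.DataFrame(grid.cv_results_)
table = results[['param_reg', 'param_scaler', 'mean_test_score', 'rank_test_score']]
print(table.sort_values('rank_test_score').to_string(index=False))
```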


 
         




import warnings
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (AdaBoostClassifier, AdaBoostRegressor, RandomForestClassifier,
                              RandomForestRegressor, VotingClassifier)
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC, SVR, LinearSVC, LinearSVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

def get_better_score():
    warnings.filterwarnings("ignore")
    n_jobs = 2
    scalers = [StandardScaler(), MinMaxScaler(), None]
    # The source wrapped range() in a list, which passes the range object itself
    # as one (invalid) random_state; a plain list of seeds is used instead
    seeds = list(range(1, 200))
    alphas, Cs, gammas = [1, 0.1, 0.001, 0.0001], [1.0, 3.0, 5.0], [0.01, 0.1, 1, 5, 10]
    kernels = ["linear", "rbf", "sigmoid", "poly"]
    acts = ["relu", "tanh", "identity", "logistic"]
    layers = [(1,), (50,), (100,), (1, 1), (50, 50), (100, 100)]
    voters = [('log_clf', LogisticRegression()), ('svm_clf', SVC(probability=True)),
              ('dt_clf', DecisionTreeClassifier(random_state=666))]
    # If y is continuous (as a percent change would be), the classifier candidates cannot be fit;
    # with the default error_score=np.nan they score NaN, so only the regressors compete
    params = [
        {'reg': [LinearRegression()], 'scaler': scalers, "reg__n_jobs": [n_jobs]},
        {'reg': [LogisticRegression()], 'scaler': scalers, "reg__n_jobs": [n_jobs]},
        {'reg': [Ridge()], 'scaler': scalers, "reg__alpha": alphas},
        {'reg': [Lasso()], 'scaler': scalers, "reg__alpha": alphas},
        {'reg': [ElasticNet()], 'scaler': scalers, "reg__alpha": [0.1, 0.5, 1, 5, 10], "reg__l1_ratio": [0.1, 0.5, 0.9]},
        # The source listed reg__random_state twice for the random forests; the explicit list is kept
        {'reg': [RandomForestClassifier()], 'scaler': scalers, "reg__n_estimators": [4, 5, 6, 7], "reg__random_state": [2, 3, 4, 5], "reg__n_jobs": [n_jobs]},
        {'reg': [RandomForestRegressor()], 'scaler': scalers, "reg__n_estimators": [4, 5, 6, 7], "reg__random_state": [2, 3, 4, 5], "reg__n_jobs": [n_jobs]},
        {'reg': [DecisionTreeClassifier()], 'scaler': scalers, "reg__max_depth": [1, 3, 5, 7], "reg__random_state": seeds},
        {'reg': [DecisionTreeRegressor()], 'scaler': scalers, "reg__max_depth": [1, 3, 5, 7], "reg__random_state": seeds},
        {'reg': [KNeighborsClassifier()], 'scaler': scalers, "reg__n_jobs": [n_jobs]},
        # Estimator name missing in the source; KNeighborsRegressor is assumed from the pairing
        {'reg': [KNeighborsRegressor()], 'scaler': scalers, "reg__n_jobs": [n_jobs]},
        {'reg': [BernoulliNB()], 'scaler': scalers},
        {'reg': [GaussianNB()], 'scaler': scalers},
        {'reg': [MultinomialNB()], 'scaler': [MinMaxScaler()]},
        {'reg': [SVC(max_iter=10000)], 'scaler': scalers, "reg__kernel": kernels, "reg__gamma": gammas, "reg__C": Cs},
        {'reg': [SVR(max_iter=100000)], 'scaler': scalers, "reg__kernel": kernels, "reg__gamma": gammas, "reg__C": Cs},
        {'reg': [LinearSVC(max_iter=100000)], 'scaler': scalers, "reg__C": Cs},
        {'reg': [LinearSVR(max_iter=100000)], 'scaler': scalers, "reg__C": Cs},
        {'reg': [AdaBoostClassifier()], 'scaler': scalers, "reg__random_state": seeds},
        {'reg': [AdaBoostRegressor()], 'scaler': scalers, "reg__random_state": seeds},
        {'reg': [VotingClassifier(estimators=voters)], 'scaler': scalers, "reg__voting": ["hard", "soft"], "reg__n_jobs": [n_jobs]},
        {'reg': [LinearDiscriminantAnalysis(n_components=2)], 'scaler': scalers},
        {'reg': [MLPClassifier(max_iter=100000)], 'scaler': scalers, "reg__activation": acts, "reg__alpha": [0.0001, 0.001, 0.01, 1], "reg__hidden_layer_sizes": layers},
        {'reg': [MLPRegressor(max_iter=100000)], 'scaler': scalers, "reg__activation": acts, "reg__alpha": [0.0001, 0.001, 0.01, 1], "reg__hidden_layer_sizes": layers},
    ]
    stock = pd.read_csv('stock1.csv', encoding='GBK')
    X = stock.loc[:, '价格':'流通市值']  # columns from price through circulating market value
    y = stock['涨跌幅']  # percent change
    pipe = Pipeline([('scaler', StandardScaler()), ('reg', MLPRegressor())])
    shuffle_split = ShuffleSplit(test_size=.2, train_size=.7, n_splits=10)
    grid = GridSearchCV(pipe, params, cv=shuffle_split)
    grid.fit(X, y)
    print("Best model: {}".format(grid.best_params_))
    # best_score_ is the mean cross-validated score of the best candidate
    print("Best training score: {:.2%}".format(grid.best_score_))
    # grid.score(X, y) evaluates the refit model on the same data it was trained on
    print("Best test score: {:.2%}".format(grid.score(X, y)))


Output

Best model: {'reg': LinearRegression(n_jobs=2), 'reg__n_jobs': 2, 'scaler': StandardScaler()}
Best training score: 100.00%
Best test score: 100.00%


This result is rather surprising, so let's check it by applying StandardScaler() and then fitting a LinearRegression model directly.

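import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def best_stock():
    stock = pd.read_csv('stock1.csv', encoding='GBK')
    X = stock.loc[:, '价格':'流通市值']  # columns from price through circulating market value
    y = stock['涨跌幅']  # percent change
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=62)
    clf = LinearRegression(n_jobs=2)
    # Fit the scaler on the training split only
    scaler = StandardScaler()
    scaler.fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    clf.fit(X_train_scaled, y_train)
    print("Model training score: {:.2%}".format(clf.score(X_train_scaled, y_train)))
    print("Model test score: {:.2%}".format(clf.score(X_test_scaled, y_test)))


Output

Model training score: 100.00%
Model test score: 100.00%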
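
A perfect R² on both splits usually deserves a second look. One possible explanation (an assumption here, since the columns of stock1.csv are not shown) is that the target can be computed exactly from the features. The sketch below uses synthetic data and hypothetical column names to show how an exactly linear target produces this kind of score:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(62)
# Hypothetical columns, not the real stock1.csv layout
df = pd.DataFrame({'price': rng.uniform(5, 50, 200),
                   'prev_close': rng.uniform(5, 50, 200),
                   'volume': rng.uniform(1e4, 1e6, 200)})
# The target is an exact linear function of two of the features
df['change'] = df['price'] - df['prev_close']

X = df[['price', 'prev_close', 'volume']]
y = df['change']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=62)

scaler = StandardScaler().fit(X_train)
reg = LinearRegression().fit(scaler.transform(X_train), y_train)
train_score = reg.score(scaler.transform(X_train), y_train)
test_score = reg.score(scaler.transform(X_test), y_test)
print("Training score: {:.2%}".format(train_score))
print("Test score: {:.2%}".format(test_score))
```

For the real data, inspecting the fitted coefficients or dropping feature columns one at a time would help confirm or rule out this explanation.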