ML — SVM: Classifying and evaluating the 20-class newsgroups text dataset with an SVM (single-threaded grid search over hyperparameter combinations + 3-fold cross-validation)


Output

Fitting 3 folds for each of 12 candidates, totalling 36 fits
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   6.2s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    6.2s remaining:    0.0s
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   7.1s
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   7.0s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   6.9s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   6.8s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   6.3s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   6.3s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   7.0s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   8.1s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   8.8s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -  10.7s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   9.4s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   8.4s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   6.7s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   6.9s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   6.6s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   6.2s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   6.8s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   7.6s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   7.7s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   8.2s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   6.7s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   8.4s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   9.5s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -  10.1s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   9.9s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   8.8s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   9.2s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   7.7s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   6.9s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   8.0s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   9.5s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   9.0s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.6s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.1s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   9.0s
[Parallel(n_jobs=1)]: Done  36 out of  36 | elapsed:  4.8min finished

Single-threaded run — accuracy of the best model on the test set: 0.8226666666666667
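The log above can be reproduced with a script along the following lines — a minimal sketch, not the author's exact code. Assumptions not shown in the original post: the text is vectorized with TfidfVectorizer and classified with an RBF-kernel SVC inside a Pipeline (the `svc__` prefix in the log implies a named pipeline step), and the grid is C in {0.1, 1, 10} × gamma in {0.01, 0.1, 1, 10}, matching the 12 candidates in the log. The subset size and random seed are also guesses.

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Download the 20 newsgroups corpus (cached locally after the first run).
news = fetch_20newsgroups(subset='all')

# A subset keeps the 36 single-threaded fits at a few seconds each.
X_train, X_test, y_train, y_test = train_test_split(
    news.data[:3000], news.target[:3000], test_size=0.25, random_state=33)

# Named pipeline steps: grid keys get the 'svc__' prefix seen in the log.
clf = Pipeline([('vect', TfidfVectorizer(stop_words='english')),
                ('svc', SVC())])

parameters = {'svc__gamma': np.logspace(-2, 1, 4),  # 0.01, 0.1, 1.0, 10.0
              'svc__C': np.logspace(-1, 1, 3)}      # 0.1, 1.0, 10.0

# n_jobs=1 -> single-threaded search; verbose=2 prints the [CV] lines;
# cv=3 gives "3 folds for each of 12 candidates, totalling 36 fits".
gs = GridSearchCV(clf, parameters, cv=3, n_jobs=1, verbose=2, refit=True)
gs.fit(X_train, y_train)

print(gs.best_params_, gs.best_score_)
print('test accuracy:', gs.score(X_test, y_test))
```

Expect a few minutes of runtime; the exact accuracy depends on the split and subset chosen.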


Design Approach

(design flow-chart image not preserved in this export)

Core Code

class GridSearchCV(BaseSearchCV):
    """Exhaustive search over specified parameter values for an estimator.

    .. deprecated:: 0.18
        This module will be removed in 0.20.
        Use :class:`sklearn.model_selection.GridSearchCV` instead.

    Important members are fit, predict.

    GridSearchCV implements a "fit" and a "score" method.
    It also implements "predict", "predict_proba", "decision_function",
    "transform" and "inverse_transform" if they are implemented in the
    estimator used.

    The parameters of the estimator used to apply these methods are
    optimized by cross-validated grid-search over a parameter grid.

    Read more in the :ref:`User Guide <grid_search>`.

    Parameters
    ----------
    estimator : estimator object.
        A object of that type is instantiated for each grid point.
        This is assumed to implement the scikit-learn estimator interface.
        Either estimator needs to provide a ``score`` function,
        or ``scoring`` must be passed.

    param_grid : dict or list of dictionaries
        Dictionary with parameters names (string) as keys and lists of
        parameter settings to try as values, or a list of such
        dictionaries, in which case the grids spanned by each dictionary
        in the list are explored. This enables searching over any sequence
        of parameter settings.

    scoring : string, callable or None, default=None
        A string (see model evaluation documentation) or
        a scorer callable object / function with signature
        ``scorer(estimator, X, y)``.
        If ``None``, the ``score`` method of the estimator is used.

    fit_params : dict, optional
        Parameters to pass to the fit method.

    n_jobs : int, default: 1
        The maximum number of estimators fit in parallel.

        - If -1 all CPUs are used.

        - If 1 is given, no parallel computing code is used at all,
          which is useful for debugging.

        - For ``n_jobs`` below -1, ``(n_cpus + n_jobs + 1)`` are used.
          For example, with ``n_jobs = -2`` all CPUs but one are used.

        .. versionchanged:: 0.17
           Upgraded to joblib 0.9.3.

    pre_dispatch : int, or string, optional
        Controls the number of jobs that get dispatched during parallel
        execution. Reducing this number can be useful to avoid an
        explosion of memory consumption when more jobs get dispatched
        than CPUs can process. This parameter can be:

            - None, in which case all the jobs are immediately
              created and spawned. Use this for lightweight and
              fast-running jobs, to avoid delays due to on-demand
              spawning of the jobs

            - An int, giving the exact number of total jobs that are
              spawned

            - A string, giving an expression as a function of n_jobs,
              as in '2*n_jobs'

    iid : boolean, default=True
        If True, the data is assumed to be identically distributed across
        the folds, and the loss minimized is the total loss per sample,
        and not the mean loss across the folds.

    cv : int, cross-validation generator or an iterable, optional
        Determines the cross-validation splitting strategy.
        Possible inputs for cv are:

        - None, to use the default 3-fold cross-validation,
        - integer, to specify the number of folds.
        - An object to be used as a cross-validation generator.
        - An iterable yielding train/test splits.

        For integer/None inputs, if the estimator is a classifier and
        ``y`` is either binary or multiclass,
        :class:`sklearn.model_selection.StratifiedKFold` is used. In all
        other cases, :class:`sklearn.model_selection.KFold` is used.

        Refer :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

    refit : boolean, default=True
        Refit the best estimator with the entire dataset.
        If "False", it is impossible to make predictions using
        this GridSearchCV instance after fitting.

    verbose : integer
        Controls the verbosity: the higher, the more messages.

    error_score : 'raise' (default) or numeric
        Value to assign to the score if an error occurs in estimator
        fitting. If set to 'raise', the error is raised. If a numeric
        value is given, FitFailedWarning is raised. This parameter does
        not affect the refit step, which will always raise the error.

    Examples
    --------
    >>> from sklearn import svm, grid_search, datasets
    >>> iris = datasets.load_iris()
    >>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
    >>> svr = svm.SVC()
    >>> clf = grid_search.GridSearchCV(svr, parameters)
    >>> clf.fit(iris.data, iris.target)
    ...                             # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    GridSearchCV(cv=None, error_score=...,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         decision_function_shape='ovr', degree=..., gamma=...,
                         kernel='rbf', max_iter=-1, probability=False,
                         random_state=None, shrinking=True, tol=...,
                         verbose=False),
           fit_params={}, iid=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=...,
           scoring=..., verbose=...)

    Attributes
    ----------
    grid_scores_ : list of named tuples
        Contains scores for all parameter combinations in param_grid.
        Each entry corresponds to one parameter setting.
        Each named tuple has the attributes:

            * ``parameters``, a dict of parameter settings
            * ``mean_validation_score``, the mean score over the
              cross-validation folds
            * ``cv_validation_scores``, the list of scores for each fold

    best_estimator_ : estimator
        Estimator that was chosen by the search, i.e. estimator
        which gave highest score (or smallest loss if specified)
        on the left out data. Not available if refit=False.

    best_score_ : float
        Score of best_estimator on the left out data.

    best_params_ : dict
        Parameter setting that gave the best results on the hold out data.

    scorer_ : function
        Scorer function used on the held out data to choose the best
        parameters for the model.

    Notes
    -----
    The parameters selected are those that maximize the score of the left
    out data, unless an explicit score is passed in which case it is used
    instead.

    If `n_jobs` was set to a value higher than one, the data is copied for
    each point in the grid (and not `n_jobs` times). This is done for
    efficiency reasons if individual jobs take very little time, but may
    raise errors if the dataset is large and not enough memory is
    available. A workaround in this case is to set `pre_dispatch`. Then,
    the memory is copied only `pre_dispatch` many times. A reasonable
    value for `pre_dispatch` is `2 * n_jobs`.

    See Also
    --------
    :class:`ParameterGrid`:
        generates all the combinations of a hyperparameter grid.

    :func:`sklearn.cross_validation.train_test_split`:
        utility function to split the data into a development set usable
        for fitting a GridSearchCV instance and an evaluation set for
        its final evaluation.

    :func:`sklearn.metrics.make_scorer`:
        Make a scorer from a performance metric or loss function.

    """

    def __init__(self, estimator, param_grid, scoring=None, fit_params=None,
                 n_jobs=1, iid=True, refit=True, cv=None, verbose=0,
                 pre_dispatch='2*n_jobs', error_score='raise'):
        super(GridSearchCV, self).__init__(
            estimator, scoring, fit_params, n_jobs, iid,
            refit, cv, verbose, pre_dispatch, error_score)
        self.param_grid = param_grid
        _check_param_grid(param_grid)

    def fit(self, X, y=None):
        """Run fit with all sets of parameters.

        Parameters
        ----------
        X : array-like, shape = [n_samples, n_features]
            Training vector, where n_samples is the number of samples and
            n_features is the number of features.

        y : array-like, shape = [n_samples] or [n_samples, n_output], optional
            Target relative to X for classification or regression;
            None for unsupervised learning.
        """
        return self._fit(X, y, ParameterGrid(self.param_grid))
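Note that the class above comes from the old `sklearn.grid_search` module, which was deprecated in 0.18 and removed in scikit-learn 0.20. The same search is written today with `sklearn.model_selection.GridSearchCV`; a quick sketch on the bundled iris dataset (no download needed), using the same grid as the docstring example:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# 2 kernels x 2 values of C = 4 candidates; cv=3 -> 12 fits in total.
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters, cv=3)
clf.fit(iris.data, iris.target)

print(clf.best_params_)   # a dict with keys 'C' and 'kernel'
print(clf.best_score_)    # mean cross-validated score of the best candidate
```

With `refit=True` (the default), `clf` can then be used directly for `predict` and `score`, exactly as the single-threaded 20-newsgroups run above uses `gs.score` on the held-out test set.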

