ML (SVM): Using SVM with a single-threaded grid search over hyperparameter combinations and 3-fold cross-validation to classify and evaluate the 20-newsgroups text dataset


Output

Fitting 3 folds for each of 12 candidates, totalling 36 fits
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   6.2s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    6.2s remaining:    0.0s
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   7.1s
[CV] svc__C=0.1, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=0.01 -   7.0s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   6.9s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   6.8s
[CV] svc__C=0.1, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=0.1 -   6.3s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   6.3s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   7.0s
[CV] svc__C=0.1, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=0.1, svc__gamma=1.0 -   8.1s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   8.8s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -  10.7s
[CV] svc__C=0.1, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=0.1, svc__gamma=10.0 -   9.4s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   8.4s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   6.7s
[CV] svc__C=1.0, svc__gamma=0.01 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=0.01 -   6.9s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   6.6s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   6.2s
[CV] svc__C=1.0, svc__gamma=0.1 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=0.1 -   6.8s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   7.6s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   7.7s
[CV] svc__C=1.0, svc__gamma=1.0 ......................................
[CV] ............................. svc__C=1.0, svc__gamma=1.0 -   8.2s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   6.7s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   8.4s
[CV] svc__C=1.0, svc__gamma=10.0 .....................................
[CV] ............................ svc__C=1.0, svc__gamma=10.0 -   9.5s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -  10.1s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   9.9s
[CV] svc__C=10.0, svc__gamma=0.01 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=0.01 -   8.8s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   9.2s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   7.7s
[CV] svc__C=10.0, svc__gamma=0.1 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=0.1 -   6.9s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   8.0s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   9.5s
[CV] svc__C=10.0, svc__gamma=1.0 .....................................
[CV] ............................ svc__C=10.0, svc__gamma=1.0 -   9.0s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.6s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   8.1s
[CV] svc__C=10.0, svc__gamma=10.0 ....................................
[CV] ........................... svc__C=10.0, svc__gamma=10.0 -   9.0s
[Parallel(n_jobs=1)]: Done  36 out of  36 | elapsed:  4.8min finished

Single-threaded run: accuracy of the best model on the test set: 0.8226666666666667
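The log above matches a scikit-learn `Pipeline` whose SVC step is named `svc` (hence the `svc__C` / `svc__gamma` parameter keys), searched over 3 values of C times 4 values of gamma = 12 candidates with 3-fold cross-validation. A minimal sketch of such a pipeline, with a tiny hypothetical toy corpus standing in for the 20-newsgroups texts, so the texts, labels, and resulting scores are illustrative only:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Toy corpus standing in for the 20-newsgroups training texts (hypothetical).
texts = ["free money offer", "cheap money now", "meeting agenda today",
         "project meeting notes", "win money prize", "agenda for the call"] * 5
labels = [1, 1, 0, 0, 1, 0] * 5

# The step name "svc" must match the log's parameter keys (svc__C, svc__gamma).
pipe = Pipeline([("tfidf", TfidfVectorizer()),
                 ("svc", SVC(kernel="rbf"))])

# 3 values of C x 4 values of gamma = 12 candidates; cv=3 gives 36 fits.
param_grid = {"svc__C": [0.1, 1.0, 10.0],
              "svc__gamma": [0.01, 0.1, 1.0, 10.0]}

gs = GridSearchCV(pipe, param_grid, cv=3, n_jobs=1)
gs.fit(texts, labels)
print(gs.best_params_, gs.best_score_)
```

Run against the real data, `fetch_20newsgroups(subset='train')` would supply the texts and labels; with `n_jobs=1` and `verbose=2` the search prints per-fold lines like those in the log above.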


Design approach

[design flow diagram: image.png]

Core code

class GridSearchCV(BaseSearchCV):

   """Exhaustive search over specified parameter values for an estimator.

 

   .. deprecated:: 0.18

   This module will be removed in 0.20.

   Use :class:`sklearn.model_selection.GridSearchCV` instead.

 

   Important members are fit, predict.

 

   GridSearchCV implements a "fit" and a "score" method.

   It also implements "predict", "predict_proba", "decision_function",

   "transform" and "inverse_transform" if they are implemented in the

   estimator used.

 

   The parameters of the estimator used to apply these methods are
   optimized by cross-validated grid-search over a parameter grid.

 

   Read more in the :ref:`User Guide <grid_search>`.

 

   Parameters

   ----------

   estimator : estimator object.

   An object of that type is instantiated for each grid point.

   This is assumed to implement the scikit-learn estimator interface.

   Either estimator needs to provide a ``score`` function,

   or ``scoring`` must be passed.

 

   param_grid : dict or list of dictionaries

   Dictionary with parameters names (string) as keys and lists of

   parameter settings to try as values, or a list of such

   dictionaries, in which case the grids spanned by each dictionary

   in the list are explored. This enables searching over any sequence

   of parameter settings.

 

   scoring : string, callable or None, default=None

   A string (see model evaluation documentation) or

   a scorer callable object / function with signature

   ``scorer(estimator, X, y)``.

   If ``None``, the ``score`` method of the estimator is used.

 

   fit_params : dict, optional

   Parameters to pass to the fit method.

 

   n_jobs : int, default: 1

   The maximum number of estimators fit in parallel.

 

   - If -1 all CPUs are used.

 

   - If 1 is given, no parallel computing code is used at all,

   which is useful for debugging.

 

   - For ``n_jobs`` below -1, ``(n_cpus + n_jobs + 1)`` are used.

   For example, with ``n_jobs = -2`` all CPUs but one are used.

 

   .. versionchanged:: 0.17

   Upgraded to joblib 0.9.3.

 

   pre_dispatch : int, or string, optional

   Controls the number of jobs that get dispatched during parallel

   execution. Reducing this number can be useful to avoid an

   explosion of memory consumption when more jobs get dispatched

   than CPUs can process. This parameter can be:

 

   - None, in which case all the jobs are immediately

   created and spawned. Use this for lightweight and

   fast-running jobs, to avoid delays due to on-demand

   spawning of the jobs

 

   - An int, giving the exact number of total jobs that are

   spawned

 

   - A string, giving an expression as a function of n_jobs,

   as in '2*n_jobs'

 

   iid : boolean, default=True

   If True, the data is assumed to be identically distributed across

   the folds, and the loss minimized is the total loss per sample,

   and not the mean loss across the folds.

 

   cv : int, cross-validation generator or an iterable, optional

   Determines the cross-validation splitting strategy.

   Possible inputs for cv are:

 

   - None, to use the default 3-fold cross-validation,

   - integer, to specify the number of folds.

   - An object to be used as a cross-validation generator.

   - An iterable yielding train/test splits.

 

   For integer/None inputs, if the estimator is a classifier and ``y`` is

   either binary or multiclass,

   :class:`sklearn.model_selection.StratifiedKFold` is used. In all

   other cases, :class:`sklearn.model_selection.KFold` is used.

 

   Refer :ref:`User Guide <cross_validation>` for the various

   cross-validation strategies that can be used here.

 

   refit : boolean, default=True

   Refit the best estimator with the entire dataset.

   If "False", it is impossible to make predictions using

   this GridSearchCV instance after fitting.

 

   verbose : integer

   Controls the verbosity: the higher, the more messages.

 

   error_score : 'raise' (default) or numeric

   Value to assign to the score if an error occurs in estimator fitting.

   If set to 'raise', the error is raised. If a numeric value is given,

   FitFailedWarning is raised. This parameter does not affect the refit

   step, which will always raise the error.

 

 

   Examples

   --------

   >>> from sklearn import svm, grid_search, datasets

   >>> iris = datasets.load_iris()

   >>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}

   >>> svr = svm.SVC()

   >>> clf = grid_search.GridSearchCV(svr, parameters)

   >>> clf.fit(iris.data, iris.target)

   ...                             # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS

   GridSearchCV(cv=None, error_score=...,

   estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,

   decision_function_shape='ovr', degree=..., gamma=...,

   kernel='rbf', max_iter=-1, probability=False,

   random_state=None, shrinking=True, tol=...,

   verbose=False),

   fit_params={}, iid=..., n_jobs=1,

   param_grid=..., pre_dispatch=..., refit=...,

   scoring=..., verbose=...)

 

 

   Attributes

   ----------

   grid_scores_ : list of named tuples

   Contains scores for all parameter combinations in param_grid.

   Each entry corresponds to one parameter setting.

   Each named tuple has the attributes:

 

   * ``parameters``, a dict of parameter settings

   * ``mean_validation_score``, the mean score over the

   cross-validation folds

   * ``cv_validation_scores``, the list of scores for each fold

 

   best_estimator_ : estimator

   Estimator that was chosen by the search, i.e. estimator

   which gave highest score (or smallest loss if specified)

   on the left out data. Not available if refit=False.

 

   best_score_ : float

   Score of best_estimator on the left out data.

 

   best_params_ : dict

   Parameter setting that gave the best results on the hold out data.

 

   scorer_ : function

   Scorer function used on the held out data to choose the best

   parameters for the model.

 

   Notes

   ------

   The parameters selected are those that maximize the score of the
   left out data, unless an explicit score is passed in which case it
   is used instead.

 

   If `n_jobs` was set to a value higher than one, the data is copied
   for each point in the grid (and not `n_jobs` times). This is done
   for efficiency reasons if individual jobs take very little time,
   but may raise errors if the dataset is large and not enough memory
   is available. A workaround in this case is to set `pre_dispatch`.
   Then, the memory is copied only `pre_dispatch` many times. A
   reasonable value for `pre_dispatch` is `2 * n_jobs`.

 

   See Also

   ---------

   :class:`ParameterGrid`:

   generates all the combinations of a hyperparameter grid.

 

   :func:`sklearn.cross_validation.train_test_split`:

   utility function to split the data into a development set usable

   for fitting a GridSearchCV instance and an evaluation set for

   its final evaluation.

 

   :func:`sklearn.metrics.make_scorer`:

   Make a scorer from a performance metric or loss function.

 

   """

   def __init__(self, estimator, param_grid, scoring=None, fit_params=None,
                n_jobs=1, iid=True, refit=True, cv=None, verbose=0,
                pre_dispatch='2*n_jobs', error_score='raise'):
       super(GridSearchCV, self).__init__(
           estimator, scoring, fit_params, n_jobs, iid, refit, cv,
           verbose, pre_dispatch, error_score)

       self.param_grid = param_grid

       _check_param_grid(param_grid)

 

   def fit(self, X, y=None):

       """Run fit with all sets of parameters.

       Parameters

       ----------

       X : array-like, shape = [n_samples, n_features]

           Training vector, where n_samples is the number of samples and

           n_features is the number of features.

        y : array-like, shape = [n_samples] or [n_samples, n_output], optional

           Target relative to X for classification or regression;

           None for unsupervised learning.

       """

       return self._fit(X, y, ParameterGrid(self.param_grid))
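The class listed above is the legacy `sklearn.grid_search` version, removed in scikit-learn 0.20 as its deprecation note says. A sketch of the docstring's own iris example rewritten against the modern `sklearn.model_selection` module:

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV  # replaces sklearn.grid_search

# Same search as the docstring example: 2 kernels x 2 values of C.
iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters, cv=3)
clf.fit(iris.data, iris.target)
print(clf.best_params_)
```

Attribute access also changes: the legacy `grid_scores_` list of named tuples becomes the `cv_results_` dict in the modern class.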

