ML之LoR&Bagging&RF:依次利用LoR、Bagging、RF算法对泰坦尼克号数据集 (Kaggle经典案例)获救人员进行二分类预测(最全)(二)

Summary: ML: LoR & Bagging & RF: using the LoR, Bagging, and RF algorithms in turn for binary classification of survivors in the Titanic dataset (a classic Kaggle case) (most complete).


Core Code

from sklearn import linear_model

clf_LoR = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)  # L1-regularized logistic regression

clf_LoR.fit(X, y)  # X, y: the preprocessed Titanic feature matrix and survival labels
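
For context, here is a minimal end-to-end sketch of how this core snippet fits into the Titanic workflow. It assumes Kaggle's standard train.csv columns (Pclass, Sex, Age, SibSp, Parch, Fare, Survived) and the sklearn 0.19-era API quoted below, where the default solver 'liblinear' accepts penalty='l1'; the file path and the simple imputation/encoding are illustrative stand-ins for the fuller feature engineering used in the article.

import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import cross_val_score

# Illustrative path; substitute the location of Kaggle's train.csv.
data_train = pd.read_csv('train.csv')

# Keep a few numeric-friendly columns; the full article engineers more features.
df = data_train[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
df['Age'] = df['Age'].fillna(df['Age'].median())   # impute missing ages with the median
df['Sex'] = (df['Sex'] == 'male').astype(int)      # encode sex as 0/1

X = df.drop('Survived', axis=1).values
y = df['Survived'].values

clf_LoR = linear_model.LogisticRegression(C=1.0, penalty='l1', tol=1e-6)
clf_LoR.fit(X, y)
print(cross_val_score(clf_LoR, X, y, cv=5).mean())  # rough cross-validated accuracy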

# LoR algorithm: sklearn's LogisticRegression source, reproduced below for reference

class LogisticRegression (found at: sklearn.linear_model.logistic)

class LogisticRegression(BaseEstimator, LinearClassifierMixin,

   SparseCoefMixin):

   """Logistic Regression (aka logit, MaxEnt) classifier.

 

   In the multiclass case, the training algorithm uses the one-vs-rest (OvR)

   scheme if the 'multi_class' option is set to 'ovr', and uses the

   cross-entropy loss if the 'multi_class' option is set to 'multinomial'.

   (Currently the 'multinomial' option is supported only by the 'lbfgs',

   'sag' and 'newton-cg' solvers.)

 

   This class implements regularized logistic regression using the

   'liblinear' library, 'newton-cg', 'sag' and 'lbfgs' solvers. It can handle

   both dense and sparse input. Use C-ordered arrays or CSR matrices

   containing 64-bit floats for optimal performance; any other input format

   will be converted (and copied).

 

   The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization

   with primal formulation. The 'liblinear' solver supports both L1 and L2

   regularization, with a dual formulation only for the L2 penalty.

 

   Read more in the :ref:`User Guide <logistic_regression>`.

 

   Parameters

   ----------

   penalty : str, 'l1' or 'l2', default: 'l2'

   Used to specify the norm used in the penalization. The 'newton-cg',

   'sag' and 'lbfgs' solvers support only l2 penalties.

 

   .. versionadded:: 0.19

   l1 penalty with SAGA solver (allowing 'multinomial' + L1)

 

   dual : bool, default: False

   Dual or primal formulation. Dual formulation is only implemented for

   l2 penalty with liblinear solver. Prefer dual=False when

   n_samples > n_features.

 

   tol : float, default: 1e-4

   Tolerance for stopping criteria.

 

   C : float, default: 1.0

   Inverse of regularization strength; must be a positive float.

   Like in support vector machines, smaller values specify stronger

   regularization.

 

   fit_intercept : bool, default: True

   Specifies if a constant (a.k.a. bias or intercept) should be

   added to the decision function.

 

   intercept_scaling : float, default 1.

   Useful only when the solver 'liblinear' is used

   and self.fit_intercept is set to True. In this case, x becomes

   [x, self.intercept_scaling],

   i.e. a "synthetic" feature with constant value equal to

   intercept_scaling is appended to the instance vector.

   The intercept becomes ``intercept_scaling * synthetic_feature_weight``.

 

   Note! the synthetic feature weight is subject to l1/l2 regularization

   as all other features.

   To lessen the effect of regularization on synthetic feature weight

   (and therefore on the intercept) intercept_scaling has to be increased.

 

   class_weight : dict or 'balanced', default: None

   Weights associated with classes in the form ``{class_label: weight}``.

   If not given, all classes are supposed to have weight one.

 

   The "balanced" mode uses the values of y to automatically adjust

   weights inversely proportional to class frequencies in the input data

   as ``n_samples / (n_classes * np.bincount(y))``.

 

   Note that these weights will be multiplied with sample_weight (passed

   through the fit method) if sample_weight is specified.

 

   .. versionadded:: 0.17

   *class_weight='balanced'*

 

   random_state : int, RandomState instance or None, optional, default: None

   The seed of the pseudo random number generator to use when shuffling

   the data. If int, random_state is the seed used by the random number

   generator; If RandomState instance, random_state is the random number

   generator; If None, the random number generator is the RandomState

   instance used by `np.random`. Used when ``solver`` == 'sag' or

   'liblinear'.

 

   solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'},

   default: 'liblinear'

   Algorithm to use in the optimization problem.

 

   - For small datasets, 'liblinear' is a good choice, whereas 'sag' and

   'saga' are faster for large ones.

   - For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'

   handle multinomial loss; 'liblinear' is limited to one-versus-rest

   schemes.

   - 'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas

   'liblinear' and 'saga' handle L1 penalty.

 

   Note that 'sag' and 'saga' fast convergence is only guaranteed on

   features with approximately the same scale. You can

   preprocess the data with a scaler from sklearn.preprocessing.

 

   .. versionadded:: 0.17

   Stochastic Average Gradient descent solver.

   .. versionadded:: 0.19

   SAGA solver.

 

   max_iter : int, default: 100

   Useful only for the newton-cg, sag and lbfgs solvers.

   Maximum number of iterations taken for the solvers to converge.

 

   multi_class : str, {'ovr', 'multinomial'}, default: 'ovr'

   Multiclass option can be either 'ovr' or 'multinomial'. If the option

   chosen is 'ovr', then a binary problem is fit for each label. Else

   the loss minimised is the multinomial loss fit across

   the entire probability distribution. Does not work for liblinear

   solver.

 

   .. versionadded:: 0.18

   Stochastic Average Gradient descent solver for 'multinomial' case.

 

   verbose : int, default: 0

   For the liblinear and lbfgs solvers set verbose to any positive

   number for verbosity.

 

   warm_start : bool, default: False

   When set to True, reuse the solution of the previous call to fit as

   initialization, otherwise, just erase the previous solution.

   Useless for liblinear solver.

 

   .. versionadded:: 0.17

   *warm_start* to support *lbfgs*, *newton-cg*, *sag*, *saga* solvers.

 

   n_jobs : int, default: 1

   Number of CPU cores used when parallelizing over classes if

   multi_class='ovr'". This parameter is ignored when the ``solver``is set

   to 'liblinear' regardless of whether 'multi_class' is specified or

   not. If given a value of -1, all cores are used.

 

   Attributes

   ----------

 

   coef_ : array, shape (1, n_features) or (n_classes, n_features)

   Coefficient of the features in the decision function.

 

   `coef_` is of shape (1, n_features) when the given problem

   is binary.

 

   intercept_ : array, shape (1,) or (n_classes,)

   Intercept (a.k.a. bias) added to the decision function.

 

   If `fit_intercept` is set to False, the intercept is set to zero.

   `intercept_` is of shape(1,) when the problem is binary.

 

   n_iter_ : array, shape (n_classes,) or (1, )

   Actual number of iterations for all classes. If binary or multinomial,

   it returns only 1 element. For liblinear solver, only the maximum

   number of iteration across all classes is given.

 

   See also

   --------

   SGDClassifier : incrementally trained logistic regression (when given

   the parameter ``loss="log"``).

   sklearn.svm.LinearSVC : learns SVM models using the same algorithm.

 

   Notes

   -----

   The underlying C implementation uses a random number generator to

   select features when fitting the model. It is thus not uncommon

   to have slightly different results for the same input data. If

   that happens, try with a smaller tol parameter.

 

   Predict output may not match that of standalone liblinear in certain

   cases. See :ref:`differences from liblinear <liblinear_differences>`

   in the narrative documentation.

 

   References

   ----------

 

   LIBLINEAR -- A Library for Large Linear Classification

  http://www.csie.ntu.edu.tw/~cjlin/liblinear/

 

   SAG -- Mark Schmidt, Nicolas Le Roux, and Francis Bach

   Minimizing Finite Sums with the Stochastic Average Gradient

  https://hal.inria.fr/hal-00860051/document

 

   SAGA -- Defazio, A., Bach F. & Lacoste-Julien S. (2014).

   SAGA: A Fast Incremental Gradient Method With Support

   for Non-Strongly Convex Composite Objectives

  https://arxiv.org/abs/1407.0202

 

   Hsiang-Fu Yu, Fang-Lan Huang, Chih-Jen Lin (2011). Dual coordinate descent

   methods for logistic regression and maximum entropy models.

   Machine Learning 85(1-2):41-75.

  http://www.csie.ntu.edu.tw/~cjlin/papers/maxent_dual.pdf

   """

   def __init__(self, penalty='l2', dual=False, tol=1e-4, C=1.0,

       fit_intercept=True, intercept_scaling=1, class_weight=None,

       random_state=None, solver='liblinear', max_iter=100,

       multi_class='ovr', verbose=0, warm_start=False, n_jobs=1):

       self.penalty = penalty

       self.dual = dual

       self.tol = tol

       self.C = C

       self.fit_intercept = fit_intercept

       self.intercept_scaling = intercept_scaling

       self.class_weight = class_weight

       self.random_state = random_state

       self.solver = solver

       self.max_iter = max_iter

       self.multi_class = multi_class

       self.verbose = verbose

       self.warm_start = warm_start

       self.n_jobs = n_jobs

 

   def fit(self, X, y, sample_weight=None):

       """Fit the model according to the given training data.

       Parameters

       ----------

       X : {array-like, sparse matrix}, shape (n_samples, n_features)

           Training vector, where n_samples is the number of samples and

           n_features is the number of features.

       y : array-like, shape (n_samples,)

           Target vector relative to X.

       sample_weight : array-like, shape (n_samples,) optional

           Array of weights that are assigned to individual samples.

           If not provided, then each sample is given unit weight.

           .. versionadded:: 0.17

              *sample_weight* support to LogisticRegression.

       Returns

       -------

       self : object

           Returns self.

       """

       if not isinstance(self.C, numbers.Number) or self.C < 0:

           raise ValueError(

               "Penalty term must be positive; got (C=%r)" % self.C)

       if not isinstance(self.max_iter, numbers.Number) or self.max_iter < 0:

           raise ValueError(

               "Maximum number of iteration must be positive;"

               " got (max_iter=%r)" %

               self.max_iter)

       if not isinstance(self.tol, numbers.Number) or self.tol < 0:

           raise ValueError("Tolerance for stopping criteria must be "

               "positive; got (tol=%r)" %

               self.tol)

       if self.solver in ['newton-cg']:

           _dtype = [np.float64, np.float32]

       else:

           _dtype = np.float64

       X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype,

           order="C")

       check_classification_targets(y)

       self.classes_ = np.unique(y)

       n_samples, n_features = X.shape

        _check_solver_option(self.solver, self.multi_class, self.penalty, self.dual)

       if self.solver == 'liblinear':

           if self.n_jobs != 1:

               warnings.warn("'n_jobs' > 1 does not have any effect when"

                   " 'solver' is set to 'liblinear'. Got 'n_jobs'"

                   " = {}.".

                   format(self.n_jobs))

            self.coef_, self.intercept_, n_iter_ = _fit_liblinear(

                X, y, self.C, self.fit_intercept, self.intercept_scaling,

                self.class_weight, self.penalty, self.dual, self.verbose,

                self.max_iter, self.tol, self.random_state,

                sample_weight=sample_weight)

           self.n_iter_ = np.array([n_iter_])

           return self

       if self.solver in ['sag', 'saga']:

           max_squared_sum = row_norms(X, squared=True).max()

       else:

           max_squared_sum = None

       n_classes = len(self.classes_)

       classes_ = self.classes_

       if n_classes < 2:

           raise ValueError(

               "This solver needs samples of at least 2 classes"

               " in the data, but the data contains only one"

               " class: %r" %

               classes_[0])

       if len(self.classes_) == 2:

           n_classes = 1

           classes_ = classes_[1:]

       if self.warm_start:

           warm_start_coef = getattr(self, 'coef_', None)

       else:

           warm_start_coef = None

       if warm_start_coef is not None and self.fit_intercept:

           warm_start_coef = np.append(warm_start_coef,

                self.intercept_[:, np.newaxis],

               axis=1)

       self.coef_ = list()

       self.intercept_ = np.zeros(n_classes)

       # Hack so that we iterate only once for the multinomial case.

       if self.multi_class == 'multinomial':

           classes_ = [None]

           warm_start_coef = [warm_start_coef]

       if warm_start_coef is None:

           warm_start_coef = [None] * n_classes

       path_func = delayed(logistic_regression_path)

       # The SAG solver releases the GIL so it's more efficient to use

       # threads for this solver.

       if self.solver in ['sag', 'saga']:

           backend = 'threading'

       else:

           backend = 'multiprocessing'

       fold_coefs_ = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,

           backend=backend)(

           path_func(X, y, pos_class=class_, Cs=[self.C],

               fit_intercept=self.fit_intercept, tol=self.tol,

               verbose=self.verbose, solver=self.solver,

               multi_class=self.multi_class, max_iter=self.max_iter,

               class_weight=self.class_weight, check_input=False,

               random_state=self.random_state, coef=warm_start_coef_,

               penalty=self.penalty,

               max_squared_sum=max_squared_sum,

               sample_weight=sample_weight) for

           (class_, warm_start_coef_) in zip(classes_, warm_start_coef))

       fold_coefs_, _, n_iter_ = zip(*fold_coefs_)

        self.n_iter_ = np.asarray(n_iter_, dtype=np.int32)[:, 0]

       if self.multi_class == 'multinomial':

           self.coef_ = fold_coefs_[0][0]

       else:

           self.coef_ = np.asarray(fold_coefs_)

           self.coef_ = self.coef_.reshape(n_classes, n_features +

               int(self.fit_intercept))

       if self.fit_intercept:

            self.intercept_ = self.coef_[:, -1]

            self.coef_ = self.coef_[:, :-1]

       return self

 

   def predict_proba(self, X):

       """Probability estimates.

       The returned estimates for all classes are ordered by the

       label of classes.

       For a multi_class problem, if multi_class is set to be "multinomial"

       the softmax function is used to find the predicted probability of

       each class.

        Else use a one-vs-rest approach, i.e. calculate the probability

        of each class assuming it to be positive using the logistic function,

        and normalize these values across all the classes.

       Parameters

       ----------

       X : array-like, shape = [n_samples, n_features]

       Returns

       -------

       T : array-like, shape = [n_samples, n_classes]

           Returns the probability of the sample for each class in the model,

           where classes are ordered as they are in ``self.classes_``.

       """

       if not hasattr(self, "coef_"):

           raise NotFittedError("Call fit before prediction")

       calculate_ovr = self.coef_.shape[0] == 1 or self.multi_class == "ovr"

       if calculate_ovr:

           return super(LogisticRegression, self)._predict_proba_lr(X)

       else:

           return softmax(self.decision_function(X), copy=False)

 

   def predict_log_proba(self, X):

       """Log of probability estimates.

       The returned estimates for all classes are ordered by the

       label of classes.

       Parameters

       ----------

       X : array-like, shape = [n_samples, n_features]

       Returns

       -------

       T : array-like, shape = [n_samples, n_classes]

           Returns the log-probability of the sample for each class in the

           model, where classes are ordered as they are in ``self.classes_``.

       """

       return np.log(self.predict_proba(X))
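
To make the prediction interfaces above concrete, here is a small self-contained sketch (synthetic data, not the Titanic features) showing that the columns of predict_proba follow classes_, and that predict_log_proba is simply the elementwise log of predict_proba:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

clf = LogisticRegression(C=1.0).fit(X, y)
proba = clf.predict_proba(X)          # shape (n_samples, 2); columns ordered by clf.classes_
log_proba = clf.predict_log_proba(X)  # np.log of predict_proba, elementwise

assert np.allclose(np.exp(log_proba), proba)
print(clf.classes_)   # [0 1]
print(proba[0])       # probabilities of class 0 and class 1 for the first sample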


