ML LassoR & RidgeR: Regression prediction (10→1) on the sklearn diabetes dataset using Lasso and Ridge regression with alpha tuning

Overview: Regression prediction (10→1) on the sklearn diabetes dataset using Lasso and Ridge regression with alpha tuning.

 

Contents

Regression prediction (10→1) on the sklearn diabetes dataset with Lasso and Ridge (alpha tuning)

Design approach

Output

Core code


 

 

Regression prediction (10→1) on the sklearn diabetes dataset with Lasso and Ridge (alpha tuning)

Design approach
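
The overall flow reduces to four steps: load and inspect the sklearn diabetes dataset, grid-search Lasso's alpha over 50 log-spaced values, let LassoCV pick alpha along its own 100-alpha path, and let RidgeCV pick alpha from an explicit 50-alpha grid. Below is a minimal sketch consistent with the output that follows; the variable names, cv=5, max_iter, and the 100-sample slice are assumptions rather than the author's original code.

import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LassoCV, RidgeCV
from sklearn.model_selection import GridSearchCV

# 1) Load the diabetes dataset and inspect it
diabetes = load_diabetes()
print(diabetes.DESCR)
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df['target'] = diabetes.target
print(df.head())
X, y = diabetes.data, diabetes.target

# 2) Grid-search Lasso's alpha over 50 log-spaced values in [1e-3, 2]
alphas = np.logspace(-3, np.log10(2.0), 50)
print('alphas: ', len(alphas), alphas)
grid = GridSearchCV(Lasso(max_iter=10000), {'alpha': alphas}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
for mean, std, params in zip(grid.cv_results_['mean_test_score'],
                             grid.cv_results_['std_test_score'],
                             grid.cv_results_['params']):
    print('%0.3f (+/-%0.3f) for %r' % (mean, std, params))

# 3) LassoCV builds its own 100-alpha path (n_alphas=100, eps=1e-3 defaults)
lasso_cv = LassoCV(cv=5, max_iter=10000).fit(X, y)
m_log_alphas = -np.log(lasso_cv.alphas_)
print('m_log_alphas: ', len(m_log_alphas), m_log_alphas)
print('alpha selected by cross-validation:', lasso_cv.alpha_)

# 4) RidgeCV over an explicit 50-alpha grid in [1e-4, 2], on a 100-sample slice
X100, y100 = X[:100], y[:100]
print(X100.shape, y100.shape)
alphas2 = np.logspace(-4, np.log10(2.0), 50)
print('alphas: ', len(alphas2), alphas2)
print('m_log_alphas: ', len(alphas2), -np.log(alphas2))
ridge_cv = RidgeCV(alphas=alphas2).fit(X100, y100)
print('alpha selected by cross-validation:', ridge_cv.alpha_)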

 

 

Output

 

.. _diabetes_dataset:

Diabetes dataset
----------------

Ten baseline variables, age, sex, body mass index, average blood
pressure, and six blood serum measurements were obtained for each of n =
442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline.

**Data Set Characteristics:**

  :Number of Instances: 442

  :Number of Attributes: First 10 columns are numeric predictive values

  :Target: Column 11 is a quantitative measure of disease progression one year after baseline

  :Attribute Information:
      - age     age in years
      - sex
      - bmi     body mass index
      - bp      average blood pressure
      - s1      tc, T-Cells (a type of white blood cells)
      - s2      ldl, low-density lipoproteins
      - s3      hdl, high-density lipoproteins
      - s4      tch, thyroid stimulating hormone
      - s5      ltg, lamotrigine
      - s6      glu, blood sugar level

Note: Each of these 10 feature variables have been mean centered and scaled by the standard deviation times `n_samples` (i.e. the sum of squares of each column totals 1).

Source URL:
https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

For more information see:
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499.
(https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
        age       sex       bmi        bp  ...        s4        s5        s6  target
0  0.038076  0.050680  0.061696  0.021872  ... -0.002592  0.019908 -0.017646   151.0
1 -0.001882 -0.044642 -0.051474 -0.026328  ... -0.039493 -0.068330 -0.092204    75.0
2  0.085299  0.050680  0.044451 -0.005671  ... -0.002592  0.002864 -0.025930   141.0
3 -0.089063 -0.044642 -0.011595 -0.036656  ...  0.034309  0.022692 -0.009362   206.0
4  0.005383 -0.044642 -0.036385  0.021872  ... -0.002592 -0.031991 -0.046641   135.0

[5 rows x 11 columns]
alphas:  50 [1.00000000e-03 1.16779862e-03 1.36375363e-03 1.59258961e-03
 1.85982395e-03 2.17189985e-03 2.53634166e-03 2.96193630e-03
 3.45894513e-03 4.03935136e-03 4.71714896e-03 5.50868007e-03
 6.43302900e-03 7.51248241e-03 8.77306662e-03 1.02451751e-02
 1.19643014e-02 1.39718947e-02 1.63163594e-02 1.90542221e-02
 2.22514943e-02 2.59852645e-02 3.03455561e-02 3.54374986e-02
 4.13838621e-02 4.83280172e-02 5.64373920e-02 6.59075087e-02
 7.69666979e-02 8.98816039e-02 1.04963613e-01 1.22576363e-01
 1.43144508e-01 1.67163960e-01 1.95213842e-01 2.27970456e-01
 2.66223585e-01 3.10895536e-01 3.63063379e-01 4.23984915e-01
 4.95129000e-01 5.78210965e-01 6.75233969e-01 7.88537299e-01
 9.20852773e-01 1.07537060e+00 1.25581631e+00 1.46654056e+00
 1.71262404e+00 2.00000000e+00]
{'alpha': 0.07696669794067007}
0.472 (+/-0.177) for {'alpha': 0.0010000000000000002}
0.472 (+/-0.177) for {'alpha': 0.0011677986237376523}
0.472 (+/-0.177) for {'alpha': 0.0013637536256035543}
0.472 (+/-0.177) for {'alpha': 0.0015925896070970653}
0.472 (+/-0.177) for {'alpha': 0.0018598239513468405}
0.472 (+/-0.176) for {'alpha': 0.002171899850777162}
0.472 (+/-0.176) for {'alpha': 0.0025363416566335814}
0.472 (+/-0.176) for {'alpha': 0.002961936295945173}
0.472 (+/-0.176) for {'alpha': 0.0034589451300033745}
0.472 (+/-0.176) for {'alpha': 0.004039351362401994}
0.472 (+/-0.176) for {'alpha': 0.004717148961805858}
0.472 (+/-0.176) for {'alpha': 0.0055086800655623795}
0.472 (+/-0.176) for {'alpha': 0.00643302899917478}
0.472 (+/-0.176) for {'alpha': 0.007512482411700719}
0.471 (+/-0.175) for {'alpha': 0.008773066621237415}
0.471 (+/-0.174) for {'alpha': 0.010245175126239786}
0.471 (+/-0.174) for {'alpha': 0.011964301412374057}
0.472 (+/-0.173) for {'alpha': 0.013971894723352857}
0.472 (+/-0.173) for {'alpha': 0.016316359428938842}
0.473 (+/-0.172) for {'alpha': 0.01905422208552364}
0.473 (+/-0.171) for {'alpha': 0.02225149432786609}
0.474 (+/-0.170) for {'alpha': 0.025985264452188187}
0.474 (+/-0.169) for {'alpha': 0.03034555606472431}
0.475 (+/-0.167) for {'alpha': 0.03543749860893881}
0.476 (+/-0.166) for {'alpha': 0.04138386210422369}
0.477 (+/-0.164) for {'alpha': 0.04832801721026122}
0.477 (+/-0.162) for {'alpha': 0.05643739198611263}
0.478 (+/-0.160) for {'alpha': 0.06590750868872472}
0.478 (+/-0.157) for {'alpha': 0.07696669794067007}
0.477 (+/-0.154) for {'alpha': 0.08988160392874607}
0.476 (+/-0.151) for {'alpha': 0.10496361336732249}
0.475 (+/-0.148) for {'alpha': 0.12257636323289021}
0.472 (+/-0.145) for {'alpha': 0.1431445082861357}
0.469 (+/-0.143) for {'alpha': 0.1671639597721522}
0.466 (+/-0.139) for {'alpha': 0.19521384216045554}
0.462 (+/-0.134) for {'alpha': 0.22797045620951942}
0.455 (+/-0.129) for {'alpha': 0.2662235850143214}
0.447 (+/-0.126) for {'alpha': 0.31089553618622834}
0.441 (+/-0.122) for {'alpha': 0.3630633792844568}
0.434 (+/-0.117) for {'alpha': 0.42398491465793015}
0.425 (+/-0.112) for {'alpha': 0.49512899982305664}
0.412 (+/-0.107) for {'alpha': 0.5782109645659657}
0.395 (+/-0.105) for {'alpha': 0.675233968650155}
0.374 (+/-0.103) for {'alpha': 0.7885372992905638}
0.348 (+/-0.102) for {'alpha': 0.9208527728773261}
0.314 (+/-0.095) for {'alpha': 1.075370600831142}
0.272 (+/-0.089) for {'alpha': 1.2558163076585396}
0.212 (+/-0.084) for {'alpha': 1.466540555750942}
0.129 (+/-0.087) for {'alpha': 1.7126240426614014}
0.018 (+/-0.098) for {'alpha': 2.0}
m_log_alphas:  100 [-0.77418297 -0.70440767 -0.63463236 -0.56485706 -0.49508175 -0.42530644
 -0.35553114 -0.28575583 -0.21598053 -0.14620522 -0.07642991 -0.00665461
  0.0631207   0.132896    0.20267131  0.27244662  0.34222192  0.41199723
  0.48177253  0.55154784  0.62132314  0.69109845  0.76087376  0.83064906
  0.90042437  0.97019967  1.03997498  1.10975029  1.17952559  1.2493009
  1.3190762   1.38885151  1.45862681  1.52840212  1.59817743  1.66795273
  1.73772804  1.80750334  1.87727865  1.94705396  2.01682926  2.08660457
  2.15637987  2.22615518  2.29593048  2.36570579  2.4354811   2.5052564
  2.57503171  2.64480701  2.71458232  2.78435763  2.85413293  2.92390824
  2.99368354  3.06345885  3.13323415  3.20300946  3.27278477  3.34256007
  3.41233538  3.48211068  3.55188599  3.6216613   3.6914366   3.76121191
  3.83098721  3.90076252  3.97053783  4.04031313  4.11008844  4.17986374
  4.24963905  4.31941435  4.38918966  4.45896497  4.52874027  4.59851558
  4.66829088  4.73806619  4.8078415   4.8776168   4.94739211  5.01716741
  5.08694272  5.15671802  5.22649333  5.29626864  5.36604394  5.43581925
  5.50559455  5.57536986  5.64514517  5.71492047  5.78469578  5.85447108
  5.92424639  5.99402169  6.063797    6.13357231]
alpha selected by cross-validation: 0.06176875494949271
(100, 10) (100,)
alphas:  50 [1.00000000e-04 1.22398508e-04 1.49813947e-04 1.83370035e-04
 2.24442186e-04 2.74713886e-04 3.36245696e-04 4.11559714e-04
 5.03742947e-04 6.16573850e-04 7.54677190e-04 9.23713617e-04
 1.13061168e-03 1.38385182e-03 1.69381398e-03 2.07320303e-03
 2.53756957e-03 3.10594728e-03 3.80163312e-03 4.65314220e-03
 5.69537661e-03 6.97105597e-03 8.53246847e-03 1.04436141e-02
 1.27828277e-02 1.56459904e-02 1.91504587e-02 2.34398757e-02
 2.86900580e-02 3.51162028e-02 4.29817081e-02 5.26089693e-02
 6.43925932e-02 7.88155731e-02 9.64690852e-02 1.18076721e-01
 1.44524144e-01 1.76895395e-01 2.16517323e-01 2.65013972e-01
 3.24373147e-01 3.97027891e-01 4.85956213e-01 5.94803152e-01
 7.28030181e-01 8.91098077e-01 1.09069075e+00 1.33498920e+00
 1.63400685e+00 2.00000000e+00]
m_log_alphas:  50 [ 9.21034037  9.00822838  8.80611639  8.6040044   8.40189241  8.19978042
  7.99766843  7.79555644  7.59344445  7.39133245  7.18922046  6.98710847
  6.78499648  6.58288449  6.3807725   6.17866051  5.97654852  5.77443653
  5.57232454  5.37021255  5.16810055  4.96598856  4.76387657  4.56176458
  4.35965259  4.1575406   3.95542861  3.75331662  3.55120463  3.34909264
  3.14698065  2.94486866  2.74275666  2.54064467  2.33853268  2.13642069
  1.9343087   1.73219671  1.53008472  1.32797273  1.12586074  0.92374875
  0.72163676  0.51952476  0.31741277  0.11530078 -0.08681121 -0.2889232
 -0.49103519 -0.69314718]
alpha selected by cross-validation: 0.0046531422008170295
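
The m_log_alphas arrays above are the natural x-axis for a regularization-path plot. A hedged sketch of such a plot, assuming matplotlib and refitting LassoCV as in the design sketch (mse_path_ holds the per-fold CV errors):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)
lasso_cv = LassoCV(cv=5, max_iter=10000).fit(X, y)

m_log_alphas = -np.log(lasso_cv.alphas_)  # the values printed above
plt.plot(m_log_alphas, lasso_cv.mse_path_.mean(axis=1), label='mean CV MSE')
plt.axvline(-np.log(lasso_cv.alpha_), linestyle='--', label='CV-selected alpha')
plt.xlabel('-log(alpha)')
plt.ylabel('mean squared error')
plt.legend()
plt.show()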

 

 

 

Core code

# class Ridge, found at: sklearn.linear_model._ridge

class Ridge(MultiOutputMixin, RegressorMixin, _BaseRidge):
    """Linear least squares with l2 regularization.

    Minimizes the objective function::

    ||y - Xw||^2_2 + alpha * ||w||^2_2

    This model solves a regression model where the loss function is
    the linear least squares function and regularization is given by
    the l2-norm. Also known as Ridge Regression or Tikhonov regularization.
    This estimator has built-in support for multi-variate regression
    (i.e., when y is a 2d-array of shape (n_samples, n_targets)).

    Read more in the :ref:`User Guide <ridge_regression>`.

    Parameters
    ----------
    alpha : {float, ndarray of shape (n_targets,)}, default=1.0
        Regularization strength; must be a positive float. Regularization
        improves the conditioning of the problem and reduces the variance of
        the estimates. Larger values specify stronger regularization.
        Alpha corresponds to ``1 / (2C)`` in other linear models such as
        :class:`~sklearn.linear_model.LogisticRegression` or
        :class:`sklearn.svm.LinearSVC`. If an array is passed, penalties are
        assumed to be specific to the targets. Hence they must correspond in
        number.

    fit_intercept : bool, default=True
        Whether to fit the intercept for this model. If set
        to false, no intercept will be used in calculations
        (i.e. ``X`` and ``y`` are expected to be centered).

    normalize : bool, default=False
        This parameter is ignored when ``fit_intercept`` is set to False.
        If True, the regressors X will be normalized before regression by
        subtracting the mean and dividing by the l2-norm.
        If you wish to standardize, please use
        :class:`sklearn.preprocessing.StandardScaler` before calling ``fit``
        on an estimator with ``normalize=False``.

    copy_X : bool, default=True
        If True, X will be copied; else, it may be overwritten.

    max_iter : int, default=None
        Maximum number of iterations for conjugate gradient solver.
        For 'sparse_cg' and 'lsqr' solvers, the default value is determined
        by scipy.sparse.linalg. For 'sag' solver, the default value is 1000.

    tol : float, default=1e-3
        Precision of the solution.

    solver : {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga'}, \
        default='auto'
        Solver to use in the computational routines:

        - 'auto' chooses the solver automatically based on the type of data.

        - 'svd' uses a Singular Value Decomposition of X to compute the Ridge
          coefficients. More stable for singular matrices than 'cholesky'.

        - 'cholesky' uses the standard scipy.linalg.solve function to
          obtain a closed-form solution.

        - 'sparse_cg' uses the conjugate gradient solver as found in
          scipy.sparse.linalg.cg. As an iterative algorithm, this solver is
          more appropriate than 'cholesky' for large-scale data
          (possibility to set `tol` and `max_iter`).

        - 'lsqr' uses the dedicated regularized least-squares routine
          scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative
          procedure.

        - 'sag' uses a Stochastic Average Gradient descent, and 'saga' uses
          its improved, unbiased version named SAGA. Both methods also use an
          iterative procedure, and are often faster than other solvers when
          both n_samples and n_features are large. Note that 'sag' and
          'saga' fast convergence is only guaranteed on features with
          approximately the same scale. You can preprocess the data with a
          scaler from sklearn.preprocessing.

        All last five solvers support both dense and sparse data. However, only
        'sag' and 'sparse_cg' supports sparse input when `fit_intercept` is
        True.

        .. versionadded:: 0.17
           Stochastic Average Gradient descent solver.
        .. versionadded:: 0.19
           SAGA solver.

    random_state : int, RandomState instance, default=None
        Used when ``solver`` == 'sag' or 'saga' to shuffle the data.
        See :term:`Glossary <random_state>` for details.

        .. versionadded:: 0.17
           `random_state` to support Stochastic Average Gradient.

    Attributes
    ----------
    coef_ : ndarray of shape (n_features,) or (n_targets, n_features)
        Weight vector(s).

    intercept_ : float or ndarray of shape (n_targets,)
        Independent term in decision function. Set to 0.0 if
        ``fit_intercept = False``.

    n_iter_ : None or ndarray of shape (n_targets,)
        Actual number of iterations for each target. Available only for
        sag and lsqr solvers. Other solvers will return None.

        .. versionadded:: 0.17

    See also
    --------
    RidgeClassifier : Ridge classifier
    RidgeCV : Ridge regression with built-in cross validation
    :class:`sklearn.kernel_ridge.KernelRidge` : Kernel ridge regression
        combines ridge regression with the kernel trick

    Examples
    --------
    >>> from sklearn.linear_model import Ridge
    >>> import numpy as np
    >>> n_samples, n_features = 10, 5
    >>> rng = np.random.RandomState(0)
    >>> y = rng.randn(n_samples)
    >>> X = rng.randn(n_samples, n_features)
    >>> clf = Ridge(alpha=1.0)
    >>> clf.fit(X, y)
    Ridge()
    """
    @_deprecate_positional_args
    def __init__(self, alpha=1.0, *, fit_intercept=True, normalize=False,
                 copy_X=True, max_iter=None, tol=1e-3, solver="auto",
                 random_state=None):
        super().__init__(alpha=alpha, fit_intercept=fit_intercept,
                         normalize=normalize, copy_X=copy_X, max_iter=max_iter,
                         tol=tol, solver=solver, random_state=random_state)

    def fit(self, X, y, sample_weight=None):
        """Fit Ridge regression model.

        Parameters
        ----------
        X : {ndarray, sparse matrix} of shape (n_samples, n_features)
            Training data

        y : ndarray of shape (n_samples,) or (n_samples, n_targets)
            Target values

        sample_weight : float or ndarray of shape (n_samples,), default=None
            Individual weights for each sample. If given a float, every sample
            will have the same weight.

        Returns
        -------
        self : returns an instance of self.
        """
        return super().fit(X, y, sample_weight=sample_weight)
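
As a usage note, here is a short sketch that plugs the RidgeCV-selected alpha from the output above into Ridge; the train/test split and random_state are assumptions, not part of the original experiment:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Ridge(alpha=0.0046531422008170295)  # alpha selected by cross-validation above
model.fit(X_train, y_train)
print('test R^2:', model.score(X_test, y_test))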

 
