AI-Based Lung Cancer Risk Prediction and Analysis (Part 2)

Summary: AI-based lung cancer risk prediction and analysis.

III. Model Training and Evaluation


1. Splitting the Dataset


from sklearn.model_selection import train_test_split, cross_val_score

# Hold out 25% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2023)
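
Since the two classes in a medical dataset are rarely perfectly balanced, it can be worth passing stratify=y so that the split preserves the class ratio in both partitions. A minimal variant of the call above (same X and y; not used in the rest of this article):

# Stratified variant: keeps the positive/negative ratio identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2023, stratify=y
)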


2. Standardizing the Data


  • The return value is the standardized data.
  • We load the StandardScaler class and create a StandardScaler object `scaler`. Calling `fit` makes StandardScaler estimate, for each feature dimension, the parameters μ (sample mean) and σ (standard deviation) from the training data. Calling `transform` then standardizes both the training and the test data using the estimated μ and σ, as illustrated in the sketch below.
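
To make the fit/transform split concrete, here is a minimal NumPy sketch of the same computation, using toy arrays in place of the real training and test data:

import numpy as np
from sklearn.preprocessing import StandardScaler

X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[2.0, 25.0]])

# fit: estimate mu and sigma from the training data only
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)   # biased estimator (ddof=0), as StandardScaler uses

# transform: apply the training-set parameters to both splits
Z_tr = (X_tr - mu) / sigma
Z_te = (X_te - mu) / sigma

# Matches StandardScaler's output
scaler = StandardScaler().fit(X_tr)
assert np.allclose(Z_tr, scaler.transform(X_tr))
assert np.allclose(Z_te, scaler.transform(X_te))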


from sklearn.preprocessing import StandardScaler
help(StandardScaler)


Help on class StandardScaler in module sklearn.preprocessing._data:
class StandardScaler(sklearn.base._OneToOneFeatureMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator)
 |  StandardScaler(*, copy=True, with_mean=True, with_std=True)
 |  
 |  Standardize features by removing the mean and scaling to unit variance.
 |  
 |  The standard score of a sample `x` is calculated as:
 |  
 |      z = (x - u) / s
 |  
 |  where `u` is the mean of the training samples or zero if `with_mean=False`,
 |  and `s` is the standard deviation of the training samples or one if
 |  `with_std=False`.
 |  
 |  Centering and scaling happen independently on each feature by computing
 |  the relevant statistics on the samples in the training set. Mean and
 |  standard deviation are then stored to be used on later data using
 |  :meth:`transform`.
 |  
 |  Standardization of a dataset is a common requirement for many
 |  machine learning estimators: they might behave badly if the
 |  individual features do not more or less look like standard normally
 |  distributed data (e.g. Gaussian with 0 mean and unit variance).
 |  
 |  For instance many elements used in the objective function of
 |  a learning algorithm (such as the RBF kernel of Support Vector
 |  Machines or the L1 and L2 regularizers of linear models) assume that
 |  all features are centered around 0 and have variance in the same
 |  order. If a feature has a variance that is orders of magnitude larger
 |  than others, it might dominate the objective function and make the
 |  estimator unable to learn from other features correctly as expected.
 |  
 |  This scaler can also be applied to sparse CSR or CSC matrices by passing
 |  `with_mean=False` to avoid breaking the sparsity structure of the data.
 |  
 |  Read more in the :ref:`User Guide <preprocessing_scaler>`.
 |  
 |  Parameters
 |  ----------
 |  copy : bool, default=True
 |      If False, try to avoid a copy and do inplace scaling instead.
 |      This is not guaranteed to always work inplace; e.g. if the data is
 |      not a NumPy array or scipy.sparse CSR matrix, a copy may still be
 |      returned.
 |  
 |  with_mean : bool, default=True
 |      If True, center the data before scaling.
 |      This does not work (and will raise an exception) when attempted on
 |      sparse matrices, because centering them entails building a dense
 |      matrix which in common use cases is likely to be too large to fit in
 |      memory.
 |  
 |  with_std : bool, default=True
 |      If True, scale the data to unit variance (or equivalently,
 |      unit standard deviation).
 |  
 |  Attributes
 |  ----------
 |  scale_ : ndarray of shape (n_features,) or None
 |      Per feature relative scaling of the data to achieve zero mean and unit
 |      variance. Generally this is calculated using `np.sqrt(var_)`. If a
 |      variance is zero, we can't achieve unit variance, and the data is left
 |      as-is, giving a scaling factor of 1. `scale_` is equal to `None`
 |      when `with_std=False`.
 |  
 |      .. versionadded:: 0.17
 |         *scale_*
 |  
 |  mean_ : ndarray of shape (n_features,) or None
 |      The mean value for each feature in the training set.
 |      Equal to ``None`` when ``with_mean=False``.
 |  
 |  var_ : ndarray of shape (n_features,) or None
 |      The variance for each feature in the training set. Used to compute
 |      `scale_`. Equal to ``None`` when ``with_std=False``.
 |  
 |  n_features_in_ : int
 |      Number of features seen during :term:`fit`.
 |  
 |      .. versionadded:: 0.24
 |  
 |  feature_names_in_ : ndarray of shape (`n_features_in_`,)
 |      Names of features seen during :term:`fit`. Defined only when `X`
 |      has feature names that are all strings.
 |  
 |      .. versionadded:: 1.0
 |  
 |  n_samples_seen_ : int or ndarray of shape (n_features,)
 |      The number of samples processed by the estimator for each feature.
 |      If there are no missing samples, the ``n_samples_seen`` will be an
 |      integer, otherwise it will be an array of dtype int. If
 |      `sample_weights` are used it will be a float (if no missing data)
 |      or an array of dtype float that sums the weights seen so far.
 |      Will be reset on new calls to fit, but increments across
 |      ``partial_fit`` calls.
 |  
 |  See Also
 |  --------
 |  scale : Equivalent function without the estimator API.
 |  
 |  :class:`~sklearn.decomposition.PCA` : Further removes the linear
 |      correlation across features with 'whiten=True'.
 |  
 |  Notes
 |  -----
 |  NaNs are treated as missing values: disregarded in fit, and maintained in
 |  transform.
 |  
 |  We use a biased estimator for the standard deviation, equivalent to
 |  `numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to
 |  affect model performance.
 |  
 |  For a comparison of the different scalers, transformers, and normalizers,
 |  see :ref:`examples/preprocessing/plot_all_scaling.py
 |  <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
 |  
 |  Examples
 |  --------
 |  >>> from sklearn.preprocessing import StandardScaler
 |  >>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
 |  >>> scaler = StandardScaler()
 |  >>> print(scaler.fit(data))
 |  StandardScaler()
 |  >>> print(scaler.mean_)
 |  [0.5 0.5]
 |  >>> print(scaler.transform(data))
 |  [[-1. -1.]
 |   [-1. -1.]
 |   [ 1.  1.]
 |   [ 1.  1.]]
 |  >>> print(scaler.transform([[2, 2]]))
 |  [[3. 3.]]
 |  
 |  Method resolution order:
 |      StandardScaler
 |      sklearn.base._OneToOneFeatureMixin
 |      sklearn.base.TransformerMixin
 |      sklearn.base.BaseEstimator
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, *, copy=True, with_mean=True, with_std=True)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  fit(self, X, y=None, sample_weight=None)
 |      Compute the mean and std to be used for later scaling.
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to compute the mean and standard deviation
 |          used for later scaling along the features axis.
 |      
 |      y : None
 |          Ignored.
 |      
 |      sample_weight : array-like of shape (n_samples,), default=None
 |          Individual weights for each sample.
 |      
 |          .. versionadded:: 0.24
 |             parameter *sample_weight* support to StandardScaler.
 |      
 |      Returns
 |      -------
 |      self : object
 |          Fitted scaler.
 |  
 |  inverse_transform(self, X, copy=None)
 |      Scale back the data to the original representation.
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to scale along the features axis.
 |      copy : bool, default=None
 |          Copy the input X or not.
 |      
 |      Returns
 |      -------
 |      X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)
 |          Transformed array.
 |  
 |  partial_fit(self, X, y=None, sample_weight=None)
 |      Online computation of mean and std on X for later scaling.
 |      
 |      All of X is processed as a single batch. This is intended for cases
 |      when :meth:`fit` is not feasible due to very large number of
 |      `n_samples` or because X is read from a continuous stream.
 |      
 |      The algorithm for incremental mean and std is given in Equation 1.5a,b
 |      in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. "Algorithms
 |      for computing the sample variance: Analysis and recommendations."
 |      The American Statistician 37.3 (1983): 242-247:
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to compute the mean and standard deviation
 |          used for later scaling along the features axis.
 |      
 |      y : None
 |          Ignored.
 |      
 |      sample_weight : array-like of shape (n_samples,), default=None
 |          Individual weights for each sample.
 |      
 |          .. versionadded:: 0.24
 |             parameter *sample_weight* support to StandardScaler.
 |      
 |      Returns
 |      -------
 |      self : object
 |          Fitted scaler.
 |  
 |  transform(self, X, copy=None)
 |      Perform standardization by centering and scaling.
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to scale along the features axis.
 |      copy : bool, default=None
 |          Copy the input X or not.
 |      
 |      Returns
 |      -------
 |      X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)
 |          Transformed array.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base._OneToOneFeatureMixin:
 |  
 |  get_feature_names_out(self, input_features=None)
 |      Get output feature names for transformation.
 |      
 |      Parameters
 |      ----------
 |      input_features : array-like of str or None, default=None
 |          Input features.
 |      
 |          - If `input_features` is `None`, then `feature_names_in_` is
 |            used as feature names in. If `feature_names_in_` is not defined,
 |            then names are generated: `[x0, x1, ..., x(n_features_in_)]`.
 |          - If `input_features` is an array-like, then `input_features` must
 |            match `feature_names_in_` if `feature_names_in_` is defined.
 |      
 |      Returns
 |      -------
 |      feature_names_out : ndarray of str objects
 |          Same as input features.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from sklearn.base._OneToOneFeatureMixin:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base.TransformerMixin:
 |  
 |  fit_transform(self, X, y=None, **fit_params)
 |      Fit to data, then transform it.
 |      
 |      Fits transformer to `X` and `y` with optional parameters `fit_params`
 |      and returns a transformed version of `X`.
 |      
 |      Parameters
 |      ----------
 |      X : array-like of shape (n_samples, n_features)
 |          Input samples.
 |      
 |      y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
 |          Target values (None for unsupervised transformations).
 |      
 |      **fit_params : dict
 |          Additional fit parameters.
 |      
 |      Returns
 |      -------
 |      X_new : ndarray array of shape (n_samples, n_features_new)
 |          Transformed array.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base.BaseEstimator:
 |  
 |  __getstate__(self)
 |  
 |  __repr__(self, N_CHAR_MAX=700)
 |      Return repr(self).
 |  
 |  __setstate__(self, state)
 |  
 |  get_params(self, deep=True)
 |      Get parameters for this estimator.
 |      
 |      Parameters
 |      ----------
 |      deep : bool, default=True
 |          If True, will return the parameters for this estimator and
 |          contained subobjects that are estimators.
 |      
 |      Returns
 |      -------
 |      params : dict
 |          Parameter names mapped to their values.
 |  
 |  set_params(self, **params)
 |      Set the parameters of this estimator.
 |      
 |      The method works on simple estimators as well as on nested objects
 |      (such as :class:`~sklearn.pipeline.Pipeline`). The latter have
 |      parameters of the form ``<component>__<parameter>`` so that it's
 |      possible to update each component of a nested object.
 |      
 |      Parameters
 |      ----------
 |      **params : dict
 |          Estimator parameters.
 |      
 |      Returns
 |      -------
 |      self : estimator instance
 |          Estimator instance.


scaler = StandardScaler()

# Fit on the training set only, then reuse the estimated mu and sigma on the test set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


# Inspect the first standardized training sample
print(X_train[0])


[-0.7710306   1.41036889  1.08508956  1.25031642  1.39864376  1.39096463 -0.72288062  0.93078432 -0.70710678  1.36833491 -0.73479518  1.39096463  0.88551735  1.53202723 -0.72288062]
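
As a quick sanity check on the X_train produced above: after standardization every feature column of the training set should have mean ≈ 0 and standard deviation ≈ 1 (assuming no zero-variance feature; the test set will deviate slightly, since it was scaled with training-set parameters):

import numpy as np

# Each standardized training column is centered and unit-scaled
print(np.allclose(X_train.mean(axis=0), 0))  # True
print(np.allclose(X_train.std(axis=0), 1))   # True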


3. Training the Random Forest


from sklearn.ensemble import RandomForestClassifier

# Train a random forest with default hyperparameters
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_prdrf = rf.predict(X_test)
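
Once fitted, the forest also exposes impurity-based feature importances, which can hint at which survey features drive the prediction. A minimal sketch, assuming the fitted rf above and that X is a pandas DataFrame whose columns name the features:

import pandas as pd

# One impurity-based importance score per input feature
importances = pd.Series(rf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head())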


4. Model Evaluation


from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision/recall/F1 on the held-out test set
print(classification_report(y_test, y_prdrf))

# 10-fold cross-validated accuracy on the full dataset
cvs_rf = round(cross_val_score(rf, X, y, scoring="accuracy", cv=10).mean(), 2)
print("Cross validation score for Random Forest Classifier model is:", cvs_rf)


              precision    recall  f1-score   support

           0       0.95      0.99      0.97        79
           1       0.98      0.93      0.95        56

    accuracy                           0.96       135
   macro avg       0.97      0.96      0.96       135
weighted avg       0.96      0.96      0.96       135

Cross validation score for Random Forest Classifier model is: 0.96
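
One detail worth noting: cross_val_score above is run on the unscaled X. Random forests are insensitive to feature scaling, so this is harmless here, but for scale-sensitive models the scaler should be refitted inside each fold to avoid leaking test-fold statistics. A minimal sketch using a Pipeline (same X and y assumed):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# The scaler is refitted on the training folds inside every CV split
pipe = make_pipeline(StandardScaler(), RandomForestClassifier())
print(round(cross_val_score(pipe, X, y, scoring="accuracy", cv=10).mean(), 2))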


5. Plotting the Confusion Matrix


import matplotlib.pyplot as plt
import seaborn as sns

# Visualize test-set predictions against the ground truth
sns.heatmap(confusion_matrix(y_test, y_prdrf), annot=True, cmap='viridis')
plt.xlabel("Predicted")
plt.ylabel("Truth")
plt.title("Confusion matrix- Random Forest Classifier")


Text(0.5,1,'Confusion matrix- Random Forest Classifier')

[Figure: heatmap of the confusion matrix for the Random Forest classifier]

As the confusion matrix shows, the model is quite accurate: consistent with the classification report above, only one of the 79 negative samples and four of the 56 positive samples in the test set are misclassified.
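
For a medical screening task, the recall (sensitivity) on the positive class is usually the number to watch, since a false negative means a missed cancer case. A minimal sketch of extracting it, assuming the y_test and y_prdrf arrays from above and that label 1 marks the cancer class:

from sklearn.metrics import recall_score

# Sensitivity: fraction of true cancer cases the model catches
sensitivity = recall_score(y_test, y_prdrf, pos_label=1)
# Specificity: recall on the negative class
specificity = recall_score(y_test, y_prdrf, pos_label=0)
print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")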


