AI-Based Lung Cancer Risk Prediction and Analysis (Part 2)

Overview: AI-based lung cancer risk prediction and analysis

III. Model Training and Evaluation


1. Dataset Splitting


from sklearn.model_selection import train_test_split, cross_val_score

# Hold out 25% of the samples as a test set; fix random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=2023)
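If the class labels are imbalanced, a stratified split keeps the positive/negative ratio the same in both subsets. A minimal variant of the split above (a sketch, assuming X and y are the feature matrix and label vector prepared in the previous installment):

# Same split, but stratified on the labels so that both subsets
# preserve the original class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=2023, stratify=y
)
print(X_train.shape, X_test.shape)  # sanity check on the resulting shapes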


2. Data Standardization


  • The return value is the standardized data.
  • We load the StandardScaler class and initialize a StandardScaler object, scaler. The fit method estimates each feature's parameters μ (sample mean) and σ (standard deviation) from the training data. Calling transform then standardizes both the training and test data using the estimated μ and σ.


from sklearn.preprocessing import StandardScaler

# Print the built-in documentation for StandardScaler
help(StandardScaler)


Help on class StandardScaler in module sklearn.preprocessing._data:
class StandardScaler(sklearn.base._OneToOneFeatureMixin, sklearn.base.TransformerMixin, sklearn.base.BaseEstimator)
 |  StandardScaler(*, copy=True, with_mean=True, with_std=True)
 |  
 |  Standardize features by removing the mean and scaling to unit variance.
 |  
 |  The standard score of a sample `x` is calculated as:
 |  
 |      z = (x - u) / s
 |  
 |  where `u` is the mean of the training samples or zero if `with_mean=False`,
 |  and `s` is the standard deviation of the training samples or one if
 |  `with_std=False`.
 |  
 |  Centering and scaling happen independently on each feature by computing
 |  the relevant statistics on the samples in the training set. Mean and
 |  standard deviation are then stored to be used on later data using
 |  :meth:`transform`.
 |  
 |  Standardization of a dataset is a common requirement for many
 |  machine learning estimators: they might behave badly if the
 |  individual features do not more or less look like standard normally
 |  distributed data (e.g. Gaussian with 0 mean and unit variance).
 |  
 |  For instance many elements used in the objective function of
 |  a learning algorithm (such as the RBF kernel of Support Vector
 |  Machines or the L1 and L2 regularizers of linear models) assume that
 |  all features are centered around 0 and have variance in the same
 |  order. If a feature has a variance that is orders of magnitude larger
 |  than others, it might dominate the objective function and make the
 |  estimator unable to learn from other features correctly as expected.
 |  
 |  This scaler can also be applied to sparse CSR or CSC matrices by passing
 |  `with_mean=False` to avoid breaking the sparsity structure of the data.
 |  
 |  Read more in the :ref:`User Guide <preprocessing_scaler>`.
 |  
 |  Parameters
 |  ----------
 |  copy : bool, default=True
 |      If False, try to avoid a copy and do inplace scaling instead.
 |      This is not guaranteed to always work inplace; e.g. if the data is
 |      not a NumPy array or scipy.sparse CSR matrix, a copy may still be
 |      returned.
 |  
 |  with_mean : bool, default=True
 |      If True, center the data before scaling.
 |      This does not work (and will raise an exception) when attempted on
 |      sparse matrices, because centering them entails building a dense
 |      matrix which in common use cases is likely to be too large to fit in
 |      memory.
 |  
 |  with_std : bool, default=True
 |      If True, scale the data to unit variance (or equivalently,
 |      unit standard deviation).
 |  
 |  Attributes
 |  ----------
 |  scale_ : ndarray of shape (n_features,) or None
 |      Per feature relative scaling of the data to achieve zero mean and unit
 |      variance. Generally this is calculated using `np.sqrt(var_)`. If a
 |      variance is zero, we can't achieve unit variance, and the data is left
 |      as-is, giving a scaling factor of 1. `scale_` is equal to `None`
 |      when `with_std=False`.
 |  
 |      .. versionadded:: 0.17
 |         *scale_*
 |  
 |  mean_ : ndarray of shape (n_features,) or None
 |      The mean value for each feature in the training set.
 |      Equal to ``None`` when ``with_mean=False``.
 |  
 |  var_ : ndarray of shape (n_features,) or None
 |      The variance for each feature in the training set. Used to compute
 |      `scale_`. Equal to ``None`` when ``with_std=False``.
 |  
 |  n_features_in_ : int
 |      Number of features seen during :term:`fit`.
 |  
 |      .. versionadded:: 0.24
 |  
 |  feature_names_in_ : ndarray of shape (`n_features_in_`,)
 |      Names of features seen during :term:`fit`. Defined only when `X`
 |      has feature names that are all strings.
 |  
 |      .. versionadded:: 1.0
 |  
 |  n_samples_seen_ : int or ndarray of shape (n_features,)
 |      The number of samples processed by the estimator for each feature.
 |      If there are no missing samples, the ``n_samples_seen`` will be an
 |      integer, otherwise it will be an array of dtype int. If
 |      `sample_weights` are used it will be a float (if no missing data)
 |      or an array of dtype float that sums the weights seen so far.
 |      Will be reset on new calls to fit, but increments across
 |      ``partial_fit`` calls.
 |  
 |  See Also
 |  --------
 |  scale : Equivalent function without the estimator API.
 |  
 |  :class:`~sklearn.decomposition.PCA` : Further removes the linear
 |      correlation across features with 'whiten=True'.
 |  
 |  Notes
 |  -----
 |  NaNs are treated as missing values: disregarded in fit, and maintained in
 |  transform.
 |  
 |  We use a biased estimator for the standard deviation, equivalent to
 |  `numpy.std(x, ddof=0)`. Note that the choice of `ddof` is unlikely to
 |  affect model performance.
 |  
 |  For a comparison of the different scalers, transformers, and normalizers,
 |  see :ref:`examples/preprocessing/plot_all_scaling.py
 |  <sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
 |  
 |  Examples
 |  --------
 |  >>> from sklearn.preprocessing import StandardScaler
 |  >>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
 |  >>> scaler = StandardScaler()
 |  >>> print(scaler.fit(data))
 |  StandardScaler()
 |  >>> print(scaler.mean_)
 |  [0.5 0.5]
 |  >>> print(scaler.transform(data))
 |  [[-1. -1.]
 |   [-1. -1.]
 |   [ 1.  1.]
 |   [ 1.  1.]]
 |  >>> print(scaler.transform([[2, 2]]))
 |  [[3. 3.]]
 |  
 |  Method resolution order:
 |      StandardScaler
 |      sklearn.base._OneToOneFeatureMixin
 |      sklearn.base.TransformerMixin
 |      sklearn.base.BaseEstimator
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, *, copy=True, with_mean=True, with_std=True)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  fit(self, X, y=None, sample_weight=None)
 |      Compute the mean and std to be used for later scaling.
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to compute the mean and standard deviation
 |          used for later scaling along the features axis.
 |      
 |      y : None
 |          Ignored.
 |      
 |      sample_weight : array-like of shape (n_samples,), default=None
 |          Individual weights for each sample.
 |      
 |          .. versionadded:: 0.24
 |             parameter *sample_weight* support to StandardScaler.
 |      
 |      Returns
 |      -------
 |      self : object
 |          Fitted scaler.
 |  
 |  inverse_transform(self, X, copy=None)
 |      Scale back the data to the original representation.
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to scale along the features axis.
 |      copy : bool, default=None
 |          Copy the input X or not.
 |      
 |      Returns
 |      -------
 |      X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)
 |          Transformed array.
 |  
 |  partial_fit(self, X, y=None, sample_weight=None)
 |      Online computation of mean and std on X for later scaling.
 |      
 |      All of X is processed as a single batch. This is intended for cases
 |      when :meth:`fit` is not feasible due to very large number of
 |      `n_samples` or because X is read from a continuous stream.
 |      
 |      The algorithm for incremental mean and std is given in Equation 1.5a,b
 |      in Chan, Tony F., Gene H. Golub, and Randall J. LeVeque. "Algorithms
 |      for computing the sample variance: Analysis and recommendations."
 |      The American Statistician 37.3 (1983): 242-247:
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to compute the mean and standard deviation
 |          used for later scaling along the features axis.
 |      
 |      y : None
 |          Ignored.
 |      
 |      sample_weight : array-like of shape (n_samples,), default=None
 |          Individual weights for each sample.
 |      
 |          .. versionadded:: 0.24
 |             parameter *sample_weight* support to StandardScaler.
 |      
 |      Returns
 |      -------
 |      self : object
 |          Fitted scaler.
 |  
 |  transform(self, X, copy=None)
 |      Perform standardization by centering and scaling.
 |      
 |      Parameters
 |      ----------
 |      X : {array-like, sparse matrix} of shape (n_samples, n_features)
 |          The data used to scale along the features axis.
 |      copy : bool, default=None
 |          Copy the input X or not.
 |      
 |      Returns
 |      -------
 |      X_tr : {ndarray, sparse matrix} of shape (n_samples, n_features)
 |          Transformed array.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base._OneToOneFeatureMixin:
 |  
 |  get_feature_names_out(self, input_features=None)
 |      Get output feature names for transformation.
 |      
 |      Parameters
 |      ----------
 |      input_features : array-like of str or None, default=None
 |          Input features.
 |      
 |          - If `input_features` is `None`, then `feature_names_in_` is
 |            used as feature names in. If `feature_names_in_` is not defined,
 |            then names are generated: `[x0, x1, ..., x(n_features_in_)]`.
 |          - If `input_features` is an array-like, then `input_features` must
 |            match `feature_names_in_` if `feature_names_in_` is defined.
 |      
 |      Returns
 |      -------
 |      feature_names_out : ndarray of str objects
 |          Same as input features.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from sklearn.base._OneToOneFeatureMixin:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base.TransformerMixin:
 |  
 |  fit_transform(self, X, y=None, **fit_params)
 |      Fit to data, then transform it.
 |      
 |      Fits transformer to `X` and `y` with optional parameters `fit_params`
 |      and returns a transformed version of `X`.
 |      
 |      Parameters
 |      ----------
 |      X : array-like of shape (n_samples, n_features)
 |          Input samples.
 |      
 |      y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
 |          Target values (None for unsupervised transformations).
 |      
 |      **fit_params : dict
 |          Additional fit parameters.
 |      
 |      Returns
 |      -------
 |      X_new : ndarray array of shape (n_samples, n_features_new)
 |          Transformed array.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from sklearn.base.BaseEstimator:
 |  
 |  __getstate__(self)
 |  
 |  __repr__(self, N_CHAR_MAX=700)
 |      Return repr(self).
 |  
 |  __setstate__(self, state)
 |  
 |  get_params(self, deep=True)
 |      Get parameters for this estimator.
 |      
 |      Parameters
 |      ----------
 |      deep : bool, default=True
 |          If True, will return the parameters for this estimator and
 |          contained subobjects that are estimators.
 |      
 |      Returns
 |      -------
 |      params : dict
 |          Parameter names mapped to their values.
 |  
 |  set_params(self, **params)
 |      Set the parameters of this estimator.
 |      
 |      The method works on simple estimators as well as on nested objects
 |      (such as :class:`~sklearn.pipeline.Pipeline`). The latter have
 |      parameters of the form ``<component>__<parameter>`` so that it's
 |      possible to update each component of a nested object.
 |      
 |      Parameters
 |      ----------
 |      **params : dict
 |          Estimator parameters.
 |      
 |      Returns
 |      -------
 |      self : estimator instance
 |          Estimator instance.


# Fit the scaler on the training data only, then apply the same parameters
# to the test data to avoid information leakage
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


print(X_train[0])  # first standardized training sample


[-0.7710306   1.41036889  1.08508956  1.25031642  1.39864376  1.39096463 -0.72288062  0.93078432 -0.70710678  1.36833491 -0.73479518  1.39096463  0.88551735  1.53202723 -0.72288062]
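As a quick check, this output matches the z-score formula z = (x - u) / s from the help text above. A minimal sketch (X_train_raw is a hypothetical name for the training matrix before scaling, which the code above overwrites):

import numpy as np

# Standardize the first raw training sample by hand and compare it
# with the scaler's output
z_manual = (np.asarray(X_train_raw)[0] - scaler.mean_) / scaler.scale_
print(np.allclose(z_manual, X_train[0]))  # expected: True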


3. Random Forest Training


from sklearn.ensemble import RandomForestClassifier

# Train a random forest classifier on the standardized training data
rf = RandomForestClassifier()
rf.fit(X_train, y_train)

# Predict labels for the held-out test set
y_prdrf = rf.predict(X_test)
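Since the task is risk prediction, the forest's class probabilities are often more informative than hard 0/1 labels. A short sketch (assuming class 1 is the positive "has lung cancer" class, consistent with the report below):

# Probability of the positive class (column 1) for each test sample;
# these serve as continuous risk scores instead of hard labels
risk_scores = rf.predict_proba(X_test)[:, 1]
print(risk_scores[:5])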


4. Model Evaluation


from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision/recall/F1 on the held-out test set
print(classification_report(y_test, y_prdrf))

# Mean accuracy over 10-fold cross-validation on the full dataset
cvs_rf = round(cross_val_score(rf, X, y, scoring="accuracy", cv=10).mean(), 2)
print("Cross validation score for Random Forest Classifier model is:", cvs_rf)


              precision    recall  f1-score   support
           0       0.95      0.99      0.97        79
           1       0.98      0.93      0.95        56
    accuracy                           0.96       135
   macro avg       0.97      0.96      0.96       135
weighted avg       0.96      0.96      0.96       135
Cross validation score for Random Forest Classifier model is: 0.96
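Note that cross_val_score above runs on the raw X and y, which is harmless for a random forest because tree splits are insensitive to feature scale. For scale-sensitive models, wrapping the scaler and the estimator in a Pipeline keeps the scaling inside each fold and avoids leaking test-fold statistics. A minimal sketch:

from sklearn.pipeline import make_pipeline

# The scaler is re-fitted on the training folds only within each CV split
pipe = make_pipeline(StandardScaler(), RandomForestClassifier())
print(round(cross_val_score(pipe, X, y, scoring="accuracy", cv=10).mean(), 2))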


5. Plotting the Confusion Matrix


import seaborn as sns              # likely imported in the previous installment;
import matplotlib.pyplot as plt    # repeated here so the snippet is self-contained

# Heatmap of the test-set confusion matrix (rows: true labels, columns: predictions)
sns.heatmap(confusion_matrix(y_test, y_prdrf), annot=True, cmap='viridis')
plt.xlabel("Predicted")
plt.ylabel("Truth")
plt.title("Confusion matrix- Random Forest Classifier")


Text(0.5,1,'Confusion matrix- Random Forest Classifier')

[Figure: confusion-matrix heatmap for the Random Forest classifier]
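To read the matrix numerically, the four cells of the 2x2 matrix can be unpacked directly (class 1 taken as the positive "lung cancer" class):

# Unpack: true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = confusion_matrix(y_test, y_prdrf).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")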

As the results show, the model is quite accurate.


