Comparing Machine Learning Models for Predicting Drug Candidates from DNA-Encoded Libraries (DEL)

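Dataset description

Each molecule in the dataset is assembled from three building blocks. The dataset records whether the assembled molecule binds to a given protein target: binds is 1 if it does and 0 otherwise.

The columns are:

  • id - a unique example_id that identifies the molecule-target pair.
  • buildingblock1_smiles - the structure of the first building block, in SMILES.
  • buildingblock2_smiles - the structure of the second building block, in SMILES.
  • buildingblock3_smiles - the structure of the third building block, in SMILES.
  • molecule_smiles - the structure of the fully assembled molecule, in SMILES. This includes the three building blocks and the triazine core. Note that [Dy] is used as the stand-in for the DNA linker.
  • protein_name - the name of the protein target.
  • binds - the target column: a binary class label indicating whether the molecule binds to the protein. Not available for the test set.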
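To make the column layout concrete, here is a minimal sketch that models one row and checks it against the format described above. The `Row` class, the `validate_row` helper, the protein name `TARGET_A`, and the SMILES strings are all illustrative assumptions; the strings are toy placeholders, not real members of the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One molecule-target pair, mirroring the columns listed above."""
    id: int
    buildingblock1_smiles: str
    buildingblock2_smiles: str
    buildingblock3_smiles: str
    molecule_smiles: str
    protein_name: str
    binds: Optional[int] = None  # missing in the test set


def validate_row(row: Row) -> List[str]:
    """Return a list of problems found in the row (empty if it looks fine)."""
    problems = []
    if "[Dy]" not in row.molecule_smiles:
        problems.append("molecule_smiles has no [Dy] linker placeholder")
    if row.binds not in (None, 0, 1):
        problems.append("binds must be 0, 1, or missing")
    if not row.protein_name:
        problems.append("protein_name is empty")
    return problems


# Toy placeholder values, chosen only to exercise the checks
example = Row(
    id=0,
    buildingblock1_smiles="CCO",
    buildingblock2_smiles="CCN",
    buildingblock3_smiles="CCC",
    molecule_smiles="CCOC(N)C[Dy]",
    protein_name="TARGET_A",
    binds=1,
)
print(validate_row(example))
```

A check like this is cheap to run on a sample of rows before spending time on fingerprinting.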

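Libraries

  • rdkit: an open-source cheminformatics toolkit with rich support for drug design, bioactivity prediction, chemical reaction prediction, and chemical data processing. Here it is used mainly to compute molecular fingerprints.
  • duckdb: an open-source embedded analytical database management system designed for data analysis and online analytical processing (OLAP). Here it is used mainly to query columnar data stored as Parquet.
  • PySMILES: a library for working with molecules represented in SMILES format.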

Implementation walkthrough

!pip install duckdb
!pip install pysmiles
!pip install rdkit

Data loading

# Standard library
import re
import os
import unicodedata
import itertools

# Data handling
import pandas as pd
import numpy as np

# Database access
import duckdb

# SMILES handling and visualization
import pysmiles
import plotly
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

# RDKit: molecule parsing, fingerprints, and drawing
from rdkit import Chem
from rdkit import RDLogger
from rdkit.Chem import Draw, AllChem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem.Draw import rdMolDraw2D

# Render molecules as SVG inside the notebook
from IPython.display import SVG
IPythonConsole.ipython_useSVG = True

# Configure the plot style
sns.set_theme(style='whitegrid')
palette = 'viridis'

# Paths to the Parquet files
data_train = '/input/train.parquet'
test_path = '/input/test.parquet'

# Open a DuckDB connection
con = duckdb.connect()

# Draw a balanced sample: 30,000 randomly chosen non-binders (binds = 0)
# and 30,000 randomly chosen binders (binds = 1), returned as a DataFrame
data = con.query(f"""(SELECT * FROM parquet_scan('{data_train}')
WHERE binds = 0
ORDER BY random()
LIMIT 30000)
UNION ALL
(SELECT * FROM parquet_scan('{data_train}')
WHERE binds = 1
ORDER BY random()
LIMIT 30000)""").df()

# Close the connection
con.close()

# Save the sample to a CSV file
data.to_csv('/working/dataset.csv')
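The query above draws the same number of rows from each class, so binders and non-binders are equally represented in the sample. The same idea can be sketched without a database. The function name `balanced_sample`, the toy rows, and the fixed seed below are assumptions made for illustration only.

```python
import random
from collections import Counter


def balanced_sample(rows, label_key, per_class, seed=0):
    """Randomly pick up to `per_class` rows for every distinct label value."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    sample = []
    for label in sorted(by_label):
        group = by_label[label]
        k = min(per_class, len(group))  # never ask for more rows than exist
        sample.extend(rng.sample(group, k))
    return sample


# Toy data: 1 binder for every 49 non-binders
rows = [{"id": i, "binds": 1 if i % 50 == 0 else 0} for i in range(1000)]
subset = balanced_sample(rows, "binds", per_class=10)
counts = Counter(r["binds"] for r in subset)
print(counts)
```

Undersampling the majority class this way keeps accuracy meaningful as a metric, at the cost of discarding most of the non-binder rows.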

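Data preprocessing

In the preprocessing stage, we carry out a few basic steps to prepare the data for analysis. First, data cleaning removes duplicates and handles missing values. Next, depending on the nature of the data, categorical variables are converted to numeric ones with a suitable encoding such as one-hot or label encoding. Numeric variables are then standardized or normalized so that they share a common scale, which many machine learning algorithms depend on. These steps put the data in a form the models can consume and improve the accuracy and efficiency of the later analysis.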
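As a rough illustration of the cleaning and scaling steps just described, the sketch below removes duplicate rows, fills a missing value with the column mean, and standardizes the result to zero mean and unit variance. The helper names and the toy `weight` column are hypothetical and do not come from the dataset.

```python
from statistics import mean, pstdev


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(rows, key):
    """Replace None in column `key` with the mean of the known values."""
    known = [r[key] for r in rows if r[key] is not None]
    fill = mean(known)
    return [dict(r, **{key: fill if r[key] is None else r[key]}) for r in rows]


def standardize(values):
    """Z-score: subtract the mean, divide by the population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


rows = [
    {"name": "a", "weight": 10.0},
    {"name": "a", "weight": 10.0},   # exact duplicate
    {"name": "b", "weight": None},   # missing value
    {"name": "c", "weight": 30.0},
]
clean = fill_missing(drop_duplicates(rows), "weight")
z = standardize([r["weight"] for r in clean])
print(clean, z)
```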

from sklearn.preprocessing import OneHotEncoder

# Parse each SMILES string into an RDKit molecule object
data['molecule'] = data['molecule_smiles'].apply(Chem.MolFromSmiles)

# Compute a Morgan fingerprint and return it as a list of 0/1 bits
def modl(molecule_data, radius=2, bits=1024):
    if molecule_data is None:
        return None
    return list(AllChem.GetMorganFingerprintAsBitVect(molecule_data, radius, nBits=bits))

# Generate the fingerprint for every molecule
data['H1_ecfp'] = data['molecule'].apply(modl)

# One-hot encode the protein name
encoder_onehot = OneHotEncoder(sparse_output=False)
encoder_onehot_fit = encoder_onehot.fit_transform(data['protein_name'].values.reshape(-1,1))

# Concatenate each fingerprint with its protein one-hot vector
# so that every row becomes a single feature vector
X = [ecfp + protein for ecfp, protein in zip(data['H1_ecfp'].tolist(), encoder_onehot_fit.tolist())]

# Target labels as a plain list
y = data['binds'].tolist()
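The feature construction above can be traced by hand on a tiny example. This is a minimal sketch with made-up 4-bit fingerprints and placeholder protein names (`P1`, `P2`); the `one_hot` helper is a hypothetical stand-in for `OneHotEncoder` and, like it, orders the categories alphabetically.

```python
def one_hot(values):
    """Encode each value as a 0/1 vector over the sorted distinct values."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        vec = [0.0] * len(categories)
        vec[index[v]] = 1.0
        encoded.append(vec)
    return categories, encoded


# Toy inputs: three molecules with 4-bit fingerprints and their targets
fingerprints = [[1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
proteins = ["P2", "P1", "P2"]

categories, protein_vecs = one_hot(proteins)

# List concatenation: fingerprint bits first, protein indicator columns last
X_toy = [fp + pv for fp, pv in zip(fingerprints, protein_vecs)]
for row in X_toy:
    print(row)
```

With the real data each row would have 1024 fingerprint bits followed by one indicator column per distinct protein.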

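Here the data is split into two parts: the feature vectors built from "H1_ecfp" (together with the one-hot protein encoding) and the target variable "binds". Keeping every feature on a comparable scale matters for many machine learning algorithms, especially distance-based ones such as k-nearest neighbors (KNN) and clustering methods, where a feature with a larger range would otherwise dominate the result. In this case each fingerprint bit and each one-hot column is either 0 or 1, so all features already share the same scale and contribute on an equal footing to learning. Relating the fingerprint bits to "binds" is what lets the models surface hidden trends or patterns in the data that are useful for predictive modeling.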
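A small numeric sketch shows why scale matters for distance-based methods such as KNN. On 0/1 vectors, Euclidean distance depends only on how many positions differ, so every bit counts equally. Appending one unscaled numeric column (a hypothetical extra feature, not present in this dataset) makes that column dominate the distance.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]

# On binary vectors the distance is the square root of the Hamming distance
hamming = sum(x != y for x, y in zip(a, b))
d_binary = euclidean(a, b)

# Hypothetical unscaled feature appended to each vector
a_raw = a + [300.0]
b_raw = b + [450.0]
d_raw = euclidean(a_raw, b_raw)

print(hamming, d_binary, d_raw)
```

In the second case the two differing bits barely change the result, which is the situation that standardization or normalization is meant to prevent.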

Model training

from sklearn.model_selection import train_test_split

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Progress bar
from tqdm import tqdm

# Classifiers
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier, plot_importance as plot_importance_xgb
from lightgbm import LGBMClassifier, plot_importance as plot_importance_lgbm


# Metrics and model evaluation
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_curve, auc, confusion_matrix, accuracy_score, classification_report
# Machine learning models
models = {
  # Logistic regression
  "Logistic Regression": LogisticRegression(),

  # Naive Bayes
  "Naive bayes": GaussianNB(),

  # K-nearest neighbors
  "KNN": KNeighborsClassifier(),

  # AdaBoost (iteratively combines weak classifiers into a strong one)
  "Ada Boost": AdaBoostClassifier(),

  # Gradient boosting (iteratively trains decision trees to improve accuracy)
  "Gradient Boosting Classifier": GradientBoostingClassifier(),

  # Decision tree
  "Decision Tree Classifier": DecisionTreeClassifier(max_depth=5,
  min_samples_split=2,
  random_state=105),

  # XGBoost (an optimized distributed gradient boosting library).
  # binds is a binary label, so num_class is not set.
  "XGBoost": XGBClassifier(n_estimators=100,
  max_depth=250,
  learning_rate=0.1,
  subsample=0.8,
  colsample_bytree=0.8,
  random_state=42,
  tree_method='gpu_hist'),

  # LightGBM (a distributed gradient boosting framework based on decision trees)
  "LGBM": LGBMClassifier(boosting_type='gbdt',
  bagging_freq=5,
  verbose=0,
  device='gpu',
  num_leaves=31,
  max_depth=250,
  learning_rate=0.1,
  n_estimators=100)
}

# Train and evaluate each model
for name, model in tqdm(models.items(), desc="Training models", total=len(models)):
  # Fit on the training set
  model.fit(X_train, y_train)

  # Estimate accuracy with 10-fold cross-validation on the training set
  score_training = cross_val_score(model, X_train, y_train, cv=10)

  # Predict on the held-out test set
  pred_mode = model.predict(X_test)

  # Report the mean cross-validated accuracy
  tqdm.write("Model: {} has Accuracy {:.2f}%".format(model.__class__.__name__,round(score_training.mean(), 2) * 100))

  print()
Training models:  12%|█▎        | 1/8 [01:40<11:44, 100.63s/it]
Model: LogisticRegression has Accuracy 87.00%

Training models:  25%|██▌       | 2/8 [03:00<08:50, 88.36s/it] 
Model: GaussianNB has Accuracy 74.00%

Training models:  38%|███▊      | 3/8 [05:12<09:02, 108.45s/it]
Model: KNeighborsClassifier has Accuracy 80.00%

Training models:  50%|█████     | 4/8 [13:33<17:33, 263.47s/it]
Model: AdaBoostClassifier has Accuracy 79.00%

Training models:  62%|██████▎   | 5/8 [43:18<40:36, 812.11s/it]
Model: GradientBoostingClassifier has Accuracy 84.00%

Training models:  75%|███████▌  | 6/8 [44:35<18:44, 562.10s/it]
Model: DecisionTreeClassifier has Accuracy 75.00%

Training models:  88%|████████▊ | 7/8 [52:01<08:44, 524.23s/it]
Model: XGBClassifier has Accuracy 91.00%

[LightGBM] [Warning] bagging_freq is set=5, subsample_freq=0 will be ignored. Current value: bagging_freq=5
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
1 warning generated.
Training models: 100%|██████████| 8/8 [53:43<00:00, 402.98s/it]
Model: LGBMClassifier has Accuracy 89.00%

CPU times: user 58min 38s, sys: 1min 6s, total: 59min 44s
Wall time: 53min 47s
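Each accuracy figure in the log is the mean over 10 cross-validation folds. As a minimal sketch of that procedure, the code below splits the samples into k contiguous folds, trains on k-1 of them, and scores on the remaining one. The names `k_fold_indices`, `MajorityClassifier`, and `cross_val_accuracy` are hypothetical, and the contiguous split is a simplification: for classifiers, scikit-learn's `cross_val_score` uses stratified folds by default.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


class MajorityClassifier:
    """Toy model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_] * len(X)


def cross_val_accuracy(make_model, X, y, k):
    """Return the accuracy obtained on each of the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = make_model().fit([X[i] for i in train_idx],
                                 [y[i] for i in train_idx])
        preds = model.predict([X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores


# Toy data: 20 samples, three non-binders for every binder
X_toy = [[i] for i in range(20)]
y_toy = [0, 0, 0, 1] * 5
scores = cross_val_accuracy(MajorityClassifier, X_toy, y_toy, k=5)
print(scores, sum(scores) / len(scores))
```

Averaging over folds gives a steadier estimate than a single split, because every sample is used for validation exactly once.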

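Eight machine learning models were built for this project: logistic regression, naive Bayes, k-nearest neighbors (KNN), decision tree, AdaBoost, gradient boosting, XGBoost, and LightGBM. Each was trained on the same sampled dataset and scored with 10-fold cross-validation to find the best performer. In the run logged above, XGBoost achieved the highest mean accuracy at 91%, with LightGBM close behind at 89%; logistic regression reached 87%, gradient boosting 84%, KNN 80%, AdaBoost 79%, the decision tree 75%, and naive Bayes 74%. LightGBM's loop iteration also finished in under 2 minutes, against roughly 7.5 minutes for XGBoost, so it delivers nearly the same accuracy at a fraction of the cost. Accuracy is the only metric this code reports; precision, recall, and F1-score, together with performance on different data subsets, should be checked as well to confirm that a model generalizes and is not overfitting. Taking accuracy and training cost together, the two gradient-boosted tree libraries are the strongest options here, and LightGBM remains a well-balanced choice for future work in this setting.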

ROC curves and AUC

for name, model in models.items():
  # Fit the model
  model.fit(X_train, y_train)

  # Predict on the test set
  y_pred = model.predict(X_test)
  print("Machine Learning Model:", name)

  # ROC curve from the predicted probability of the positive class (binds = 1)
  fpr, tpr, _ = roc_curve(y_test, model.predict_proba(X_test)[:,1])
  roc_auc = auc(fpr, tpr)

  plt.figure()
  plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
  plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
  plt.xlim([0.0, 1.0])
  plt.ylim([0.0, 1.05])
  plt.xlabel('False Positive Rate')
  plt.ylabel('True Positive Rate')
  plt.title('Receiver Operating Characteristic - {}'.format(name))
  plt.legend(loc="lower right")
  plt.grid()
  plt.show()
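To show what the ROC curve and its area measure, the following sketch computes both by hand: it sweeps a decision threshold over the predicted scores, records the false and true positive rates at each step, and integrates with the trapezoid rule. The helper names `roc_points` and `trapezoid_auc` and the four-sample toy data are assumptions for illustration; the plotting code above relies on scikit-learn's `roc_curve` and `auc` instead.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scored at or above the threshold is predicted positive
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
area = trapezoid_auc(points)
print(points, area)
```

An area of 1.0 means every binder is scored above every non-binder, while 0.5 corresponds to the dashed diagonal drawn in the plots.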