ML with LightGBM: Feature interpretability on the Titanic dataset using LightGBM and SHAP (quantifying each feature's contribution score to the model)

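Summary: This post uses LightGBM and the SHAP algorithm on the Titanic dataset to make the model's features interpretable, by quantifying each feature's contribution score to the model.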
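As a rough illustration of what a "contribution score" means here: a SHAP value is a Shapley value, that is, a feature's marginal contribution to the prediction averaged over all orders in which features can be revealed. The sketch below computes exact Shapley values for a tiny hand-written scoring function. The function `toy_score`, its numbers, and the feature names are hypothetical and are not taken from the Titanic model; real SHAP explainers use much faster algorithms than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average each feature's marginal
    contribution over every possible ordering of the features."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            totals[f] += after - before
    return {f: total / len(orderings) for f, total in totals.items()}


def toy_score(known):
    """Hypothetical survival score given the set of known features."""
    score = 0.25  # baseline score when nothing is known
    if "Sex" in known:
        score += 0.5
    if "Pclass" in known:
        score += 0.125
    if "Sex" in known and "Pclass" in known:
        score += 0.125  # interaction between the two features
    return score


features = ["Sex", "Pclass", "Age"]
phi = shapley_values(toy_score, features)
baseline = toy_score(frozenset())
full = toy_score(frozenset(features))

for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
# Additivity: the contributions sum to (full prediction - baseline).
print("sum of contributions:", sum(phi.values()), "expected:", full - baseline)
```

The interaction term is split evenly between `Sex` and `Pclass`, and `Age`, which never changes the score, receives a contribution of zero.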

Design approach

To be updated


Output

[Four output figures (image.png); images not available]

Core code

# flake8: noqa

import warnings
import sys

__version__ = '0.37.0'

# check python version
if (sys.version_info < (3, 0)):
    warnings.warn("As of version 0.29.0 shap only supports Python 3 (not 2)!")

from ._explanation import Explanation, Cohorts

# explainers
from .explainers._explainer import Explainer
from .explainers._kernel import Kernel as KernelExplainer
from .explainers._sampling import Sampling as SamplingExplainer
from .explainers._tree import Tree as TreeExplainer
from .explainers._deep import Deep as DeepExplainer
from .explainers._gradient import Gradient as GradientExplainer
from .explainers._linear import Linear as LinearExplainer
from .explainers._partition import Partition as PartitionExplainer
from .explainers._permutation import Permutation as PermutationExplainer
from .explainers._additive import Additive as AdditiveExplainer
from .explainers import other

# plotting (only loaded if matplotlib is present)
def unsupported(*args, **kwargs):
    warnings.warn("matplotlib is not installed so plotting is not available! Run `pip install matplotlib` to fix this.")

try:
    import matplotlib
    have_matplotlib = True
except ImportError:
    have_matplotlib = False

if have_matplotlib:
    from .plots._beeswarm import summary_legacy as summary_plot
    from .plots._decision import decision as decision_plot, multioutput_decision as multioutput_decision_plot
    from .plots._scatter import dependence_legacy as dependence_plot
    from .plots._force import force as force_plot, initjs, save_html, getjs
    from .plots._image import image as image_plot
    from .plots._monitoring import monitoring as monitoring_plot
    from .plots._embedding import embedding as embedding_plot
    from .plots._partial_dependence import partial_dependence as partial_dependence_plot
    from .plots._bar import bar_legacy as bar_plot
    from .plots._waterfall import waterfall as waterfall_plot
    from .plots._group_difference import group_difference as group_difference_plot
    from .plots._text import text as text_plot
else:
    summary_plot = unsupported
    decision_plot = unsupported
    multioutput_decision_plot = unsupported
    dependence_plot = unsupported
    force_plot = unsupported
    initjs = unsupported
    save_html = unsupported
    image_plot = unsupported
    monitoring_plot = unsupported
    embedding_plot = unsupported
    partial_dependence_plot = unsupported
    bar_plot = unsupported
    waterfall_plot = unsupported
    text_plot = unsupported

# other stuff :)
from . import datasets
from . import utils
from . import links
#from . import benchmark
from .utils._legacy import kmeans
from .utils import sample, approximate_interactions

# TODO: Add support for hclustering based explanations where we sort the leaf order by magnitude and then show the dendrogram to the left

# Note: this excerpt relies on names defined at module level in the shap
# source (`warnings`, `labels`, `colors`); they are not defined in the excerpt.
def summary_legacy(shap_values, features=None, feature_names=None, max_display=None, plot_type=None,
                   color=None, axis_color="#333333", title=None, alpha=1, show=True, sort=True,
                   color_bar=True, plot_size="auto", layered_violin_max_num_bins=20, class_names=None,
                   class_inds=None,
                   color_bar_label=labels["FEATURE_VALUE"],
                   cmap=colors.red_blue,
                   # deprecated
                   auto_size_plot=None,
                   use_log_scale=False):
    """Create a SHAP beeswarm plot, colored by feature values when they are provided.

    Parameters
    ----------
    shap_values : numpy.array
        For single output explanations this is a matrix of SHAP values (# samples x # features).
        For multi-output explanations this is a list of such matrices of SHAP values.

    features : numpy.array or pandas.DataFrame or list
        Matrix of feature values (# samples x # features) or a feature_names list as shorthand

    feature_names : list
        Names of the features (length # features)

    max_display : int
        How many top features to include in the plot (default is 20, or 7 for interaction plots)

    plot_type : "dot" (default for single output), "bar" (default for multi-output), "violin",
        or "compact_dot".
        What type of summary plot to produce. Note that "compact_dot" is only used for
        SHAP interaction values.

    plot_size : "auto" (default), float, (float, float), or None
        What size to make the plot. By default the size is auto-scaled based on the number of
        features that are being displayed. Passing a single float will cause each row to be that
        many inches high. Passing a pair of floats will scale the plot by that
        number of inches. If None is passed then the size of the current figure will be left
        unchanged.
    """

    # support passing an explanation object
    if str(type(shap_values)).endswith("Explanation'>"):
        shap_exp = shap_values
        base_value = shap_exp.base_value
        shap_values = shap_exp.values
        if features is None:
            features = shap_exp.data
        if feature_names is None:
            feature_names = shap_exp.feature_names
        # if out_names is None: # TODO: waiting for slicer support of this
        #     out_names = shap_exp.output_names

    # deprecation warnings
    if auto_size_plot is not None:
        warnings.warn("auto_size_plot=False is deprecated and is now ignored! Use plot_size=None instead.")

    multi_class = False
    if isinstance(shap_values, list):
        multi_class = True
        if plot_type is None:
            plot_type = "bar" # default for multi-output explanations
        assert plot_type == "bar", "Only plot_type = 'bar' is supported for multi-output explanations!"
    else:
        if plot_type is None:
            plot_type = "dot" # default for single output explanations
        assert len(shap_values.shape) != 1, "Summary plots need a matrix of shap_values, not a vector."

    # default color:
    if color is None:
        if plot_type == 'layered_violin':
            color = "coolwarm"
        elif multi_class:
            color = lambda i: colors.red_blue_circle(i/len(shap_values))
        else:
            color = colors.blue_rgb
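The `shap_values` argument documented above is a matrix of shape (# samples x # features). A common way to reduce it to one global contribution score per feature, and the quantity a bar-type summary is generally understood to display, is the mean absolute SHAP value per column. The sketch below shows that reduction in plain Python; the helper name `mean_abs_shap` and the small matrix of values are hypothetical and serve only as illustration.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (largest first).

    shap_matrix: list of rows, one row per sample, one entry per feature.
    """
    n_samples = len(shap_matrix)
    scores = {}
    for j, name in enumerate(feature_names):
        scores[name] = sum(abs(row[j]) for row in shap_matrix) / n_samples
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for 4 samples and 3 features.
shap_matrix = [
    [0.50, -0.25, 0.00],
    [-0.50, 0.25, 0.25],
    [1.00, -0.25, -0.25],
    [-1.00, 0.25, 0.00],
]
feature_names = ["Sex", "Pclass", "Age"]

ranking = mean_abs_shap(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging matters: the signed `Sex` column above averages to zero, even though it moves individual predictions more than any other feature.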

