ML / LightGBM: Making data features interpretable on the Titanic dataset with LightGBM and SHAP (quantifying each feature's contribution score to the model)


Design approach

(In progress.)
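The design section has not been written yet. As a hedged illustration of what "quantifying each feature's contribution score" means, here is a minimal sketch that computes exact Shapley values (the quantity SHAP estimates) by brute force for a hand-written toy model. The feature names, coefficients and the single all-zero reference row are assumptions made for this example, not the author's LightGBM model or the Titanic data. shap's own explainers (for a LightGBM model, `TreeExplainer`) use much faster algorithms and a background dataset rather than one reference row.

```python
from itertools import combinations
from math import factorial

# Hypothetical binary features, loosely Titanic-flavoured (not from the article).
FEATURES = ["is_female", "is_first_class", "is_child"]

def toy_model(x):
    """Hand-written stand-in for a trained model: a made-up 'survival score'."""
    female, first, child = x
    return 0.2 + 0.5 * female + 0.1 * first + 0.2 * female * first + 0.1 * child

def shapley_values(model, x, reference):
    """Exact Shapley values of model(x); absent features take `reference` values.

    Enumerates every coalition of the other features, so it is only usable for tiny n.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep x's values; the rest fall back to the reference.
        z = [x[j] if j in coalition else reference[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

passenger = [1, 1, 0]
reference = [0, 0, 0]
phi = shapley_values(toy_model, passenger, reference)
base_value = toy_model(reference)
for name, contribution in zip(FEATURES, phi):
    print(f"{name}: {contribution:+.2f}")
print(f"base {base_value:.2f} + sum {sum(phi):.2f} = prediction {toy_model(passenger):.2f}")
```

For this passenger the interaction term (0.2) is split evenly between `is_female` and `is_first_class`, giving contributions of 0.6 and 0.2, while `is_child` equals its reference value and gets 0. The contributions add up to the prediction minus the base value (1.0 - 0.2), which is the additivity property that SHAP plots rely on.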


Output

[Four output figures (image.png)]

Core code

# flake8: noqa
# shap/__init__.py (shap 0.37.0): builds the public `shap` namespace by
# re-exporting explainers, plotting functions and utilities from submodules.
import warnings
import sys

__version__ = '0.37.0'

# check python version
if sys.version_info < (3, 0):
    warnings.warn("As of version 0.29.0 shap only supports Python 3 (not 2)!")

from ._explanation import Explanation, Cohorts

# explainers
from .explainers._explainer import Explainer
from .explainers._kernel import Kernel as KernelExplainer
from .explainers._sampling import Sampling as SamplingExplainer
from .explainers._tree import Tree as TreeExplainer
from .explainers._deep import Deep as DeepExplainer
from .explainers._gradient import Gradient as GradientExplainer
from .explainers._linear import Linear as LinearExplainer
from .explainers._partition import Partition as PartitionExplainer
from .explainers._permutation import Permutation as PermutationExplainer
from .explainers._additive import Additive as AdditiveExplainer
from .explainers import other

# plotting (only loaded if matplotlib is present)
def unsupported(*args, **kwargs):
    warnings.warn("matplotlib is not installed so plotting is not available! Run `pip install matplotlib` to fix this.")

try:
    import matplotlib
    have_matplotlib = True
except ImportError:
    have_matplotlib = False

if have_matplotlib:
    from .plots._beeswarm import summary_legacy as summary_plot
    from .plots._decision import decision as decision_plot, multioutput_decision as multioutput_decision_plot
    from .plots._scatter import dependence_legacy as dependence_plot
    from .plots._force import force as force_plot, initjs, save_html, getjs
    from .plots._image import image as image_plot
    from .plots._monitoring import monitoring as monitoring_plot
    from .plots._embedding import embedding as embedding_plot
    from .plots._partial_dependence import partial_dependence as partial_dependence_plot
    from .plots._bar import bar_legacy as bar_plot
    from .plots._waterfall import waterfall as waterfall_plot
    from .plots._group_difference import group_difference as group_difference_plot
    from .plots._text import text as text_plot
else:
    # Without matplotlib, each plotting name is bound to the warning stub above.
    summary_plot = unsupported
    decision_plot = unsupported
    multioutput_decision_plot = unsupported
    dependence_plot = unsupported
    force_plot = unsupported
    initjs = unsupported
    save_html = unsupported
    image_plot = unsupported
    monitoring_plot = unsupported
    embedding_plot = unsupported
    partial_dependence_plot = unsupported
    bar_plot = unsupported
    waterfall_plot = unsupported
    text_plot = unsupported

# other stuff :)
from . import datasets
from . import utils
from . import links
#from . import benchmark
from .utils._legacy import kmeans
from .utils import sample, approximate_interactions
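The `have_matplotlib` / `unsupported` logic in `shap/__init__.py` makes plotting an optional dependency: `import shap` always succeeds, and each plot name is bound either to the real function or to a stub that only warns. Below is a minimal, generic sketch of the same pattern. The helpers `optional_import` and `unsupported_stub` and the deliberately missing package name are hypothetical; unlike shap, which shares a single `unsupported` function, this version builds one stub per feature so the warning can name the missing package.

```python
import importlib
import warnings

def unsupported_stub(feature, package):
    """Return a placeholder that warns (instead of raising) when `package` is missing."""
    def stub(*args, **kwargs):
        warnings.warn(f"{package} is not installed so {feature} is not available! "
                      f"Run `pip install {package}` to fix this.")
    return stub

def optional_import(package):
    """Import `package` if it is installed; otherwise return None."""
    try:
        return importlib.import_module(package)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return None

# Hypothetical package name chosen so the fallback branch is taken.
MISSING = "surely_not_installed_pkg_123"
plotting_backend = optional_import(MISSING)
if plotting_backend is not None:
    plot = plotting_backend.plot  # hypothetical: would be the real plotting function
else:
    plot = unsupported_stub("plotting", MISSING)

plot([1, 2, 3])  # warns instead of crashing
```

Because the stub warns rather than raising, a script that calls a plot function on a machine without matplotlib keeps running and simply produces no figure; the trade-off is that the missing plot is easy to overlook.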

# Excerpt from shap/plots/_beeswarm.py (shap 0.37.0); the original post truncates it.
# The module imports these names at the top, and the excerpt depends on them:
import warnings
from ._labels import labels
from . import colors

# TODO: Add support for hclustering based explanations where we sort the leaf order by magnitude and then show the dendrogram to the left
def summary_legacy(shap_values, features=None, feature_names=None, max_display=None, plot_type=None,
                   color=None, axis_color="#333333", title=None, alpha=1, show=True, sort=True,
                   color_bar=True, plot_size="auto", layered_violin_max_num_bins=20, class_names=None,
                   class_inds=None,
                   color_bar_label=labels["FEATURE_VALUE"],
                   cmap=colors.red_blue,
                   # deprecated
                   auto_size_plot=None,
                   use_log_scale=False):
    """Create a SHAP beeswarm plot, colored by feature values when they are provided.

    Parameters
    ----------
    shap_values : numpy.array
        For single output explanations this is a matrix of SHAP values (# samples x # features).
        For multi-output explanations this is a list of such matrices of SHAP values.

    features : numpy.array or pandas.DataFrame or list
        Matrix of feature values (# samples x # features) or a feature_names list as shorthand.

    feature_names : list
        Names of the features (length # features).

    max_display : int
        How many top features to include in the plot (default is 20, or 7 for interaction plots).

    plot_type : "dot" (default for single output), "bar" (default for multi-output), "violin",
        or "compact_dot".
        What type of summary plot to produce. Note that "compact_dot" is only used for
        SHAP interaction values.

    plot_size : "auto" (default), float, (float, float), or None
        What size to make the plot. By default the size is auto-scaled based on the number of
        features that are being displayed. Passing a single float makes each row that many
        inches high. Passing a pair of floats sets the plot's width and height in inches.
        If None is passed, the size of the current figure is left unchanged.
    """
    # support passing an explanation object
    if str(type(shap_values)).endswith("Explanation'>"):
        shap_exp = shap_values
        base_value = shap_exp.base_value
        shap_values = shap_exp.values
        if features is None:
            features = shap_exp.data
        if feature_names is None:
            feature_names = shap_exp.feature_names
        # if out_names is None: # TODO: waiting for slicer support of this
        #     out_names = shap_exp.output_names

    # deprecation warnings
    if auto_size_plot is not None:
        warnings.warn("auto_size_plot=False is deprecated and is now ignored! Use plot_size=None instead.")

    # a list of SHAP matrices means a multi-output (e.g. multi-class) explanation
    multi_class = False
    if isinstance(shap_values, list):
        multi_class = True
        if plot_type is None:
            plot_type = "bar"  # default for multi-output explanations
        assert plot_type == "bar", "Only plot_type = 'bar' is supported for multi-output explanations!"
    else:
        if plot_type is None:
            plot_type = "dot"  # default for single output explanations
        assert len(shap_values.shape) != 1, "Summary plots need a matrix of shap_values, not a vector."

    # default color:
    if color is None:
        if plot_type == 'layered_violin':
            color = "coolwarm"
        elif multi_class:
            color = lambda i: colors.red_blue_circle(i/len(shap_values))
        else:
            color = colors.blue_rgb

    # ... (the rest of summary_legacy, which orders the features and draws the plot, is not included in this excerpt)
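For multi-output input `summary_legacy` defaults to a bar plot, and a SHAP bar summary ranks features by their mean absolute SHAP value; that number is the usual "contribution score" the title refers to. The NumPy sketch below computes that ranking and mirrors the input checks above. `global_importance` is a hypothetical helper, not a shap API, and the SHAP values are made up. For a list of matrices it adds up each output's mean |SHAP| per feature, which, to my understanding, matches how shap orders its stacked multi-output bars.

```python
import numpy as np

def global_importance(shap_values, feature_names, max_display=20):
    """Rank features by mean |SHAP value| (the score shown by a SHAP bar summary plot).

    shap_values: (n_samples, n_features) array, or a list of such arrays
    (one per output/class), following summary_legacy's input conventions.
    """
    if isinstance(shap_values, list):
        # Multi-output: sum the per-output mean |SHAP| for each feature.
        scores = np.sum([np.abs(np.asarray(sv)).mean(axis=0) for sv in shap_values], axis=0)
    else:
        sv = np.asarray(shap_values)
        if sv.ndim == 1:
            raise ValueError("Summary plots need a matrix of shap_values, not a vector.")
        scores = np.abs(sv).mean(axis=0)
    # Largest score first; keep at most max_display features.
    order = np.argsort(-scores, kind="stable")[:max_display]
    return [(feature_names[i], float(scores[i])) for i in order]

# Made-up SHAP values for two passengers and three hypothetical Titanic features.
names = ["Sex", "Pclass", "Age"]
values = np.array([[0.30, -0.10, 0.05],
                   [-0.50, 0.20, -0.01]])
for name, score in global_importance(values, names):
    print(f"{name:>6}: {score:.3f}")
```

Taking absolute values before averaging matters: `Sex` pushes the two passengers in opposite directions, so its plain mean would be small, yet it is clearly the feature the toy values depend on most.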

