ML之LightGBM:基于titanic数据集利用LightGBM和shap算法实现数据特征的可解释性(量化特征对模型贡献度得分)

简介: ML之LightGBM:基于titanic数据集利用LightGBM和shap算法实现数据特征的可解释性(量化特征对模型贡献度得分)

设计思路

更新中


输出结果

image.png

image.png

image.png

image.png

核心代码

# flake8: noqa

import warnings

import sys

__version__ = '0.37.0'

# check python version

if (sys.version_info < (3, 0)):

   warnings.warn("As of version 0.29.0 shap only supports Python 3 (not 2)!")

from ._explanation import Explanation, Cohorts

# explainers

from .explainers._explainer import Explainer

from .explainers._kernel import Kernel as KernelExplainer

from .explainers._sampling import Sampling as SamplingExplainer

from .explainers._tree import Tree as TreeExplainer

from .explainers._deep import Deep as DeepExplainer

from .explainers._gradient import Gradient as GradientExplainer

from .explainers._linear import Linear as LinearExplainer

from .explainers._partition import Partition as PartitionExplainer

from .explainers._permutation import Permutation as PermutationExplainer

from .explainers._additive import Additive as AdditiveExplainer

from .explainers import other

# plotting (only loaded if matplotlib is present)

def unsupported(*args, **kwargs):

   warnings.warn("matplotlib is not installed so plotting is not available! Run `pip install matplotlib` to fix this.")

try:

   import matplotlib

   have_matplotlib = True

except ImportError:

   have_matplotlib = False

if have_matplotlib:

   from .plots._beeswarm import summary_legacy as summary_plot

   from .plots._decision import decision as decision_plot, multioutput_decision as multioutput_decision_plot

   from .plots._scatter import dependence_legacy as dependence_plot

   from .plots._force import force as force_plot, initjs, save_html, getjs

   from .plots._image import image as image_plot

   from .plots._monitoring import monitoring as monitoring_plot

   from .plots._embedding import embedding as embedding_plot

   from .plots._partial_dependence import partial_dependence as partial_dependence_plot

   from .plots._bar import bar_legacy as bar_plot

   from .plots._waterfall import waterfall as waterfall_plot

   from .plots._group_difference import group_difference as group_difference_plot

   from .plots._text import text as text_plot

else:

   summary_plot = unsupported

   decision_plot = unsupported

   multioutput_decision_plot = unsupported

   dependence_plot = unsupported

   force_plot = unsupported

   initjs = unsupported

   save_html = unsupported

   image_plot = unsupported

   monitoring_plot = unsupported

   embedding_plot = unsupported

   partial_dependence_plot = unsupported

   bar_plot = unsupported

   waterfall_plot = unsupported

   text_plot = unsupported

# other stuff :)

from . import datasets

from . import utils

from . import links

#from . import benchmark

from .utils._legacy import kmeans

from .utils import sample, approximate_interactions

# TODO: Add support for hclustering based explanations where we sort the leaf order by magnitude and then show the dendrogram to the left

def summary_legacy(shap_values, features=None, feature_names=None, max_display=None, plot_type=None,

                color=None, axis_color="#333333", title=None, alpha=1, show=True, sort=True,

                color_bar=True, plot_size="auto", layered_violin_max_num_bins=20, class_names=None,

                class_inds=None,

                color_bar_label=labels["FEATURE_VALUE"],

                cmap=colors.red_blue,

                # depreciated

                auto_size_plot=None,

                use_log_scale=False):

   """Create a SHAP beeswarm plot, colored by feature values when they are provided.

   Parameters

   ----------

   shap_values : numpy.array

       For single output explanations this is a matrix of SHAP values (# samples x # features).

       For multi-output explanations this is a list of such matrices of SHAP values.

   features : numpy.array or pandas.DataFrame or list

       Matrix of feature values (# samples x # features) or a feature_names list as shorthand

   feature_names : list

       Names of the features (length # features)

   max_display : int

       How many top features to include in the plot (default is 20, or 7 for interaction plots)

   plot_type : "dot" (default for single output), "bar" (default for multi-output), "violin",

       or "compact_dot".

       What type of summary plot to produce. Note that "compact_dot" is only used for

       SHAP interaction values.

   plot_size : "auto" (default), float, (float, float), or None

       What size to make the plot. By default the size is auto-scaled based on the number of

       features that are being displayed. Passing a single float will cause each row to be that

       many inches high. Passing a pair of floats will scale the plot by that

       number of inches. If None is passed then the size of the current figure will be left

       unchanged.

   """

   # support passing an explanation object

   if str(type(shap_values)).endswith("Explanation'>"):

       shap_exp = shap_values

       base_value = shap_exp.base_value

       shap_values = shap_exp.values

       if features is None:

           features = shap_exp.data

       if feature_names is None:

           feature_names = shap_exp.feature_names

       # if out_names is None: # TODO: waiting for slicer support of this

       #     out_names = shap_exp.output_names

   # deprecation warnings

   if auto_size_plot is not None:

       warnings.warn("auto_size_plot=False is deprecated and is now ignored! Use plot_size=None instead.")

   multi_class = False

   if isinstance(shap_values, list):

       multi_class = True

       if plot_type is None:

           plot_type = "bar" # default for multi-output explanations

       assert plot_type == "bar", "Only plot_type = 'bar' is supported for multi-output explanations!"

   else:

       if plot_type is None:

           plot_type = "dot" # default for single output explanations

       assert len(shap_values.shape) != 1, "Summary plots need a matrix of shap_values, not a vector."

   # default color:

   if color is None:

       if plot_type == 'layered_violin':

           color = "coolwarm"

       elif multi_class:

           color = lambda i: colors.red_blue_circle(i/len(shap_values))

       else:

           color = colors.blue_rgb


相关文章
|
1月前
|
机器学习/深度学习 算法 前端开发
别再用均值填充了!MICE算法教你正确处理缺失数据
MICE是一种基于迭代链式方程的缺失值插补方法,通过构建后验分布并生成多个完整数据集,有效量化不确定性。相比简单填补,MICE利用变量间复杂关系,提升插补准确性,适用于多变量关联、缺失率高的场景。本文结合PMM与线性回归,详解其机制并对比效果,验证其在统计推断中的优势。
832 11
别再用均值填充了!MICE算法教你正确处理缺失数据
|
2月前
|
传感器 机器学习/深度学习 算法
【使用 DSP 滤波器加速速度和位移】使用信号处理算法过滤加速度数据并将其转换为速度和位移研究(Matlab代码实现)
【使用 DSP 滤波器加速速度和位移】使用信号处理算法过滤加速度数据并将其转换为速度和位移研究(Matlab代码实现)
225 1
|
2月前
|
机器学习/深度学习 传感器 算法
【无人车路径跟踪】基于神经网络的数据驱动迭代学习控制(ILC)算法,用于具有未知模型和重复任务的非线性单输入单输出(SISO)离散时间系统的无人车的路径跟踪(Matlab代码实现)
【无人车路径跟踪】基于神经网络的数据驱动迭代学习控制(ILC)算法,用于具有未知模型和重复任务的非线性单输入单输出(SISO)离散时间系统的无人车的路径跟踪(Matlab代码实现)
201 2
|
1月前
|
机器学习/深度学习 人工智能 算法
【基于TTNRBO优化DBN回归预测】基于瞬态三角牛顿-拉夫逊优化算法(TTNRBO)优化深度信念网络(DBN)数据回归预测研究(Matlab代码实现)
【基于TTNRBO优化DBN回归预测】基于瞬态三角牛顿-拉夫逊优化算法(TTNRBO)优化深度信念网络(DBN)数据回归预测研究(Matlab代码实现)
113 0
|
2月前
|
存储 监控 算法
企业电脑监控系统中基于 Go 语言的跳表结构设备数据索引算法研究
本文介绍基于Go语言的跳表算法在企业电脑监控系统中的应用,通过多层索引结构将数据查询、插入、删除操作优化至O(log n),显著提升海量设备数据管理效率,解决传统链表查询延迟问题,实现高效设备状态定位与异常筛选。
115 3
|
2月前
|
机器学习/深度学习 数据采集 传感器
【WOA-CNN-LSTM】基于鲸鱼算法优化深度学习预测模型的超参数研究(Matlab代码实现)
【WOA-CNN-LSTM】基于鲸鱼算法优化深度学习预测模型的超参数研究(Matlab代码实现)
206 0
|
1月前
|
机器学习/深度学习 算法 机器人
【水下图像增强融合算法】基于融合的水下图像与视频增强研究(Matlab代码实现)
【水下图像增强融合算法】基于融合的水下图像与视频增强研究(Matlab代码实现)
196 0
|
1月前
|
数据采集 分布式计算 并行计算
mRMR算法实现特征选择-MATLAB
mRMR算法实现特征选择-MATLAB
144 2
|
2月前
|
传感器 机器学习/深度学习 编解码
MATLAB|主动噪声和振动控制算法——对较大的次级路径变化具有鲁棒性
MATLAB|主动噪声和振动控制算法——对较大的次级路径变化具有鲁棒性
198 3
|
2月前
|
存储 编解码 算法
【多光谱滤波器阵列设计的最优球体填充】使用MSFA设计方法进行各种重建算法时,图像质量可以提高至多2 dB,并在光谱相似性方面实现了显著提升(Matlab代码实现)
【多光谱滤波器阵列设计的最优球体填充】使用MSFA设计方法进行各种重建算法时,图像质量可以提高至多2 dB,并在光谱相似性方面实现了显著提升(Matlab代码实现)
127 6

热门文章

最新文章