Contents
feature_importances_
1. Explanation of the feature_importances_ method
2. Source code of feature_importances_
plot_importance
1. Explanation of the plot_importance method
2. Source code of plot_importance
Related articles
ML xgboost: Interpreting the get_score(importance_type=self.importance_type) method in the xgboost library's core.py
ML xgboost: Interpreting the xgboost.plot_importance() function
sklearn XGBModel: An Overview and Detailed Usage Guide for XGBModel's feature_importances_ and plot_importance
feature_importances_
1. Explanation of the feature_importances_ method
XGBRegressor().feature_importances_
Parameters
- Note: Feature importance is only defined for tree boosters, i.e. when a decision tree model is chosen as the base learner (`booster=gbtree`). It is not defined for other base learner types, such as linear learners (`booster=gblinear`).
Returns
- feature_importances_ : array of shape ``[n_features]``

Note: importance_type : string, default "gain". The feature importance type for the feature_importances_ property: one of "gain", "weight", "cover", "total_gain" or "total_cover".
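Before the source listing, a minimal usage sketch (not from the original post; the synthetic data and feature count are illustrative assumptions):

```python
import numpy as np
from xgboost import XGBRegressor

# Synthetic regression data: 100 samples, 5 features (illustrative only).
rng = np.random.RandomState(42)
X = rng.rand(100, 5)
y = 2.0 * X[:, 0] + X[:, 3] + 0.1 * rng.rand(100)

# importance_type controls what feature_importances_ reports ("gain" by default).
model = XGBRegressor(n_estimators=50, importance_type="gain")
model.fit(X, y)

# Array of shape [n_features]; the values are normalized to sum to 1.
print(model.feature_importances_)
```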
2. Source code of feature_importances_
```python
class XGBModel(XGBModelBase):
    # pylint: disable=too-many-arguments, too-many-instance-attributes, invalid-name
    """Implementation of the Scikit-Learn API for XGBoost.

    Parameters
    ----------
    max_depth : int
        Maximum tree depth for base learners.
    learning_rate : float
        Boosting learning rate (xgb's "eta")
    n_estimators : int
        Number of boosted trees to fit.
    silent : boolean
        Whether to print messages while running boosting.
    objective : string or callable
        Specify the learning task and the corresponding learning objective or
        a custom objective function to be used (see note below).
    booster: string
        Specify which booster to use: gbtree, gblinear or dart.
    nthread : int
        Number of parallel threads used to run xgboost. (Deprecated, please use ``n_jobs``)
    n_jobs : int
        Number of parallel threads used to run xgboost. (replaces ``nthread``)
    gamma : float
        Minimum loss reduction required to make a further partition on a leaf node of the tree.
    min_child_weight : int
        Minimum sum of instance weight(hessian) needed in a child.
    max_delta_step : int
        Maximum delta step we allow each tree's weight estimation to be.
    subsample : float
        Subsample ratio of the training instance.
    colsample_bytree : float
        Subsample ratio of columns when constructing each tree.
    colsample_bylevel : float
        Subsample ratio of columns for each split, in each level.
    reg_alpha : float (xgb's alpha)
        L1 regularization term on weights
    reg_lambda : float (xgb's lambda)
        L2 regularization term on weights
    scale_pos_weight : float
        Balancing of positive and negative weights.
    base_score:
        The initial prediction score of all instances, global bias.
    seed : int
        Random number seed. (Deprecated, please use random_state)
    random_state : int
        Random number seed. (replaces seed)
    missing : float, optional
        Value in the data which needs to be present as a missing value. If
        None, defaults to np.nan.
    importance_type: string, default "gain"
        The feature importance type for the feature_importances_ property: either "gain",
        "weight", "cover", "total_gain" or "total_cover".
    \*\*kwargs : dict, optional
        Keyword arguments for XGBoost Booster object. Full documentation of parameters can
        be found here: https://github.com/dmlc/xgboost/blob/master/doc/parameter.rst.
        Attempting to set a parameter via the constructor args and \*\*kwargs dict simultaneously
        will result in a TypeError.

        .. note:: \*\*kwargs unsupported by scikit-learn

            \*\*kwargs is unsupported by scikit-learn. We do not guarantee that parameters
            passed via this argument will interact properly with scikit-learn.

    Note
    ----
    A custom objective function can be provided for the ``objective``
    parameter. In this case, it should have the signature
    ``objective(y_true, y_pred) -> grad, hess``:

    y_true: array_like of shape [n_samples]
        The target values
    y_pred: array_like of shape [n_samples]
        The predicted values

    grad: array_like of shape [n_samples]
        The value of the gradient for each sample point.
    hess: array_like of shape [n_samples]
        The value of the second derivative for each sample point
    """

    def __init__(self, max_depth=3, learning_rate=0.1, n_estimators=100,
                 silent=True, objective="reg:linear", booster='gbtree',
                 n_jobs=1, nthread=None, gamma=0, min_child_weight=1, max_delta_step=0,
                 subsample=1, colsample_bytree=1, colsample_bylevel=1,
                 reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                 base_score=0.5, random_state=0, seed=None, missing=None,
                 importance_type="gain", **kwargs):
        if not SKLEARN_INSTALLED:
            raise XGBoostError('sklearn needs to be installed in order to use this module')
        self.max_depth = max_depth
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.silent = silent
        self.objective = objective
        self.booster = booster
        self.gamma = gamma
        self.min_child_weight = min_child_weight
        self.max_delta_step = max_delta_step
        self.subsample = subsample
        self.colsample_bytree = colsample_bytree
        self.colsample_bylevel = colsample_bylevel
        self.reg_alpha = reg_alpha
        self.reg_lambda = reg_lambda
        self.scale_pos_weight = scale_pos_weight
        self.base_score = base_score
        self.missing = missing if missing is not None else np.nan
        self.kwargs = kwargs
        self._Booster = None
        self.seed = seed
        self.random_state = random_state
        self.nthread = nthread
        self.n_jobs = n_jobs
        self.importance_type = importance_type

    @property
    def feature_importances_(self):
        """
        Feature importances property

        .. note:: Feature importance is defined only for tree boosters

            Feature importance is only defined when the decision tree model is chosen as base
            learner (`booster=gbtree`). It is not defined for other base learner types, such
            as linear learners (`booster=gblinear`).

        Returns
        -------
        feature_importances_ : array of shape ``[n_features]``

        """
        if getattr(self, 'booster', None) is not None and self.booster != 'gbtree':
            raise AttributeError(
                'Feature importance is not defined for Booster type {}'.format(self.booster))
        b = self.get_booster()
        score = b.get_score(importance_type=self.importance_type)
        all_features = [score.get(f, 0.) for f in b.feature_names]
        all_features = np.array(all_features, dtype=np.float32)
        return all_features / all_features.sum()
```
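As the listing shows, the property is a thin wrapper around Booster.get_score(): features absent from the score dict are filled with 0, and the vector is normalized to sum to 1. A quick sketch verifying that equivalence, reusing the `model` fitted in the earlier example:

```python
# feature_importances_ == get_score(...) with absent features as 0, normalized.
b = model.get_booster()
score = b.get_score(importance_type="gain")    # dict: {feature_name: score}
raw = np.array([score.get(f, 0.) for f in b.feature_names], dtype=np.float32)
assert np.allclose(model.feature_importances_, raw / raw.sum())
```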
plot_importance
1. Explanation of the plot_importance method
Purpose: visualize feature importance based on the fitted trees.
Parameters
- booster : Booster, XGBModel or dict. Booster or XGBModel instance, or dict taken by Booster.get_fscore()
- ax : matplotlib Axes, default None. Target axes instance. If None, new figure and axes will be created.
- grid : bool, Turn the axes grids on or off. Default is True (On).
- importance_type : str, default "weight". How the importance is calculated: either "weight", "gain", or "cover"
* "weight" is the number of times a feature appears in a tree,在树中出现的次数。
* "gain" is the average gain of splits which use the feature,使用该特性的分割的平均增益。
* "cover" is the average coverage of splits which use the feature, where coverage is defined as the number of samples affected by the split. 分割的平均覆盖率,其中覆盖率定义为受分割影响的样本数。 - max_num_features : int, default None. Maximum number of top features displayed on plot. If None, all features will be displayed.
- height : float, default 0.2. Bar height, passed to ax.barh()
- xlim : tuple, default None. Tuple passed to axes.xlim()
- ylim : tuple, default None. Tuple passed to axes.ylim()
- title : str, default "Feature importance". Axes title. To disable, pass None.
- xlabel : str, default "F score". X axis title label. To disable, pass None.
- ylabel : str, default "Features". Y axis title label. To disable, pass None.
- show_values : bool, default True. Show values on plot. To disable, pass False.
- kwargs : Other keywords passed to ax.barh()
Returns
- ax : matplotlib Axes
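A minimal usage sketch (reusing the `model` fitted earlier; the parameter values are illustrative):

```python
import matplotlib.pyplot as plt
from xgboost import plot_importance

# Plot the top 3 features ranked by split count ("weight").
ax = plot_importance(model, importance_type="weight",
                     max_num_features=3, height=0.4, show_values=True)
plt.show()
```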
2. Source code of plot_importance
```python
# coding: utf-8
# pylint: disable=too-many-locals, too-many-arguments, invalid-name,
# pylint: disable=too-many-branches
"""Plotting Library."""
from __future__ import absolute_import

import re
from io import BytesIO
import numpy as np
from .core import Booster
from .sklearn import XGBModel


def plot_importance(booster, ax=None, height=0.2,
                    xlim=None, ylim=None, title='Feature importance',
                    xlabel='F score', ylabel='Features',
                    importance_type='weight', max_num_features=None,
                    grid=True, show_values=True, **kwargs):
    """Plot importance based on fitted trees.

    Parameters
    ----------
    booster : Booster, XGBModel or dict
        Booster or XGBModel instance, or dict taken by Booster.get_fscore()
    ax : matplotlib Axes, default None
        Target axes instance. If None, new figure and axes will be created.
    grid : bool, Turn the axes grids on or off. Default is True (On).
    importance_type : str, default "weight"
        How the importance is calculated: either "weight", "gain", or "cover"

        * "weight" is the number of times a feature appears in a tree
        * "gain" is the average gain of splits which use the feature
        * "cover" is the average coverage of splits which use the feature
          where coverage is defined as the number of samples affected by the split
    max_num_features : int, default None
        Maximum number of top features displayed on plot. If None, all features will be displayed.
    height : float, default 0.2
        Bar height, passed to ax.barh()
    xlim : tuple, default None
        Tuple passed to axes.xlim()
    ylim : tuple, default None
        Tuple passed to axes.ylim()
    title : str, default "Feature importance"
        Axes title. To disable, pass None.
    xlabel : str, default "F score"
        X axis title label. To disable, pass None.
    ylabel : str, default "Features"
        Y axis title label. To disable, pass None.
    show_values : bool, default True
        Show values on plot. To disable, pass False.
    kwargs :
        Other keywords passed to ax.barh()

    Returns
    -------
    ax : matplotlib Axes
    """
    # TODO: move this to compat.py
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        raise ImportError('You must install matplotlib to plot importance')

    if isinstance(booster, XGBModel):
        importance = booster.get_booster().get_score(importance_type=importance_type)
    elif isinstance(booster, Booster):
        importance = booster.get_score(importance_type=importance_type)
    elif isinstance(booster, dict):
        importance = booster
    else:
        raise ValueError('tree must be Booster, XGBModel or dict instance')

    if len(importance) == 0:
        raise ValueError('Booster.get_score() results in empty')

    tuples = [(k, importance[k]) for k in importance]
    if max_num_features is not None:
        tuples = sorted(tuples, key=lambda x: x[1])[-max_num_features:]
    else:
        tuples = sorted(tuples, key=lambda x: x[1])
    labels, values = zip(*tuples)

    if ax is None:
        _, ax = plt.subplots(1, 1)

    ylocs = np.arange(len(values))
    ax.barh(ylocs, values, align='center', height=height, **kwargs)

    if show_values is True:
        for x, y in zip(values, ylocs):
            ax.text(x + 1, y, x, va='center')

    ax.set_yticks(ylocs)
    ax.set_yticklabels(labels)

    if xlim is not None:
        if not isinstance(xlim, tuple) or len(xlim) != 2:
            raise ValueError('xlim must be a tuple of 2 elements')
    else:
        xlim = (0, max(values) * 1.1)
    ax.set_xlim(xlim)

    if ylim is not None:
        if not isinstance(ylim, tuple) or len(ylim) != 2:
            raise ValueError('ylim must be a tuple of 2 elements')
    else:
        ylim = (-1, len(values))
    ax.set_ylim(ylim)

    if title is not None:
        ax.set_title(title)
    if xlabel is not None:
        ax.set_xlabel(xlabel)
    if ylabel is not None:
        ax.set_ylabel(ylabel)
    ax.grid(grid)
    return ax


_NODEPAT = re.compile(r'(\d+):\[(.+)\]')
_LEAFPAT = re.compile(r'(\d+):(leaf=.+)')
_EDGEPAT = re.compile(r'yes=(\d+),no=(\d+),missing=(\d+)')
_EDGEPAT2 = re.compile(r'yes=(\d+),no=(\d+)')


def _parse_node(graph, text, condition_node_params, leaf_node_params):
    """parse dumped node"""
    match = _NODEPAT.match(text)
    if match is not None:
        node = match.group(1)
        graph.node(node, label=match.group(2), **condition_node_params)
        return node
    match = _LEAFPAT.match(text)
    if match is not None:
        node = match.group(1)
        graph.node(node, label=match.group(2), **leaf_node_params)
        return node
    raise ValueError('Unable to parse node: {0}'.format(text))


def _parse_edge(graph, node, text, yes_color='#0000FF', no_color='#FF0000'):
    """parse dumped edge"""
    try:
        match = _EDGEPAT.match(text)
        if match is not None:
            yes, no, missing = match.groups()
            if yes == missing:
                graph.edge(node, yes, label='yes, missing', color=yes_color)
                graph.edge(node, no, label='no', color=no_color)
            else:
                graph.edge(node, yes, label='yes', color=yes_color)
                graph.edge(node, no, label='no, missing', color=no_color)
            return
    except ValueError:
        pass
    match = _EDGEPAT2.match(text)
    if match is not None:
        yes, no = match.groups()
        graph.edge(node, yes, label='yes', color=yes_color)
        graph.edge(node, no, label='no', color=no_color)
        return
    raise ValueError('Unable to parse edge: {0}'.format(text))


def to_graphviz(booster, fmap='', num_trees=0, rankdir='UT',
                yes_color='#0000FF', no_color='#FF0000',
                condition_node_params=None, leaf_node_params=None, **kwargs):
    """Convert specified tree to graphviz instance. IPython can automatically plot the
    returned graphviz instance. Otherwise, you should call .render() method
    of the returned graphviz instance.

    Parameters
    ----------
    booster : Booster, XGBModel
        Booster or XGBModel instance
    fmap: str (optional)
        The name of feature map file
    num_trees : int, default 0
        Specify the ordinal number of target tree
    rankdir : str, default "UT"
        Passed to graphviz via graph_attr
    yes_color : str, default '#0000FF'
        Edge color when meets the node condition.
    no_color : str, default '#FF0000'
        Edge color when doesn't meet the node condition.
    condition_node_params : dict (optional)
        condition node configuration,
        {'shape':'box',
         'style':'filled,rounded',
         'fillcolor':'#78bceb'}
    leaf_node_params : dict (optional)
        leaf node configuration
        {'shape':'box',
         'style':'filled',
         'fillcolor':'#e48038'}
    kwargs :
        Other keywords passed to graphviz graph_attr

    Returns
    -------
    graph : graphviz.Digraph
    """

    if condition_node_params is None:
        condition_node_params = {}
    if leaf_node_params is None:
        leaf_node_params = {}

    try:
        from graphviz import Digraph
    except ImportError:
        raise ImportError('You must install graphviz to plot tree')

    if not isinstance(booster, (Booster, XGBModel)):
        raise ValueError('booster must be Booster or XGBModel instance')

    if isinstance(booster, XGBModel):
        booster = booster.get_booster()

    tree = booster.get_dump(fmap=fmap)[num_trees]
    tree = tree.split()

    kwargs = kwargs.copy()
    kwargs.update({'rankdir': rankdir})
    graph = Digraph(graph_attr=kwargs)

    for i, text in enumerate(tree):
        if text[0].isdigit():
            node = _parse_node(
                graph, text, condition_node_params=condition_node_params,
                leaf_node_params=leaf_node_params)
        else:
            if i == 0:
                # 1st string must be node
                raise ValueError('Unable to parse given string as tree')
            _parse_edge(graph, node, text, yes_color=yes_color,
                        no_color=no_color)

    return graph


def plot_tree(booster, fmap='', num_trees=0, rankdir='UT', ax=None, **kwargs):
    """Plot specified tree.

    Parameters
    ----------
    booster : Booster, XGBModel
        Booster or XGBModel instance
    fmap: str (optional)
        The name of feature map file
    num_trees : int, default 0
        Specify the ordinal number of target tree
    rankdir : str, default "UT"
        Passed to graphviz via graph_attr
    ax : matplotlib Axes, default None
        Target axes instance. If None, new figure and axes will be created.
    kwargs :
        Other keywords passed to to_graphviz

    Returns
    -------
    ax : matplotlib Axes

    """

    try:
        import matplotlib.pyplot as plt
        import matplotlib.image as image
    except ImportError:
        raise ImportError('You must install matplotlib to plot tree')

    if ax is None:
        _, ax = plt.subplots(1, 1)

    g = to_graphviz(booster, fmap=fmap, num_trees=num_trees,
                    rankdir=rankdir, **kwargs)

    s = BytesIO()
    s.write(g.pipe(format='png'))
    s.seek(0)
    img = image.imread(s)

    ax.imshow(img)
    ax.axis('off')
    return ax
```
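The same file also defines to_graphviz and plot_tree (shown above). A short usage sketch, again reusing the fitted `model`; rendering requires both the graphviz Python package and the Graphviz system binaries:

```python
import matplotlib.pyplot as plt
from xgboost import plot_tree, to_graphviz

# Draw tree 0 of the fitted model onto a matplotlib Axes (via an in-memory PNG).
ax = plot_tree(model, num_trees=0)
plt.show()

# Or get the graphviz.Digraph directly: IPython displays it inline;
# elsewhere, g.render('tree0') writes the rendered file to disk.
g = to_graphviz(model, num_trees=0)
```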