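sklearn: An introduction to SelectFromModel in sklearn.feature_selection and a detailed guide to using it (Part 2)

Summary: this part continues the introduction to SelectFromModel in sklearn.feature_selection with L1-based and tree-based selection examples, followed by the source code of the class.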

2. L1-based feature selection


>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> # An L1 penalty drives some coefficients to exactly zero.
>>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
>>> # prefit=True: reuse the fitted model and call transform directly.
>>> model = SelectFromModel(lsvc, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 3)
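
Which columns survive can be checked by hand. According to the docstring quoted later in this article, a two-dimensional ``coef_`` is reduced with a norm of order ``norm_order`` (1 by default), and an estimator with an L1 penalty gets a default threshold of 1e-5. The sketch below assumes that behavior and reproduces the mask with NumPy; the names `scores` and `manual_mask` are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)

# coef_ has one row per class and one column per feature.
# Assumption: each feature is scored by the L1 norm of its column.
scores = np.linalg.norm(lsvc.coef_, ord=1, axis=0)
manual_mask = scores >= 1e-5  # default threshold for an L1-penalized model

model = SelectFromModel(lsvc, prefit=True)
mask = model.get_support()  # boolean mask over the original features

print(lsvc.coef_.shape)
print(mask)
print(np.array_equal(mask, manual_mask))
```

`get_support()` is handy whenever the column indices matter, for example to map the reduced matrix back to feature names.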


3. Tree-based feature selection


>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> clf = ExtraTreesClassifier(n_estimators=50)
>>> clf = clf.fit(X, y)
>>> # Exact values vary between runs because no random_state is set.
>>> clf.feature_importances_
array([ 0.04...,  0.05...,  0.4...,  0.4...])
>>> # With no threshold given, features at or above the mean importance are kept.
>>> model = SelectFromModel(clf, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 2)
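
For a tree ensemble there is no L1 penalty, so, as the docstring quoted later states, the default threshold is the mean of the feature importances. A minimal sketch of that rule follows; `random_state=0` is added here only to make the run repeatable, and `manual_mask` is an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

# Reproduce the default rule by hand: keep importance >= mean importance.
importances = clf.feature_importances_
manual_mask = importances >= importances.mean()

model = SelectFromModel(clf, prefit=True)  # threshold=None, so "mean" applies
mask = model.get_support()

print(importances)
print(mask)
print(np.array_equal(mask, manual_mask))
```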



How to use SelectFromModel


1. Source code of SelectFromModel

class SelectFromModel Found at: sklearn.feature_selection.from_model

class SelectFromModel(BaseEstimator, SelectorMixin, MetaEstimatorMixin):
    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
        The base estimator from which the transformer is built.
        This can be both a fitted (if ``prefit`` is set to True)
        or a non-fitted estimator. The estimator must have either a
        ``feature_importances_`` or ``coef_`` attribute after fitting.

    threshold : string, float, optional default None
        The threshold value to use for feature selection. Features whose
        importance is greater or equal are kept while the others are
        discarded. If "median" (resp. "mean"), then the ``threshold`` value is
        the median (resp. the mean) of the feature importances. A scaling
        factor (e.g., "1.25*mean") may also be used. If None and if the
        estimator has a parameter penalty set to l1, either explicitly
        or implicitly (e.g, Lasso), the threshold used is 1e-5.
        Otherwise, "mean" is used by default.

    prefit : bool, default False
        Whether a prefit model is expected to be passed into the constructor
        directly or not. If True, ``transform`` must be called directly
        and SelectFromModel cannot be used with ``cross_val_score``,
        ``GridSearchCV`` and similar utilities that clone the estimator.
        Otherwise train the model using ``fit`` and then ``transform`` to do
        feature selection.

    norm_order : non-zero int, inf, -inf, default 1
        Order of the norm used to filter the vectors of coefficients below
        ``threshold`` in the case where the ``coef_`` attribute of the
        estimator is of dimension 2.

    Attributes
    ----------
    estimator_ : an estimator
        The base estimator from which the transformer is built.
        This is stored only when a non-fitted estimator is passed to the
        ``SelectFromModel``, i.e when prefit is False.

    threshold_ : float
        The threshold value used for feature selection.
    """
    def __init__(self, estimator, threshold=None, prefit=False,
                 norm_order=1):
        self.estimator = estimator
        self.threshold = threshold
        self.prefit = prefit
        self.norm_order = norm_order

    def _get_support_mask(self):
        # SelectFromModel can directly call on transform.
        if self.prefit:
            estimator = self.estimator
        elif hasattr(self, 'estimator_'):
            estimator = self.estimator_
        else:
            raise ValueError(
                'Either fit SelectFromModel before transform or set "prefit='
                'True" and pass a fitted estimator to the constructor.')
        scores = _get_feature_importances(estimator, self.norm_order)
        threshold = _calculate_threshold(estimator, scores, self.threshold)
        return scores >= threshold

    def fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError(
                "Since 'prefit=True', call transform directly")
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y, **fit_params)
        return self

    @property
    def threshold_(self):
        scores = _get_feature_importances(self.estimator_, self.norm_order)
        return _calculate_threshold(self.estimator, scores, self.threshold)

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer only once.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError(
                "Since 'prefit=True', call transform directly")
        if not hasattr(self, "estimator_"):
            self.estimator_ = clone(self.estimator)
        self.estimator_.partial_fit(X, y, **fit_params)
        return self
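
The docstring above notes that a prefit selector cannot be used with utilities that clone the estimator, such as ``cross_val_score``. A minimal sketch of the alternative is to pass an unfitted estimator (leaving ``prefit`` at its default of False) and let a pipeline fit the selector inside each fold. The choice of ``LogisticRegression`` as the final step, ``threshold="median"`` and ``random_state=0`` are assumptions made for illustration, not part of the original article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Unfitted base estimator: fit() clones it and stores the clone as estimator_.
selector = SelectFromModel(
    ExtraTreesClassifier(n_estimators=50, random_state=0),
    threshold="median",
)

# The selector is cloned and refitted in every fold, so selection only
# ever sees the training part of each split.
pipe = make_pipeline(selector, LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())

# Fitting the selector on its own exposes estimator_ and threshold_.
selector.fit(X, y)
importances = selector.estimator_.feature_importances_
mask = selector.get_support()
print(selector.threshold_, mask)
```

After ``fit``, ``threshold_`` holds the numeric value that the string ``"median"`` resolved to, which makes it easy to see why a given feature was kept or dropped.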


