ML with sklearn: a detailed guide to the parameters and usage of commonly used functions in sklearn.metrics (such as confusion_matrix)


Parameters of commonly used functions in sklearn.metrics


The confusion_matrix function explained


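Return value: the confusion matrix, whose entry in row i and column j is the number of samples whose true label is class i and whose predicted label is class j.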


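                    Predicted

                    0         1

True       0        TN        FP

           1        FN        TP

TN = true negatives, FP = false positives, FN = false negatives, TP = true positives.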
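To make the row/column convention concrete, here is a minimal sketch that counts the entries by hand. It uses plain Python only; the helper name manual_confusion_matrix is hypothetical (it is not part of sklearn), and it assumes the labels are the integers 0..n_classes-1. The input is the same binary data used in the docstring example of confusion_matrix.

```python
def manual_confusion_matrix(y_true, y_pred, n_classes):
    """Hypothetical helper: cm[i][j] counts samples with true class i, predicted class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1  # row index = true label, column index = predicted label
    return cm


y_true = [0, 1, 0, 1]
y_pred = [1, 1, 1, 0]

cm = manual_confusion_matrix(y_true, y_pred, n_classes=2)
(tn, fp), (fn, tp) = cm  # binary layout: [[TN, FP], [FN, TP]]
print(cm)              # [[0, 2], [1, 1]]
print(tn, fp, fn, tp)  # 0 2 1 1
```

Reading the matrix row by row gives the order tn, fp, fn, tp, which is the order produced by calling .ravel() on the array that confusion_matrix returns.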


def confusion_matrix Found at: sklearn.metrics._classification

@_deprecate_positional_args

def confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None,  normalize=None):

   """Compute confusion matrix to evaluate the accuracy of a classification.

 

   By definition a confusion matrix :math:`C` is such that :math:`C_{i, j}` is equal to the number of observations known to be in group :math:`i` and predicted to be in group :math:`j`.

 

   Thus in binary classification, the count of true negatives is

   :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is

   :math:`C_{1,1}` and false positives is :math:`C_{0,1}`.

 

   Read more in the :ref:`User Guide <confusion_matrix>`.

 

   Parameters

   ----------

   y_true : array-like of shape (n_samples,) Ground truth (correct) target values.

   y_pred : array-like of shape (n_samples,) Estimated targets as returned by a classifier.

   labels : array-like of shape (n_classes), default=None.  List of labels to index the matrix. This may be used to reorder

   or select a subset of labels.  If ``None`` is given, those that appear at least once in ``y_true`` or ``y_pred`` are used in sorted order.

 

   sample_weight : array-like of shape (n_samples,), default=None. Sample weights.

 

   .. versionadded:: 0.18

 

   normalize : {'true', 'pred', 'all'}, default=None. Normalizes confusion matrix over the true (rows), predicted (columns)

   conditions or all the population. If None, confusion matrix will not be normalized.

 

   Returns

   -------

   C : ndarray of shape (n_classes, n_classes)

   Confusion matrix whose i-th row and j-th column entry indicates the number of samples with true label being i-th class and predicted label being j-th class.

 

   References

   ----------

   .. [1] `Wikipedia entry for the Confusion matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`_  (Wikipedia and other references may use a different convention for axes)

Explanation of def confusion_matrix, found in sklearn.metrics._classification:

@_deprecate_positional_args

def confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None):

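Computes a confusion matrix to evaluate the accuracy of a classification.


By definition, a confusion matrix :math:`C` is such that :math:`C_{i, j}` equals the number of observations known to be in group :math:`i` and predicted to be in group :math:`j`.


Thus, in binary classification, the count of true negatives is :math:`C_{0,0}`, false negatives is :math:`C_{1,0}`, true positives is :math:`C_{1,1}`, and false positives is :math:`C_{0,1}`.


For more information, see the :ref:`User Guide <confusion_matrix>`.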


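Parameters

----------

y_true: array-like of shape (n_samples,). Ground truth (correct) target values.

y_pred: array-like of shape (n_samples,). Estimated targets as returned by a classifier.

labels: array-like of shape (n_classes,), default=None. List of labels used to index the matrix. This can be used to reorder the labels or to select a subset of them. If ``None`` is given, the labels that appear at least once in ``y_true`` or ``y_pred`` are used, in sorted order.


sample_weight: array-like of shape (n_samples,), default=None. Sample weights.


.. versionadded:: 0.18


normalize: {'true', 'pred', 'all'}, default=None. Normalizes the confusion matrix over the true labels (rows), the predicted labels (columns), or the whole population. If None, the confusion matrix is not normalized.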


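Returns

-------

C: ndarray of shape (n_classes, n_classes)

Confusion matrix whose entry in the i-th row and j-th column is the number of samples whose true label is the i-th class and whose predicted label is the j-th class.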

 

References

----------

.. [1] `Wikipedia entry for the Confusion matrix <https://en.wikipedia.org/wiki/Confusion_matrix>`_ (Wikipedia and other references may use a different convention for axes)
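The effect of the labels, sample_weight and normalize parameters described above can be tried out with a short sketch. It reuses the string labels from the docstring example; the weights are made up purely for illustration, and normalize is assumed to be available (scikit-learn 0.22 or later).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
all_labels = ["ant", "bird", "cat"]

# labels: choose a subset and an explicit order. The "bird" sample is
# dropped because its true label is not in the list.
cm_subset = confusion_matrix(y_true, y_pred, labels=["cat", "ant"])
print(cm_subset)  # values: [[2, 1], [0, 2]]

# sample_weight: each sample contributes its weight instead of 1
# (illustrative weights; the two correctly predicted cats count double).
weights = [1, 1, 2, 2, 1, 1]
cm_weighted = confusion_matrix(
    y_true, y_pred, labels=all_labels, sample_weight=weights
)
print(cm_weighted)  # values: [[2, 0, 0], [0, 0, 1], [1, 0, 4]]

# normalize: divide by row sums ('true'), column sums ('pred'),
# or the grand total ('all').
cm_true = confusion_matrix(y_true, y_pred, labels=all_labels, normalize="true")
cm_pred = confusion_matrix(y_true, y_pred, labels=all_labels, normalize="pred")
cm_all = confusion_matrix(y_true, y_pred, labels=all_labels, normalize="all")
print(cm_true)
print(cm_pred)
print(cm_all)
```

With normalize='true' every row sums to 1, so the diagonal holds the per-class recall. With normalize='pred' the "bird" column stays at 0, because no sample was predicted as "bird" and the resulting 0/0 is replaced by 0.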

 Examples

   --------

   >>> from sklearn.metrics import confusion_matrix

   >>> y_true = [2, 0, 2, 2, 0, 1]

   >>> y_pred = [0, 0, 2, 2, 0, 2]

   >>> confusion_matrix(y_true, y_pred)

   array([[2, 0, 0],

          [0, 0, 1],

          [1, 0, 2]])

 

   >>> y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]

   >>> y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]

   >>> confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

   array([[2, 0, 0],

          [0, 0, 1],

          [1, 0, 2]])

 

   In the binary case, we can extract true positives, etc as follows:

 

   >>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()

   >>> (tn, fp, fn, tp)

   (0, 2, 1, 1)  

   """

   y_type, y_true, y_pred = _check_targets(y_true, y_pred)

   if y_type not in ("binary", "multiclass"):

       raise ValueError("%s is not supported" % y_type)

   if labels is None:

       labels = unique_labels(y_true, y_pred)

   else:

       labels = np.asarray(labels)

       n_labels = labels.size

       if n_labels == 0:

           raise ValueError("'labels' should contains at least one label.")

       elif y_true.size == 0:

           return np.zeros((n_labels, n_labels), dtype=int)

       elif np.all([l not in y_true for l in labels]):

           raise ValueError("At least one label specified must be in y_true")

   if sample_weight is None:

       sample_weight = np.ones(y_true.shape[0], dtype=np.int64)

   else:

       sample_weight = np.asarray(sample_weight)

   check_consistent_length(y_true, y_pred, sample_weight)

   if normalize not in ['true', 'pred', 'all', None]:

       raise ValueError("normalize must be one of {'true', 'pred', "

           "'all', None}")

   n_labels = labels.size

   label_to_ind = {y:x for x, y in enumerate(labels)}

   # convert yt, yp into index

   y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])

   y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])

   # intersect y_pred, y_true with labels, eliminate items not in labels

   ind = np.logical_and(y_pred < n_labels, y_true < n_labels)

   y_pred = y_pred[ind]

   y_true = y_true[ind]

   # also eliminate weights of eliminated items

   sample_weight = sample_weight[ind]

   # Choose the accumulator dtype to always have high precision

   if sample_weight.dtype.kind in {'i', 'u', 'b'}:

       dtype = np.int64

   else:

       dtype = np.float64

   cm = coo_matrix((sample_weight, (y_true, y_pred)),

                   shape=(n_labels, n_labels), dtype=dtype).toarray()

   with np.errstate(all='ignore'):

       if normalize == 'true':

           cm = cm / cm.sum(axis=1, keepdims=True)

       elif normalize == 'pred':

           cm = cm / cm.sum(axis=0, keepdims=True)

       elif normalize == 'all':

           cm = cm / cm.sum()

       cm = np.nan_to_num(cm)

   return cm  
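The counting step in the function body relies on a property of scipy's coo_matrix: entries that share the same (row, column) coordinate are summed when the matrix is converted to a dense array. The following is a minimal standalone sketch of that step and of the column normalization, not the library code itself; the variable names rows, cols and weights are chosen here for illustration, and the data is taken from the docstring example.

```python
import numpy as np
from scipy.sparse import coo_matrix

labels = np.array(["ant", "bird", "cat"])
y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]

# Map each label to its row/column index, as label_to_ind does in the function.
label_to_ind = {label: i for i, label in enumerate(labels)}
rows = np.array([label_to_ind[y] for y in y_true])  # true labels -> rows
cols = np.array([label_to_ind[y] for y in y_pred])  # predictions -> columns
weights = np.ones(len(rows), dtype=np.int64)        # unit weight per sample

# Duplicate (row, col) coordinates are summed by toarray(), so this single
# call performs the counting.
cm = coo_matrix((weights, (rows, cols)), shape=(3, 3), dtype=np.int64).toarray()
print(cm)

# Column normalization (normalize='pred'): the "bird" column sums to 0,
# so 0/0 yields NaN, which nan_to_num replaces with 0.
with np.errstate(all="ignore"):
    cm_pred = np.nan_to_num(cm / cm.sum(axis=0, keepdims=True))
print(cm_pred)
```

The dense result equals the matrix shown in the docstring example for the same data, and the normalized version contains no NaN values even though one column is empty.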




