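0. A brief introduction to the concepts
This section first summarizes the concepts of multiclass, multilabel, and multioutput classification and regression.
The goal is to distinguish four tasks: multiclass classification, multilabel classification, multiclass-multioutput classification, and multioutput regression. In sklearn, two modules cover all of these problems: sklearn.multiclass and sklearn.multioutput.
[Figure: tree diagram of how sklearn.multiclass and sklearn.multioutput are organized]
[Table: side-by-side comparison of the four concepts]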
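1. Multiclass classification
Multiclass classification is a classification task with more than two classes. Each sample can be labeled with only one class.
For example, classification using features extracted from a set of fruit images, where each image may show an orange, an apple, or a pear. Each image is one sample and is labeled as one of the 3 possible classes. Multiclass classification assumes that each sample is assigned to one and only one label; a sample cannot, for example, be both a pear and an apple.
A valid representation of a multiclass y is a 1d or column vector containing more than two discrete values. An example of a vector y for 4 samples:
>>> import numpy as np
>>> y = np.array(['apple', 'pear', 'apple', 'orange'])
>>> print(y)
['apple' 'pear' 'apple' 'orange']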
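OneVsRestClassifier
The one-vs-rest strategy, also known as one-vs-all, is implemented in OneVsRestClassifier. It fits one classifier per class; for each classifier, that class is fitted against all the other classes. Besides its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, knowledge about the class can be gained by inspecting its corresponding classifier. This is the most commonly used strategy and a fair default choice.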
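The one-classifier-per-class idea can be checked directly. The following is a minimal sketch, assuming the iris dataset and LogisticRegression as the base estimator (both are illustrative choices, not taken from the original post):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Iris has 3 classes, so one-vs-rest should fit 3 binary classifiers.
X, y = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

n_binary_classifiers = len(ovr.estimators_)
print("classes:", ovr.classes_)
print("binary classifiers fitted:", n_binary_classifiers)

# Each binary estimator can be inspected on its own, e.g. the linear
# weights that separate class 0 from all the other classes.
print("weights for class 0 vs rest:", ovr.estimators_[0].coef_)
```

Inspecting estimators_[i] is what makes the strategy interpretable: each entry is an ordinary binary model for one class.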
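OneVsOneClassifier
OneVsOneClassifier constructs one classifier per pair of classes. At prediction time, the class that receives the most votes is selected. In the event of a tie (two classes with an equal number of votes), it selects the class with the highest aggregate classification confidence, obtained by summing the pair-wise classification confidence levels computed by the underlying binary classifiers.
Since it requires fitting n_classes * (n_classes - 1) / 2 classifiers, this method is usually slower than one-vs-the-rest because of its O(n_classes^2) complexity. However, it may be advantageous for algorithms, such as kernel algorithms, that do not scale well with the number of samples. This is because each individual learning problem involves only a small subset of the data, whereas with one-vs-the-rest the complete dataset is used n_classes times. The decision function is the result of a monotonic transformation of the one-versus-one classification.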
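To make the quadratic growth concrete, here is a small sketch in plain Python that enumerates the class pairs one-vs-one would train on and compares the classifier counts of the two strategies. The helper name one_vs_one_pairs is hypothetical and exists only for this illustration:

```python
from itertools import combinations


def one_vs_one_pairs(classes):
    """Return every unordered pair of classes; one-vs-one fits one classifier per pair."""
    return list(combinations(classes, 2))


classes = ["apple", "orange", "pear"]
pairs = one_vs_one_pairs(classes)
print(pairs)

# Compare how many binary classifiers each strategy needs as n_classes grows.
for n_classes in (3, 10, 100):
    n_ovo = n_classes * (n_classes - 1) // 2
    print(f"{n_classes} classes: one-vs-rest fits {n_classes}, one-vs-one fits {n_ovo}")
```

With 3 classes both strategies fit 3 classifiers; the gap only opens up as the number of classes grows.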
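2. Multilabel classification
Multilabel classification (closely related to multioutput classification) is a classification task that labels each sample with m labels from n_classes possible classes, where m can range from 0 to n_classes inclusive. It can be thought of as predicting properties of a sample that are not mutually exclusive. Formally, a binary output is assigned to each class for every sample: positive classes are indicated with 1 and negative classes with 0 or -1. It is therefore comparable to running n_classes binary classification tasks, for example with MultiOutputClassifier. That approach treats each label independently, whereas multilabel classifiers may treat the multiple classes simultaneously, accounting for correlated behavior among them.
For example, predicting the topics relevant to a text document or video. The document or video may be about one of "religion", "politics", "finance", or "education", about several of these topic classes, or about all of them.
A valid representation of a multilabel y is a dense or sparse binary matrix of shape (n_samples, n_classes). Each column represents a class. The 1's in each row denote the positive classes the sample has been labeled with. An example of a dense matrix y for 3 samples:
>>> y = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]])
>>> print(y)
[[1 0 0 1]
 [0 0 1 1]
 [0 0 0 0]]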
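The "n_classes independent binary tasks" view can be sketched with MultiOutputClassifier. This is only an illustration: the synthetic data from make_multilabel_classification and the LogisticRegression base estimator are assumptions, not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic multilabel data: Y is a binary indicator matrix
# of shape (n_samples, n_classes).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=4, random_state=0
)

# One independent binary classifier is fitted per column of Y.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
Y_pred = clf.predict(X)

print("Y shape:", Y.shape)
print("binary classifiers fitted:", len(clf.estimators_))
print("first predicted rows:")
print(Y_pred[:3])
```

Because the columns are fitted separately, any correlation between labels is ignored here, which is the limitation the paragraph above points out.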
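3. Multiclass-multioutput classification
Multiclass-multioutput classification (also known as multitask classification) is a classification task that labels each sample with a set of non-binary properties. Both the number of properties and the number of classes per property are greater than 2. A single estimator thus handles several joint classification tasks. This is both a generalization of the multilabel classification task, which considers only binary attributes, and a generalization of the multiclass classification task, where only one property is considered.
For example, classifying the properties "type of fruit" and "color" for a set of fruit images. The property "type of fruit" has the possible classes "apple", "pear", and "orange". The property "color" has the possible classes "green", "red", "yellow", and "orange". Each sample is an image of a fruit; a label is output for both properties, and each label is one of the possible classes of the corresponding property.
Note that all classifiers handling multiclass-multioutput (also known as multitask classification) tasks support the multilabel classification task as a special case. Multitask classification is similar to the multioutput classification task with different model formulations.
A valid representation of a multioutput y is a dense matrix of shape (n_samples, n_classes) of class labels: a column-wise concatenation of 1d multiclass variables. An example of y for 3 samples:
>>> y = np.array([['apple', 'green'], ['orange', 'orange'], ['pear', 'green']])
>>> print(y)
[['apple' 'green']
 ['orange' 'orange']
 ['pear' 'green']]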
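4. Multioutput regression
Multioutput regression predicts multiple numerical properties for each sample. Each property is a numerical variable, and the number of properties to be predicted for each sample is greater than or equal to 2. Some estimators that support multioutput regression are faster than just running n_output estimators.
For example, predicting both wind speed and wind direction (in degrees) from data acquired at a certain location. Each sample is the data acquired at one location, and both wind speed and wind direction are output for each sample.
A valid representation of a multioutput y is a dense matrix of floats of shape (n_samples, n_output): a column-wise concatenation of continuous variables. An example of y for 3 samples:
>>> y = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])
>>> print(y)
[[ 31.4  94. ]
 [ 40.5 109. ]
 [ 25.   30. ]]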
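As a rough sketch of the two ways to handle several targets, the snippet below fits LinearRegression directly on a two-column y (it supports multiple targets natively) and compares it with MultiOutputRegressor, which fits one estimator per column. The synthetic data and the weight matrix W are made up for illustration; the two columns merely stand in for quantities such as wind speed and wind direction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Hypothetical ground-truth weights mapping 3 features to 2 targets.
W = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y = X @ W + rng.normal(scale=0.1, size=(100, 2))

# Native multioutput support: a single estimator fitted on both columns.
native = LinearRegression().fit(X, Y)
# Generic wrapper: one independent estimator per target column.
wrapped = MultiOutputRegressor(LinearRegression()).fit(X, Y)

pred_native = native.predict(X[:5])
pred_wrapped = wrapped.predict(X[:5])
print("prediction shape:", pred_native.shape)
print("estimators in wrapper:", len(wrapped.estimators_))
print("same predictions:", np.allclose(pred_native, pred_wrapped))
```

For ordinary least squares the two routes give the same predictions; the wrapper is mainly useful for regressors that accept only a 1d y.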
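Summary:
With the above in place, each concept can be summed up briefly. Multiclass classification is the ordinary multi-class problem. Multilabel classification runs a binary classification for each of several labels on every sample. Multiclass-multioutput classification runs a multi-class classification for each of several properties. Multioutput regression is the simplest: it predicts several continuous output values per sample.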
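One way to check which of the four cases a given y falls into is sklearn.utils.multiclass.type_of_target. The sketch below runs it on the four example arrays from the sections above; the dictionary keys are just descriptive labels chosen for this illustration.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# The four example targets from the sections above.
targets = {
    "multiclass classification": np.array(['apple', 'pear', 'apple', 'orange']),
    "multilabel classification": np.array([[1, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]),
    "multiclass-multioutput classification": np.array(
        [['apple', 'green'], ['orange', 'orange'], ['pear', 'green']]
    ),
    "multioutput regression": np.array([[31.4, 94], [40.5, 109], [25.0, 30]]),
}

# type_of_target infers the target type from the shape and values of y.
detected = {name: type_of_target(y) for name, y in targets.items()}
for name, kind in detected.items():
    print(f"{name}: {kind}")
```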
1. Classification test
import numpy as np
import sklearn.metrics
import autosklearn.classification
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
# Data Loading...
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
# X.shape, y.shape: ((569, 30), (569,))
SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
# test_size: If float, the proportion of the dataset to include in the test split
# (0.2 here). If int, the absolute number of test samples. If None, the value is set
# to the complement of the train size; if train_size is also None, it is set to 0.25.
# train_size: If int, the absolute number of train samples. If None, the value is
# automatically set to the complement of the test size.
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((455, 30), (114, 30), (455,), (114,))
While fit is running, it may fail with an out-of-memory error:
[ERROR] [2022-03-05 16:51:41,118:Client-AutoML(1):breast_cancer] Dummy prediction failed with run state StatusType.MEMOUT and additional output: {‘error’: ‘Memout (used more than 3072 MB).’, ‘configuration_origin’: ‘DUMMY’}.
If the code below produces the error above, the run is out of memory: too little memory was allocated to it. Raise the limit by setting a larger memory_limit (in MB).
automl = autosklearn.classification.AutoSklearnClassifier(
    ...
    # default: memory_limit=3072
    memory_limit=6144,
    ...
)
An error is also raised if the output folder already exists:
FileExistsError: [Errno 17] File exists: ‘./autosklearn_classification_example_tmp’
Delete this folder before retraining. If tmp_folder is not set, a folder is created by default at /tmp/autosklearn_tmp_$pid_$random_number.
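Deleting the folder can be scripted so that reruns do not hit FileExistsError. The following is a minimal sketch using only the standard library; reset_tmp_folder is a hypothetical helper name, and the demonstration runs on a throwaway temporary directory rather than the real output folder.

```python
import shutil
import tempfile
from pathlib import Path


def reset_tmp_folder(path):
    """Delete the folder at `path` if it exists; return True once it is gone."""
    folder = Path(path)
    if folder.is_dir():
        shutil.rmtree(folder)
    return not folder.exists()


# Demonstration on a scratch directory that mimics a leftover run.
leftover = Path(tempfile.mkdtemp())
(leftover / "run.log").write_text("old log")

removed = reset_tmp_folder(leftover)
print("folder removed:", removed)
# Calling it again is harmless when the folder no longer exists.
removed_again = reset_tmp_folder(leftover)
print("second call still fine:", removed_again)
```

In this walkthrough the call would be reset_tmp_folder('./autosklearn_classification_example_tmp'), placed before the classifier is constructed. Note that this also discards the logs of the previous run.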
# Build and fit a classifier...
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,  # train for 3 minutes in total
    per_run_time_limit=30,  # time limit (seconds) for a single machine learning model
    memory_limit=8192,  # memory limit (MB) for the machine learning algorithm
    tmp_folder='./autosklearn_classification_example_tmp',  # folder to store configuration output and log files
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')
AutoSklearnClassifier(memory_limit=8192, per_run_time_limit=30, time_left_for_this_task=180, tmp_folder='./autosklearn_classification_example_tmp')
# View the models found by auto-sklearn...
print(automl.leaderboard())
          rank  ensemble_weight                 type      cost  duration
model_id
54           1             0.08                  mlp  0.013245  0.925545
6            2             0.02                  mlp  0.019868  0.813444
4            3             0.04                  mlp  0.026490  1.093411
46           4             0.04                  sgd  0.026490  0.950709
7            5             0.02          extra_trees  0.033113  1.047053
10           6             0.04    gradient_boosting  0.033113  0.852145
21           7             0.12                  mlp  0.033113  1.647795
2            8             0.02        random_forest  0.046358  1.156928
53           9             0.04                  mlp  0.046358  0.866593
12          10             0.02    gradient_boosting  0.046358  1.059940
14          11             0.04                  mlp  0.046358  1.426609
15          12             0.04                  mlp  0.046358  2.378096
5           13             0.04        random_forest  0.052980  1.392029
40          14             0.06                  lda  0.052980  0.701233
33          15             0.02                  mlp  0.052980  1.394135
19          16             0.06          extra_trees  0.059603  2.198166
16          17             0.04        random_forest  0.059603  1.387480
8           18             0.02        random_forest  0.059603  1.359141
11          19             0.02        random_forest  0.066225  2.323715
9           20             0.02          extra_trees  0.066225  1.251749
57          21             0.02                  mlp  0.066225  0.806810
42          22             0.02  k_nearest_neighbors  0.079470  0.627965
30          23             0.02                  mlp  0.099338  1.350389
20          24             0.02   passive_aggressive  0.099338  0.538802
31          25             0.02                  mlp  0.112583  1.907936
38          26             0.02                  mlp  0.119205  0.795489
36          27             0.02                  mlp  0.125828  0.937948
51          28             0.06                  lda  0.139073  1.157996
# Print the final ensemble constructed by auto-sklearn...
pprint(automl.show_models(), indent=4)
{ 2: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc91862250>, 'cost': 0.04635761589403975, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc3839ac10>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc91862190>, 'model_id': 2, 'rank': 8, 'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 4: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc38307850>, 'cost': 0.026490066225165587, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdd0115b400>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc383078b0>, 'model_id': 4, 'rank': 3, 'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.00021148999718383549, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(113, 113, 113), learning_rate_init=0.0007452270241186694, max_iter=64, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 5: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc910d8580>, 'cost': 0.052980132450331174, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc91680be0>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc912b3a60>, 'model_id': 5, 'rank': 13, 'sklearn_classifier': RandomForestClassifier(criterion='entropy', 
max_features=3, min_samples_leaf=2, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 6: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc38913370>, 'cost': 0.019867549668874163, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc386a3250>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc38913730>, 'model_id': 6, 'rank': 2, 'sklearn_classifier': MLPClassifier(alpha=0.0017940473175767063, beta_1=0.999, beta_2=0.9, early_stopping=True, hidden_layer_sizes=(101, 101), learning_rate_init=0.0004684917334431039, max_iter=32, n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}, 7: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdd01155f10>, 'cost': 0.0331125827814569, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc93e00c40>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc3853b160>, 'model_id': 7, 'rank': 5, 'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 8: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90da5790>, 'cost': 0.05960264900662249, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc91288760>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 
0x7fdc90e3b6d0>, 'model_id': 8, 'rank': 16, 'sklearn_classifier': RandomForestClassifier(max_features=2, min_samples_leaf=2, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 9: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc907eb4c0>, 'cost': 0.06622516556291391, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc90e51f10>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90898fd0>, 'model_id': 9, 'rank': 19, 'sklearn_classifier': ExtraTreesClassifier(max_features=6, min_samples_split=10, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 10: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc940396a0>, 'cost': 0.0331125827814569, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc38692a00>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc940399d0>, 'model_id': 10, 'rank': 6, 'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=True, l2_regularization=0.005326508887463406, learning_rate=0.060800813211425456, max_iter=512, max_leaf_nodes=6, min_samples_leaf=5, n_iter_no_change=5, random_state=1, validation_fraction=None, warm_start=True)}, 11: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90697640>, 'cost': 0.06622516556291391, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc90d61a00>, 'ensemble_weight': 0.02, 'feature_preprocessor': 
<autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90697340>, 'model_id': 11, 'rank': 20, 'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=23, min_samples_leaf=7, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 12: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc918473d0>, 'cost': 0.04635761589403975, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc93bae550>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc91847250>, 'model_id': 12, 'rank': 9, 'sklearn_classifier': HistGradientBoostingClassifier(early_stopping=False, l2_regularization=1.0647401999412075e-10, learning_rate=0.08291320147381159, max_iter=512, max_leaf_nodes=39, n_iter_no_change=0, random_state=1, validation_fraction=None, warm_start=True)}, 14: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc913928b0>, 'cost': 0.04635761589403975, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc918d66a0>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc91665fa0>, 'model_id': 14, 'rank': 10, 'sklearn_classifier': MLPClassifier(activation='tanh', alpha=2.5550223982458062e-06, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(54, 54, 54), learning_rate_init=0.00027271287919467994, max_iter=256, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 15: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': 
<autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc913921f0>, 'cost': 0.04635761589403975, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc91862e50>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc91392070>, 'model_id': 15, 'rank': 11, 'sklearn_classifier': MLPClassifier(alpha=4.2841884333778574e-06, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(263, 263, 263), learning_rate_init=0.0011804284312897009, max_iter=128, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 16: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90c58880>, 'cost': 0.05960264900662249, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc910d8ee0>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90e12610>, 'model_id': 16, 'rank': 17, 'sklearn_classifier': RandomForestClassifier(criterion='entropy', max_features=3, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 19: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90cf7a00>, 'cost': 0.05960264900662249, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc90f01280>, 'ensemble_weight': 0.06, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90cf7640>, 'model_id': 19, 'rank': 18, 'sklearn_classifier': ExtraTreesClassifier(criterion='entropy', max_features=448, min_samples_leaf=2, min_samples_split=20, 
n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 20: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc9023e610>, 'cost': 0.09933774834437081, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc905ab160>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc9023e490>, 'model_id': 20, 'rank': 23, 'sklearn_classifier': PassiveAggressiveClassifier(C=0.14268277711454813, max_iter=32, random_state=1, tol=0.0002600768160857831, warm_start=True)}, 21: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc93baefa0>, 'cost': 0.0331125827814569, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc37a46af0>, 'ensemble_weight': 0.12, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc93baeee0>, 'model_id': 21, 'rank': 7, 'sklearn_classifier': MLPClassifier(alpha=0.02847755502162456, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(123, 123), learning_rate_init=0.000421568792103947, max_iter=256, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 30: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc901c8c40>, 'cost': 0.09933774834437081, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc909841c0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc901c8940>, 'model_id': 30, 'rank': 24, 
'sklearn_classifier': MLPClassifier(activation='tanh', alpha=8.05325583028895e-05, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(140, 140), learning_rate_init=0.0005706565389402362, max_iter=128, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 31: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90163940>, 'cost': 0.11258278145695366, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc902a40a0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90163820>, 'model_id': 31, 'rank': 25, 'sklearn_classifier': MLPClassifier(alpha=0.0001363185819149026, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(139, 139, 139), learning_rate_init=0.00018009776276177523, max_iter=256, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 33: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc910d80a0>, 'cost': 0.052980132450331174, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc91392d60>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc91158e20>, 'model_id': 33, 'rank': 14, 'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.000807743464484268, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(139,), learning_rate_init=0.00021433050558430938, max_iter=256, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 36: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': 
<autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90072730>, 'cost': 0.1258278145695364, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc901d3a60>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc900725b0>, 'model_id': 36, 'rank': 27, 'sklearn_classifier': MLPClassifier(alpha=0.05657753566180125, beta_1=0.999, beta_2=0.9, early_stopping=True, hidden_layer_sizes=(150, 150, 150), learning_rate_init=0.0284552208272282, max_iter=32, n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}, 38: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc900dc5b0>, 'cost': 0.11920529801324509, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc901c04c0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90169a30>, 'model_id': 38, 'rank': 26, 'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.03530075517934556, beta_1=0.999, beta_2=0.9, early_stopping=True, hidden_layer_sizes=(151, 151), learning_rate_init=0.012624724152433505, max_iter=32, n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}, 40: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90fb6490>, 'cost': 0.052980132450331174, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc912abfd0>, 'ensemble_weight': 0.06, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc90fb6310>, 'model_id': 40, 'rank': 15, 'sklearn_classifier': 
LinearDiscriminantAnalysis(tol=8.850809824093198e-05)}, 42: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90382400>, 'cost': 0.07947019867549665, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc9096c6a0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc9036cdc0>, 'model_id': 42, 'rank': 22, 'sklearn_classifier': KNeighborsClassifier(n_neighbors=27, p=1)}, 46: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc37a46f10>, 'cost': 0.026490066225165587, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc3805a700>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc37a461c0>, 'model_id': 46, 'rank': 4, 'sklearn_classifier': SGDClassifier(alpha=0.0028239629801064844, average=True, epsilon=0.01391093587699247, eta0=0.01, loss='modified_huber', max_iter=128, penalty='l1', random_state=1, tol=0.0005283535863021666, warm_start=True)}, 51: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc8ff7ca90>, 'cost': 0.13907284768211925, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc90163fa0>, 'ensemble_weight': 0.06, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc8ff7c9a0>, 'model_id': 51, 'rank': 28, 'sklearn_classifier': LinearDiscriminantAnalysis(shrinkage=0.2362694848390572, solver='lsqr', tol=4.087618610024571e-05)}, 53: { 
'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc912ab940>, 'cost': 0.04635761589403975, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc916655e0>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc912ab850>, 'model_id': 53, 'rank': 12, 'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.00011205455217546472, beta_1=0.999, beta_2=0.9, early_stopping=True, hidden_layer_sizes=(113, 113), learning_rate_init=0.0010157011622160305, max_iter=32, n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}, 54: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc93e00fa0>, 'cost': 0.013245033112582738, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc384d8e50>, 'ensemble_weight': 0.08, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc93e002e0>, 'model_id': 54, 'rank': 1, 'sklearn_classifier': MLPClassifier(alpha=0.00016472833354638788, beta_1=0.999, beta_2=0.9, early_stopping=True, hidden_layer_sizes=(113, 113), learning_rate_init=0.0007607734350660931, max_iter=32, n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}, 57: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7fdc90327940>, 'cost': 0.06622516556291391, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7fdc909420d0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7fdc904db340>, 'model_id': 57, 
'rank': 21, 'sklearn_classifier': MLPClassifier(alpha=0.0023369498985981963, beta_1=0.999, beta_2=0.9, early_stopping=True, hidden_layer_sizes=(103, 103), learning_rate_init=0.0004684917334431039, max_iter=32, n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)}}
# Get the Score of the final ensemble...
predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9824561403508771
correct = np.equal(predictions, y_test).sum()
totals = predictions.size
score = correct / totals
print("correct:{}, totals:{}, scores:{}".format(correct, totals, score))
correct:112, totals:114, scores:0.9824561403508771
The result shows strong performance: only 2 of the 114 test samples are misclassified, for an accuracy of about 98.2%.
2. Regression test
import numpy as np
import sklearn.metrics
import matplotlib.pyplot as plt
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
import autosklearn.regression
# Data Loading...
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
# X.shape, y.shape: ((442, 10), (442,))
SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
# test_size: If float, the proportion of the dataset to include in the test split
# (0.2 here). If int, the absolute number of test samples. If None, the value is set
# to the complement of the train size; if train_size is also None, it is set to 0.25.
# train_size: If int, the absolute number of train samples. If None, the value is
# automatically set to the complement of the test size.
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((353, 10), (89, 10), (353,), (89,))
# Build and fit a regressor...
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=180,  # train for 3 minutes in total
    per_run_time_limit=30,  # time limit (seconds) for a single machine learning model
    memory_limit=8192,  # memory limit (MB) for the machine learning algorithm
    tmp_folder='./autosklearn_regression_example_tmp',  # folder to store configuration output and log files
)
automl.fit(X_train, y_train, dataset_name='diabetes')
AutoSklearnRegressor(memory_limit=8192, per_run_time_limit=30, time_left_for_this_task=180, tmp_folder='./autosklearn_regression_example_tmp')
# View the models found by auto-sklearn...
print(automl.leaderboard())
          rank  ensemble_weight              type      cost   duration
model_id
59           1             0.52        libsvm_svr  0.496368   0.820438
62           2             0.14    ard_regression  0.503867   0.474854
34           3             0.04     liblinear_svr  0.506597   0.465134
5            4             0.04  gaussian_process  0.571439  11.650054
22           5             0.14        libsvm_svr  0.580072   0.481025
29           6             0.10  gaussian_process  0.596072   0.694429
36           7             0.02     liblinear_svr  0.680804   0.522072
# Print the final ensemble constructed by auto-sklearn...
pprint(automl.show_models(), indent=4)
{ 5: { 'cost': 0.5714392217171937, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f4355a10730>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f4355ab9d90>, 'model_id': 5, 'rank': 4, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f4355ab9f70>, 'sklearn_regressor': GaussianProcessRegressor(alpha=0.283161627129086, kernel=RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), n_restarts_optimizer=10, normalize_y=True, random_state=1)}, 22: { 'cost': 0.5800720723074761, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f436cd95b80>, 'ensemble_weight': 0.14, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f435597b130>, 'model_id': 22, 'rank': 5, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f435597b820>, 'sklearn_regressor': SVR(C=1.4272136443763257, cache_size=5357.580729166667, coef0=0.2694141260648879, degree=2, epsilon=0.10000000000000006, gamma=0.05757315877344016, kernel='poly', shrinking=False, tol=0.0010000000000000002, verbose=0)}, 29: { 'cost': 0.596072394456454, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f428a455640>, 'ensemble_weight': 0.1, 
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f428a40e310>, 'model_id': 29, 'rank': 6, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f428a40e790>, 'sklearn_regressor': GaussianProcessRegressor(alpha=0.22788692419220857, kernel=RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1]), n_restarts_optimizer=10, normalize_y=True, random_state=1)}, 34: { 'cost': 0.5065968734118893, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f428a1b7940>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f43558ff490>, 'model_id': 34, 'rank': 3, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f43558ffd30>, 'sklearn_regressor': LinearSVR(C=25232.12061129609, dual=False, epsilon=0.002019395600869544, loss='squared_epsilon_insensitive', random_state=1, tol=0.009223250275815446)}, 36: { 'cost': 0.6808038917513319, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f43558b9d30>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f43559bc040>, 'model_id': 36, 'rank': 7, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f4355840070>, 'sklearn_regressor': LinearSVR(C=113.58659319519185, dual=False, epsilon=0.953621220533319, loss='squared_epsilon_insensitive', random_state=1, tol=0.006172262678900209)}, 59: { 'cost': 0.496368425910942, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f428a3fe490>, 'ensemble_weight': 0.52, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f428a3dcc10>, 'model_id': 
59, 'rank': 1, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f428a3dcd00>, 'sklearn_regressor': SVR(C=0.8411452049277826, cache_size=5349.908854166667, coef0=0.028890874524519994, epsilon=0.00577061356609876, gamma=0.1, kernel='sigmoid', tol=0.0006935969948540294, verbose=0)}, 62: { 'cost': 0.5038673768126611, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f4289fed160>, 'ensemble_weight': 0.14, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f4355a10550>, 'model_id': 62, 'rank': 2, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f4355a10e20>, 'sklearn_regressor': ARDRegression(alpha_1=0.0009920132163129295, alpha_2=1.7797740837908024e-05, copy_X=False, lambda_1=4.023304088550062e-09, lambda_2=3.759668315507968e-08, threshold_lambda=72842.75949581455, tol=0.0667287949732316)}}
# Get the score of the final ensemble...
train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))
Train R2 score: 0.5538993106240657
Test R2 score: 0.48404488514413047
# Plot the predictions...
plt.scatter(train_predictions, y_train, label="Train samples", c='#d95f02')
plt.scatter(test_predictions, y_test, label="Test samples", c='#7570b3')
plt.xlabel("Predicted value")
plt.ylabel("True value")
plt.legend()
# Points far from the diagonal indicate poor predictions; points close to it indicate good ones
plt.plot([30, 400], [30, 400], c='k', zorder=0)
plt.xlim([30, 400])
plt.ylim([30, 400])
plt.tight_layout()
plt.show()
3. Multi-label Classification Test
import numpy as np
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification
from pprint import pprint
from sklearn.utils.multiclass import type_of_target
from sklearn.model_selection import train_test_split
# Data loading...
# Using the reuters multilabel dataset -- https://www.openml.org/d/40594
X, y = sklearn.datasets.fetch_openml(data_id=40594, return_X_y=True, as_frame=False)
# X.shape, y.shape: ((2000, 243), (2000, 7))

# About the sklearn.datasets.fetch_openml parameters:
# 1. data_id : int, default=None
#    OpenML ID of the dataset. The most specific way of retrieving a
#    dataset. If data_id is not given, name (and potential version) are
#    used to obtain a dataset.
# 2. return_X_y : bool, default=False
#    If True, returns ``(data, target)`` instead of a Bunch object.
# 3. as_frame : bool or 'auto', default='auto'
#    If True, the data is a pandas DataFrame.

# Convert the string labels to integers: 'TRUE' -> 1, 'FALSE' -> 0
y[y == 'TRUE'] = 1
y[y == 'FALSE'] = 0
y = y.astype(int)

# Make sure the target is properly formatted
print(f"type_of_target={type_of_target(y)}")

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
# test_size: If float, the proportion of the dataset to put in the test split
#   (0.2 here). If int, the absolute number of test samples. If None, the
#   value is set to the complement of the train size. If ``train_size`` is
#   also None, it will be set to 0.25.
# train_size: If int, the absolute number of train samples. If None, the
#   value is automatically set to the complement of the test size.
X_train.shape, X_test.shape, y_train.shape, y_test.shape
type_of_target=multilabel-indicator
((1600, 243), (400, 243), (1600, 7), (400, 7))
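The label conversion above can be illustrated without downloading the dataset. The sketch below uses plain Python lists instead of NumPy arrays; the rows in `raw_rows` are made up, and `to_indicator` and `looks_like_indicator_matrix` are hypothetical helpers. The second one is only a simplified stand-in for the idea behind the `multilabel-indicator` target type, not sklearn's actual check.

```python
# Hypothetical raw labels in the same 'TRUE'/'FALSE' string format as the OpenML target
raw_rows = [
    ["TRUE", "FALSE", "FALSE"],
    ["FALSE", "TRUE", "TRUE"],
    ["FALSE", "FALSE", "FALSE"],
]


def to_indicator(rows):
    """Map 'TRUE'/'FALSE' strings to 1/0 integers, row by row."""
    mapping = {"TRUE": 1, "FALSE": 0}
    return [[mapping[value] for value in row] for row in rows]


def looks_like_indicator_matrix(rows):
    """Simplified stand-in check: rectangular, >= 2 columns, only 0/1 entries."""
    n_cols = len(rows[0])
    return (
        n_cols >= 2
        and all(len(row) == n_cols for row in rows)
        and all(value in (0, 1) for row in rows for value in row)
    )


y_indicator = to_indicator(raw_rows)
print(y_indicator)
print(looks_like_indicator_matrix(y_indicator))
```

Each row is one sample and each column one label, so a row of all zeros (a sample with no label at all) is still a valid multilabel target.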
# Building the classifier...
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=180,  # total time budget: 3 minutes
    per_run_time_limit=30,  # time limit for a single machine learning model, in seconds
    initial_configurations_via_metalearning=0,  # start from scratch
    memory_limit=8192,  # memory limit for the machine learning algorithm, in MB
    # smac_scenario_args={'runcount_limit': 1},  # limit the number of model runs (only one model here)
    tmp_folder='./autosklearn_multi_classification_example_tmp',  # folder to store configuration output and log files
)
# The data is the reuters multilabel dataset -- https://www.openml.org/d/40594
# so the dataset name is 'reuters'
automl.fit(X_train, y_train, dataset_name='reuters')
AutoSklearnClassifier(initial_configurations_via_metalearning=0, memory_limit=8192, per_run_time_limit=30, time_left_for_this_task=180, tmp_folder='./autosklearn_multi_classification_example_tmp')
# View the models found by auto-sklearn...
print(automl.leaderboard())
          rank  ensemble_weight                 type      cost  duration
model_id
31           1             0.34  k_nearest_neighbors  0.398509  3.560389
18           2             0.18          gaussian_nb  0.461616  0.523785
11           3             0.18          gaussian_nb  0.488684  0.543580
2            4             0.02        random_forest  0.489481  2.743526
9            5             0.08         bernoulli_nb  0.513874  3.612643
8            6             0.02                  mlp  0.515206  3.365863
23           7             0.04          gaussian_nb  0.540634  0.765986
25           8             0.04         bernoulli_nb  0.547112  0.543565
10           9             0.02       multinomial_nb  0.577200  1.838750
21          10             0.08          gaussian_nb  0.599070  0.575137
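The ensemble_weight column can be read as the coefficients of a weighted average over the member models' predictions. The sketch below assumes that weighted-average scheme; the weights are copied from the leaderboard, while the per-model probabilities and the helper `ensemble_probability` are made up for illustration.

```python
# ensemble_weight values copied from the leaderboard (model_id -> weight)
weights = {31: 0.34, 18: 0.18, 11: 0.18, 2: 0.02, 9: 0.08,
           8: 0.02, 23: 0.04, 25: 0.04, 10: 0.02, 21: 0.08}

# The weights of all ensemble members add up to 1
total_weight = sum(weights.values())
print(f"sum of weights = {total_weight:.2f}")


def ensemble_probability(member_probs, weights):
    """Weighted average of per-model probabilities for one label of one sample."""
    return sum(weights[model_id] * prob for model_id, prob in member_probs.items())


# Hypothetical case: only the top-ranked model (id 31) predicts the label
member_probs = {model_id: 0.0 for model_id in weights}
member_probs[31] = 1.0
p = ensemble_probability(member_probs, weights)
print(f"ensemble probability = {p:.2f}")
```

Under this assumption a single model, even the best one, cannot push the ensemble above a probability of 0.34 on its own; at least some of the naive Bayes members have to agree with it.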
# Print the final ensemble constructed by auto-sklearn...
pprint(automl.show_models(), indent=4)
{ 2: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4b9da60>, 'cost': 0.48948102811225125, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9aeacd01c0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4b9d0a0>, 'model_id': 2, 'rank': 4, 'sklearn_classifier': RandomForestClassifier(max_features=15, n_estimators=512, n_jobs=1, random_state=1, warm_start=True)}, 8: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4a200a0>, 'cost': 0.5152055408242915, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc4a00760>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4a20280>, 'model_id': 8, 'rank': 6, 'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0029548979739433792, beta_1=0.999, beta_2=0.9, hidden_layer_sizes=(31, 31), learning_rate_init=0.00022421940958541154, max_iter=256, n_iter_no_change=32, random_state=1, validation_fraction=0.0, verbose=0, warm_start=True)}, 9: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4a60f10>, 'cost': 0.5138737715667154, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc4ba8400>, 'ensemble_weight': 0.08, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4a60340>, 'model_id': 9, 'rank': 5, 'sklearn_classifier': OneVsRestClassifier(estimator=BernoulliNB(alpha=0.3379748507977488), n_jobs=1)}, 10: { 'balancing': 
Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4c3f160>, 'cost': 0.5772002498640842, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc4a95ee0>, 'ensemble_weight': 0.02, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4a76400>, 'model_id': 10, 'rank': 9, 'sklearn_classifier': OneVsRestClassifier(estimator=MultinomialNB(alpha=4.603485200325942), n_jobs=1)}, 11: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4a00c40>, 'cost': 0.48868433214909013, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc4b04580>, 'ensemble_weight': 0.18, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4a00580>, 'model_id': 11, 'rank': 3, 'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)}, 18: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9aeb297ca0>, 'cost': 0.46161569155493276, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9ae8df3310>, 'ensemble_weight': 0.18, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9aeb2972e0>, 'model_id': 18, 'rank': 2, 'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)}, 21: { 'balancing': Balancing(random_state=1, strategy='weighting'), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc440aeb0>, 'cost': 0.5990703229537311, 'data_preprocessor': 
<autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc49eb190>, 'ensemble_weight': 0.08, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4b785e0>, 'model_id': 21, 'rank': 10, 'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)}, 23: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4b3f4f0>, 'cost': 0.540634450557514, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc4b82a90>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4b3f250>, 'model_id': 23, 'rank': 7, 'sklearn_classifier': OneVsRestClassifier(estimator=GaussianNB(), n_jobs=1)}, 25: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9bc4bf13d0>, 'cost': 0.5471123873198415, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9bc4a60e50>, 'ensemble_weight': 0.04, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9bc4bf19a0>, 'model_id': 25, 'rank': 8, 'sklearn_classifier': OneVsRestClassifier(estimator=BernoulliNB(alpha=0.1588461793645986), n_jobs=1)}, 31: { 'balancing': Balancing(random_state=1), 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f9ae8df3820>, 'cost': 0.3985087938580333, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f9ae8df36d0>, 'ensemble_weight': 0.34, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f9ae8df38b0>, 'model_id': 31, 
'rank': 1, 'sklearn_classifier': OneVsRestClassifier(estimator=KNeighborsClassifier(n_neighbors=1, p=1), n_jobs=1)}}
# Print statistics about the auto-sklearn run, such as the number of
# iterations and the number of models that failed with a timeout.
print(automl.sprint_statistics())
auto-sklearn results:
  Dataset name: reuters
  Metric: f1_macro
  Best validation score: 0.601491
  Number of target algorithm runs: 32
  Number of successful target algorithm runs: 24
  Number of crashed target algorithm runs: 3
  Number of target algorithms that exceeded the time limit: 3
  Number of target algorithms that exceeded the memory limit: 2
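The numbers in this summary are consistent with the leaderboard. The sketch below only re-checks the arithmetic on values copied from the two outputs; reading the cost as 1 minus the f1_macro validation score is an assumption that these numbers support, not something stated in the output itself.

```python
# Values copied from the leaderboard and from sprint_statistics()
best_cost = 0.398509              # cost of the top-ranked model (id 31)
best_validation_score = 0.601491  # "Best validation score"

# Assumption: cost = 1 - score for a metric whose optimum is 1 (f1_macro here)
score_from_cost = 1.0 - best_cost
print(round(score_from_cost, 6))

# The run counters add up to the total number of target algorithm runs
total_runs = 32
successful, crashed, timed_out, out_of_memory = 24, 3, 3, 2
accounted_runs = successful + crashed + timed_out + out_of_memory
print(accounted_runs == total_runs)
```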
# Get the score of the final ensemble...
# Note: for multilabel targets, accuracy_score is the subset (exact-match) accuracy
predictions = automl.predict(X_test)
print("Accuracy score", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score 0.6025
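The accuracy printed here and the f1_macro metric used during the search measure different things: on multilabel targets, accuracy_score counts a sample as correct only if all of its labels match, while macro F1 averages a per-label F1 score. The pure-Python sketch below re-implements both on made-up toy labels (the data and the helper names are hypothetical and not taken from the reuters run).

```python
# Toy multilabel data: 4 samples, 3 labels (made up for illustration)
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label row is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, j):
    """Binary F1 score of label column j."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    return sum(f1_for_label(y_true, y_pred, j) for j in range(n_labels)) / n_labels


acc = subset_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"subset accuracy = {acc:.4f}")
print(f"macro F1        = {f1:.4f}")
```

Two of the four toy samples have a single wrong label each, which halves the subset accuracy but costs much less in macro F1. The two numbers reported for a run should therefore not be compared directly.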
4. Multi-output Regression Test
import numpy as np
import sklearn.metrics
import matplotlib.pyplot as plt
from pprint import pprint
from sklearn import datasets
from sklearn.model_selection import train_test_split
import autosklearn.regression
# Data loading...
# Generate a synthetic multi-output regression dataset
# (no random_state is passed, so the data differs from run to run)
X, y = sklearn.datasets.make_regression(
    n_samples=1000, n_features=10, n_informative=5, n_targets=3
)
# n_samples : The number of samples.
# n_features : The number of features.
# n_informative : The number of informative features.
# n_targets : The number of regression targets.
# For the remaining parameters see: help(sklearn.datasets.make_regression)
# X.shape, y.shape: ((1000, 10), (1000, 3))

SEED = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
# test_size: If float, the proportion of the dataset to put in the test split
#   (0.2 here). If int, the absolute number of test samples. If None, the
#   value is set to the complement of the train size. If ``train_size`` is
#   also None, it will be set to 0.25.
# train_size: If int, the absolute number of train samples. If None, the
#   value is automatically set to the complement of the test size.
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((800, 10), (200, 10), (800, 3), (200, 3))
# Build and fit a regressor...
automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=180,  # total time budget: 3 minutes
    per_run_time_limit=30,  # time limit for a single machine learning model, in seconds
    memory_limit=8192,  # memory limit for the machine learning algorithm, in MB
    tmp_folder='./autosklearn_multi_regression_example_tmp',  # folder to store configuration output and log files
)
# Note: the generated dataset is registered under the dataset_name 'synthetic'
automl.fit(X_train, y_train, dataset_name='synthetic')
[WARNING] [2022-03-05 21:31:30,473:Client-AutoMLSMBO(1)::synthetic] Could not find meta-data directory /home/fs/anaconda3/envs/automl/lib/python3.9/site-packages/autosklearn/metalearning/files/r2_multioutput.regression_dense
AutoSklearnRegressor(memory_limit=8192, per_run_time_limit=30, time_left_for_this_task=180, tmp_folder='./autosklearn_regression_example_tmp')
# View the models found by auto-sklearn...
print(automl.leaderboard())
          rank  ensemble_weight              type          cost  duration
model_id
22           1              1.0  gaussian_process  1.211008e-09  4.055037
# Print the final ensemble constructed by auto-sklearn...
pprint(automl.show_models(), indent=4)
{ 22: { 'cost': 1.2110078495553012e-09, 'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f62ad156d00>, 'ensemble_weight': 1.0, 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f62ad266250>, 'model_id': 22, 'rank': 1, 'regressor': <autosklearn.pipeline.components.regression.RegressorChoice object at 0x7f62ad1543d0>, 'sklearn_regressor': GaussianProcessRegressor(alpha=1.4980082486136626e-11, kernel=RBF(length_scale=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), n_restarts_optimizer=10, normalize_y=True, random_state=1)}}
# Get the score of the final ensemble...
train_predictions = automl.predict(X_train)
print("Train R2 score:", sklearn.metrics.r2_score(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test R2 score:", sklearn.metrics.r2_score(y_test, test_predictions))
Train R2 score: 0.9999999996059499
Test R2 score: 0.9999999995397361
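With three regression targets, r2_score returns a single number because its default setting (multioutput='uniform_average') computes R2 for each target column separately and then takes the unweighted mean. The pure-Python sketch below re-implements that averaging on made-up values; `r2_single` and `r2_uniform_average` are hypothetical helper names.

```python
def r2_single(y_true, y_pred):
    """Coefficient of determination for one target: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def r2_uniform_average(Y_true, Y_pred):
    """Per-target R2 scores and their unweighted mean."""
    n_targets = len(Y_true[0])
    scores = []
    for j in range(n_targets):
        col_true = [row[j] for row in Y_true]
        col_pred = [row[j] for row in Y_pred]
        scores.append(r2_single(col_true, col_pred))
    return scores, sum(scores) / n_targets


# Toy data: target 0 is predicted perfectly, target 1 is always predicted as its mean
Y_true = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Y_pred = [[1.0, 20.0], [2.0, 20.0], [3.0, 20.0]]

per_target, averaged = r2_uniform_average(Y_true, Y_pred)
print("per-target R2:", per_target)
print("uniform average:", averaged)
```

A single averaged value can hide one badly predicted target, so it is worth looking at the per-target scores as well; in sklearn this is what multioutput='raw_values' returns.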
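This result is suspiciously good: the ensemble fits the data almost perfectly (R2 ≈ 1.0 on both the training set and the test set). The most likely explanation is that the randomly generated dataset is very different from a real one and lacks the distribution of real data. make_regression uses noise=0.0 by default, so every target is an exact linear function of the informative features, and a flexible regressor can recover such a relationship almost exactly.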
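The effect of noise-free targets can be reproduced on a much smaller scale. make_regression adds no noise unless its noise parameter is set, so the targets are exact linear functions of the features. The sketch below is a one-feature stand-in written in pure Python (it does not call make_regression or auto-sklearn; the slope 3.0, the noise level 2.0 and the helper names are arbitrary assumptions). It fits a least-squares line once to noise-free targets and once to noisy ones.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(42)
xs = [rng.uniform(-3.0, 3.0) for _ in range(200)]
y_clean = [3.0 * x for x in xs]                        # noise-free linear target
y_noisy = [y + rng.gauss(0.0, 2.0) for y in y_clean]   # same target plus Gaussian noise

results = {}
for name, ys in (("clean", y_clean), ("noisy", y_noisy)):
    a, b = fit_line(xs, ys)
    predictions = [a * x + b for x in xs]
    results[name] = r2(ys, predictions)
    print(f"{name}: R2 = {results[name]:.6f}")
```

The noise-free fit reaches an R2 of essentially 1, just like the ensemble above, while the noisy targets cap the achievable score. Passing a non-zero noise value to make_regression should therefore give a more realistic benchmark.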
# Get the configuration space...
print(automl.get_configuration_space(X_train, y_train))
Configuration space object: Hyperparameters: data_preprocessor:__choice__, Type: Categorical, Choices: {feature_type}, Default: feature_type data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__, Type: Categorical, Choices: {encoding, no_encoding, one_hot_encoding}, Default: one_hot_encoding data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__, Type: Categorical, Choices: {minority_coalescer, no_coalescense}, Default: minority_coalescer data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction, Type: UniformFloat, Range: [0.0001, 0.5], Default: 0.01, on log-scale data_preprocessor:feature_type:numerical_transformer:imputation:strategy, Type: Categorical, Choices: {mean, median, most_frequent}, Default: mean data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__, Type: Categorical, Choices: {minmax, none, normalize, power_transformer, quantile_transformer, robust_scaler, standardize}, Default: standardize data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles, Type: UniformInteger, Range: [10, 2000], Default: 1000 data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution, Type: Categorical, Choices: {uniform, normal}, Default: uniform data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_max, Type: UniformFloat, Range: [0.7, 0.999], Default: 0.75 data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_min, Type: UniformFloat, Range: [0.001, 0.3], Default: 0.25 feature_preprocessor:__choice__, Type: Categorical, Choices: {extra_trees_preproc_for_regression, fast_ica, feature_agglomeration, kernel_pca, kitchen_sinks, no_preprocessing, nystroem_sampler, pca, polynomial, random_trees_embedding}, Default: no_preprocessing feature_preprocessor:extra_trees_preproc_for_regression:bootstrap, Type: 
Categorical, Choices: {True, False}, Default: False feature_preprocessor:extra_trees_preproc_for_regression:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse feature_preprocessor:extra_trees_preproc_for_regression:max_depth, Type: Constant, Value: None feature_preprocessor:extra_trees_preproc_for_regression:max_features, Type: UniformFloat, Range: [0.1, 1.0], Default: 1.0 feature_preprocessor:extra_trees_preproc_for_regression:max_leaf_nodes, Type: Constant, Value: None feature_preprocessor:extra_trees_preproc_for_regression:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1 feature_preprocessor:extra_trees_preproc_for_regression:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2 feature_preprocessor:extra_trees_preproc_for_regression:min_weight_fraction_leaf, Type: Constant, Value: 0.0 feature_preprocessor:extra_trees_preproc_for_regression:n_estimators, Type: Constant, Value: 100 feature_preprocessor:fast_ica:algorithm, Type: Categorical, Choices: {parallel, deflation}, Default: parallel feature_preprocessor:fast_ica:fun, Type: Categorical, Choices: {logcosh, exp, cube}, Default: logcosh feature_preprocessor:fast_ica:n_components, Type: UniformInteger, Range: [10, 2000], Default: 100 feature_preprocessor:fast_ica:whiten, Type: Categorical, Choices: {False, True}, Default: False feature_preprocessor:feature_agglomeration:affinity, Type: Categorical, Choices: {euclidean, manhattan, cosine}, Default: euclidean feature_preprocessor:feature_agglomeration:linkage, Type: Categorical, Choices: {ward, complete, average}, Default: ward feature_preprocessor:feature_agglomeration:n_clusters, Type: UniformInteger, Range: [2, 400], Default: 25 feature_preprocessor:feature_agglomeration:pooling_func, Type: Categorical, Choices: {mean, median, max}, Default: mean feature_preprocessor:kernel_pca:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0 feature_preprocessor:kernel_pca:degree, Type: UniformInteger, 
Range: [2, 5], Default: 3 feature_preprocessor:kernel_pca:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 0.01, on log-scale feature_preprocessor:kernel_pca:kernel, Type: Categorical, Choices: {poly, rbf, sigmoid, cosine}, Default: rbf feature_preprocessor:kernel_pca:n_components, Type: UniformInteger, Range: [10, 2000], Default: 100 feature_preprocessor:kitchen_sinks:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 1.0, on log-scale feature_preprocessor:kitchen_sinks:n_components, Type: UniformInteger, Range: [50, 10000], Default: 100, on log-scale feature_preprocessor:nystroem_sampler:coef0, Type: UniformFloat, Range: [-1.0, 1.0], Default: 0.0 feature_preprocessor:nystroem_sampler:degree, Type: UniformInteger, Range: [2, 5], Default: 3 feature_preprocessor:nystroem_sampler:gamma, Type: UniformFloat, Range: [3.0517578125e-05, 8.0], Default: 0.1, on log-scale feature_preprocessor:nystroem_sampler:kernel, Type: Categorical, Choices: {poly, rbf, sigmoid, cosine}, Default: rbf feature_preprocessor:nystroem_sampler:n_components, Type: UniformInteger, Range: [50, 10000], Default: 100, on log-scale feature_preprocessor:pca:keep_variance, Type: UniformFloat, Range: [0.5, 0.9999], Default: 0.9999 feature_preprocessor:pca:whiten, Type: Categorical, Choices: {False, True}, Default: False feature_preprocessor:polynomial:degree, Type: UniformInteger, Range: [2, 3], Default: 2 feature_preprocessor:polynomial:include_bias, Type: Categorical, Choices: {True, False}, Default: True feature_preprocessor:polynomial:interaction_only, Type: Categorical, Choices: {False, True}, Default: False feature_preprocessor:random_trees_embedding:bootstrap, Type: Categorical, Choices: {True, False}, Default: True feature_preprocessor:random_trees_embedding:max_depth, Type: UniformInteger, Range: [2, 10], Default: 5 feature_preprocessor:random_trees_embedding:max_leaf_nodes, Type: Constant, Value: None 
feature_preprocessor:random_trees_embedding:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1 feature_preprocessor:random_trees_embedding:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2 feature_preprocessor:random_trees_embedding:min_weight_fraction_leaf, Type: Constant, Value: 1.0 feature_preprocessor:random_trees_embedding:n_estimators, Type: UniformInteger, Range: [10, 100], Default: 10 regressor:__choice__, Type: Categorical, Choices: {decision_tree, extra_trees, gaussian_process, k_nearest_neighbors, random_forest}, Default: random_forest regressor:decision_tree:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse regressor:decision_tree:max_depth_factor, Type: UniformFloat, Range: [0.0, 2.0], Default: 0.5 regressor:decision_tree:max_features, Type: Constant, Value: 1.0 regressor:decision_tree:max_leaf_nodes, Type: Constant, Value: None regressor:decision_tree:min_impurity_decrease, Type: Constant, Value: 0.0 regressor:decision_tree:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1 regressor:decision_tree:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2 regressor:decision_tree:min_weight_fraction_leaf, Type: Constant, Value: 0.0 regressor:extra_trees:bootstrap, Type: Categorical, Choices: {True, False}, Default: False regressor:extra_trees:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse regressor:extra_trees:max_depth, Type: Constant, Value: None regressor:extra_trees:max_features, Type: UniformFloat, Range: [0.1, 1.0], Default: 1.0 regressor:extra_trees:max_leaf_nodes, Type: Constant, Value: None regressor:extra_trees:min_impurity_decrease, Type: Constant, Value: 0.0 regressor:extra_trees:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1 regressor:extra_trees:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2 regressor:extra_trees:min_weight_fraction_leaf, Type: Constant, Value: 0.0 
regressor:gaussian_process:alpha, Type: UniformFloat, Range: [1e-14, 1.0], Default: 1e-08, on log-scale regressor:gaussian_process:thetaL, Type: UniformFloat, Range: [1e-10, 0.001], Default: 1e-06, on log-scale regressor:gaussian_process:thetaU, Type: UniformFloat, Range: [1.0, 100000.0], Default: 100000.0, on log-scale regressor:k_nearest_neighbors:n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 1, on log-scale regressor:k_nearest_neighbors:p, Type: Categorical, Choices: {1, 2}, Default: 2 regressor:k_nearest_neighbors:weights, Type: Categorical, Choices: {uniform, distance}, Default: uniform regressor:random_forest:bootstrap, Type: Categorical, Choices: {True, False}, Default: True regressor:random_forest:criterion, Type: Categorical, Choices: {mse, friedman_mse, mae}, Default: mse regressor:random_forest:max_depth, Type: Constant, Value: None regressor:random_forest:max_features, Type: UniformFloat, Range: [0.1, 1.0], Default: 1.0 regressor:random_forest:max_leaf_nodes, Type: Constant, Value: None regressor:random_forest:min_impurity_decrease, Type: Constant, Value: 0.0 regressor:random_forest:min_samples_leaf, Type: UniformInteger, Range: [1, 20], Default: 1 regressor:random_forest:min_samples_split, Type: UniformInteger, Range: [2, 20], Default: 2 regressor:random_forest:min_weight_fraction_leaf, Type: Constant, Value: 0.0 Conditions: data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__ | data_preprocessor:__choice__ == 'feature_type' data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__ | data_preprocessor:__choice__ == 'feature_type' data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction | data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__ == 'minority_coalescer' data_preprocessor:feature_type:numerical_transformer:imputation:strategy | data_preprocessor:__choice__ == 'feature_type' 
data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ | data_preprocessor:__choice__ == 'feature_type' data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:n_quantiles | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'quantile_transformer' data_preprocessor:feature_type:numerical_transformer:rescaling:quantile_transformer:output_distribution | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'quantile_transformer' data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_max | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'robust_scaler' data_preprocessor:feature_type:numerical_transformer:rescaling:robust_scaler:q_min | data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__ == 'robust_scaler' feature_preprocessor:extra_trees_preproc_for_regression:bootstrap | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:criterion | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:max_depth | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:max_features | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:max_leaf_nodes | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:min_samples_leaf | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:min_samples_split | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:min_weight_fraction_leaf | 
feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:extra_trees_preproc_for_regression:n_estimators | feature_preprocessor:__choice__ == 'extra_trees_preproc_for_regression' feature_preprocessor:fast_ica:algorithm | feature_preprocessor:__choice__ == 'fast_ica' feature_preprocessor:fast_ica:fun | feature_preprocessor:__choice__ == 'fast_ica' feature_preprocessor:fast_ica:n_components | feature_preprocessor:fast_ica:whiten == 'True' feature_preprocessor:fast_ica:whiten | feature_preprocessor:__choice__ == 'fast_ica' feature_preprocessor:feature_agglomeration:affinity | feature_preprocessor:__choice__ == 'feature_agglomeration' feature_preprocessor:feature_agglomeration:linkage | feature_preprocessor:__choice__ == 'feature_agglomeration' feature_preprocessor:feature_agglomeration:n_clusters | feature_preprocessor:__choice__ == 'feature_agglomeration' feature_preprocessor:feature_agglomeration:pooling_func | feature_preprocessor:__choice__ == 'feature_agglomeration' feature_preprocessor:kernel_pca:coef0 | feature_preprocessor:kernel_pca:kernel in {'poly', 'sigmoid'} feature_preprocessor:kernel_pca:degree | feature_preprocessor:kernel_pca:kernel == 'poly' feature_preprocessor:kernel_pca:gamma | feature_preprocessor:kernel_pca:kernel in {'poly', 'rbf'} feature_preprocessor:kernel_pca:kernel | feature_preprocessor:__choice__ == 'kernel_pca' feature_preprocessor:kernel_pca:n_components | feature_preprocessor:__choice__ == 'kernel_pca' feature_preprocessor:kitchen_sinks:gamma | feature_preprocessor:__choice__ == 'kitchen_sinks' feature_preprocessor:kitchen_sinks:n_components | feature_preprocessor:__choice__ == 'kitchen_sinks' feature_preprocessor:nystroem_sampler:coef0 | feature_preprocessor:nystroem_sampler:kernel in {'poly', 'sigmoid'} feature_preprocessor:nystroem_sampler:degree | feature_preprocessor:nystroem_sampler:kernel == 'poly' feature_preprocessor:nystroem_sampler:gamma | feature_preprocessor:nystroem_sampler:kernel in 
{'poly', 'rbf', 'sigmoid'}
feature_preprocessor:nystroem_sampler:kernel | feature_preprocessor:__choice__ == 'nystroem_sampler'
feature_preprocessor:nystroem_sampler:n_components | feature_preprocessor:__choice__ == 'nystroem_sampler'
feature_preprocessor:pca:keep_variance | feature_preprocessor:__choice__ == 'pca'
feature_preprocessor:pca:whiten | feature_preprocessor:__choice__ == 'pca'
feature_preprocessor:polynomial:degree | feature_preprocessor:__choice__ == 'polynomial'
feature_preprocessor:polynomial:include_bias | feature_preprocessor:__choice__ == 'polynomial'
feature_preprocessor:polynomial:interaction_only | feature_preprocessor:__choice__ == 'polynomial'
feature_preprocessor:random_trees_embedding:bootstrap | feature_preprocessor:__choice__ == 'random_trees_embedding'
feature_preprocessor:random_trees_embedding:max_depth | feature_preprocessor:__choice__ == 'random_trees_embedding'
feature_preprocessor:random_trees_embedding:max_leaf_nodes | feature_preprocessor:__choice__ == 'random_trees_embedding'
feature_preprocessor:random_trees_embedding:min_samples_leaf | feature_preprocessor:__choice__ == 'random_trees_embedding'
feature_preprocessor:random_trees_embedding:min_samples_split | feature_preprocessor:__choice__ == 'random_trees_embedding'
feature_preprocessor:random_trees_embedding:min_weight_fraction_leaf | feature_preprocessor:__choice__ == 'random_trees_embedding'
feature_preprocessor:random_trees_embedding:n_estimators | feature_preprocessor:__choice__ == 'random_trees_embedding'
regressor:decision_tree:criterion | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:max_depth_factor | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:max_features | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:max_leaf_nodes | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:min_impurity_decrease | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:min_samples_leaf | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:min_samples_split | regressor:__choice__ == 'decision_tree'
regressor:decision_tree:min_weight_fraction_leaf | regressor:__choice__ == 'decision_tree'
regressor:extra_trees:bootstrap | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:criterion | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:max_depth | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:max_features | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:max_leaf_nodes | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:min_impurity_decrease | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:min_samples_leaf | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:min_samples_split | regressor:__choice__ == 'extra_trees'
regressor:extra_trees:min_weight_fraction_leaf | regressor:__choice__ == 'extra_trees'
regressor:gaussian_process:alpha | regressor:__choice__ == 'gaussian_process'
regressor:gaussian_process:thetaL | regressor:__choice__ == 'gaussian_process'
regressor:gaussian_process:thetaU | regressor:__choice__ == 'gaussian_process'
regressor:k_nearest_neighbors:n_neighbors | regressor:__choice__ == 'k_nearest_neighbors'
regressor:k_nearest_neighbors:p | regressor:__choice__ == 'k_nearest_neighbors'
regressor:k_nearest_neighbors:weights | regressor:__choice__ == 'k_nearest_neighbors'
regressor:random_forest:bootstrap | regressor:__choice__ == 'random_forest'
regressor:random_forest:criterion | regressor:__choice__ == 'random_forest'
regressor:random_forest:max_depth | regressor:__choice__ == 'random_forest'
regressor:random_forest:max_features | regressor:__choice__ == 'random_forest'
regressor:random_forest:max_leaf_nodes | regressor:__choice__ == 'random_forest'
regressor:random_forest:min_impurity_decrease | regressor:__choice__ == 'random_forest'
regressor:random_forest:min_samples_leaf | regressor:__choice__ == 'random_forest'
regressor:random_forest:min_samples_split | regressor:__choice__ == 'random_forest'
regressor:random_forest:min_weight_fraction_leaf | regressor:__choice__ == 'random_forest'
Forbidden Clauses:
(Forbidden: feature_preprocessor:feature_agglomeration:affinity in {'cosine', 'manhattan'} && Forbidden: feature_preprocessor:feature_agglomeration:linkage == 'ward')
(Forbidden: feature_preprocessor:__choice__ == 'random_trees_embedding' && Forbidden: regressor:__choice__ == 'gaussian_process')
(Forbidden: regressor:__choice__ == 'decision_tree' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
(Forbidden: regressor:__choice__ == 'decision_tree' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
(Forbidden: regressor:__choice__ == 'decision_tree' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
(Forbidden: regressor:__choice__ == 'extra_trees' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
(Forbidden: regressor:__choice__ == 'extra_trees' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
(Forbidden: regressor:__choice__ == 'extra_trees' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
(Forbidden: regressor:__choice__ == 'gaussian_process' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
(Forbidden: regressor:__choice__ == 'gaussian_process' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
(Forbidden: regressor:__choice__ == 'gaussian_process' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
(Forbidden: regressor:__choice__ == 'k_nearest_neighbors' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
(Forbidden: regressor:__choice__ == 'k_nearest_neighbors' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
(Forbidden: regressor:__choice__ == 'k_nearest_neighbors' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
(Forbidden: regressor:__choice__ == 'random_forest' && Forbidden: feature_preprocessor:__choice__ == 'kitchen_sinks')
(Forbidden: regressor:__choice__ == 'random_forest' && Forbidden: feature_preprocessor:__choice__ == 'kernel_pca')
(Forbidden: regressor:__choice__ == 'random_forest' && Forbidden: feature_preprocessor:__choice__ == 'nystroem_sampler')
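The forbidden clauses in this printout exclude particular pairings of regressor and feature preprocessor from the search. The sketch below is only a plain-Python reading of those pairwise clauses, not the ConfigSpace API; the names `FORBIDDEN_PAIRS` and `is_allowed` are hypothetical, and the hyperparameter-level clause on `feature_agglomeration` (affinity versus `ward` linkage) is left out.

```python
# Regressors and kernel-approximation preprocessors that appear
# in the forbidden clauses of the printout.
REGRESSORS = {
    "decision_tree",
    "extra_trees",
    "gaussian_process",
    "k_nearest_neighbors",
    "random_forest",
}
KERNEL_APPROXIMATIONS = {"kitchen_sinks", "kernel_pca", "nystroem_sampler"}

# Every listed regressor is forbidden with every kernel approximation ...
FORBIDDEN_PAIRS = {(r, p) for r in REGRESSORS for p in KERNEL_APPROXIMATIONS}
# ... and gaussian_process is additionally forbidden with random_trees_embedding.
FORBIDDEN_PAIRS.add(("gaussian_process", "random_trees_embedding"))


def is_allowed(regressor, preprocessor):
    """Return True if this (regressor, preprocessor) pair is not forbidden."""
    return (regressor, preprocessor) not in FORBIDDEN_PAIRS


print(is_allowed("random_forest", "kernel_pca"))  # forbidden pair
print(is_allowed("random_forest", "pca"))         # permitted pair
```

Read this way, the clauses keep the tree-based, nearest-neighbour, and Gaussian-process regressors away from the kernel-approximation preprocessors, while leaving combinations such as `random_forest` with `pca` available.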
import matplotlib.pyplot as plt

# Plot the predictions against the true values
plt.scatter(train_predictions, y_train, label="Train samples", c='#d95f02')
plt.scatter(test_predictions, y_test, label="Test samples", c='#7570b3')
plt.xlabel("Predicted value")
plt.ylabel("True value")
plt.legend()
# Reference diagonal (predicted == true): points far from this line are
# predicted poorly, points close to it are predicted well
plt.plot([-500, 500], [-500, 500], c='k', zorder=0)
# plt.xlim([30, 400])
# plt.ylim([30, 400])
plt.tight_layout()
plt.show()
The plot shows the final fit: the fitting accuracy is close to 100%, so nearly all points lie on the diagonal line.
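To attach a number to the visual impression, one common choice for regression is the coefficient of determination, R². The sketch below computes it for a single output in plain Python; the function name `r2_score_1d` and the sample values are made up for illustration and are not results from the run above. In practice, `sklearn.metrics.r2_score(y_true, y_pred)` computes the same quantity, and for multi-output targets it averages the per-output scores by default.

```python
def r2_score_1d(y_true, y_pred):
    """Coefficient of determination R^2 for one output column."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the true values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (hypothetical, not from the experiment).
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [11.0, 19.0, 31.0, 39.0]

score = r2_score_1d(y_true, y_pred)
print(f"R2 = {score:.3f}")  # prints "R2 = 0.992"
```

A score near 1 corresponds to points lying close to the diagonal in the scatter plot; comparing the train and test scores also helps reveal overfitting that the plot alone might hide.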
References:
More auto-sklearn usage examples: https://automl.github.io/auto-sklearn/master/examples/index.html
auto-sklearn API documentation: https://automl.github.io/auto-sklearn/master/api.html
More on multiclass and multioutput algorithms in scikit-learn: https://scikit-learn.org/stable/modules/multiclass.html