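I have a dataset with a large number of features. I ran several feature-selection filters and stored the names of the selected features in 4 arrays. Now I want to drop the features that were not selected.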
import pandas as pd
df = pd.read_excel("Anonymizeddataset.xlsx")
df = df.fillna(0)
# 4 arrays
features_selected_with_nan_value
KBest_select_feature
features_selected_with_mean_value
laso_selected_features
def drop_features(features):
    for index, row in df.iterrows():
        for i in range(len(features)):
            if row != features[i]:
                df_with_selected_features = df.drop([row], axis = 1, inplace = True)
    return df_with_selected_features
But it throws this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
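A minimal sketch of where the error comes from, using a hypothetical two-column toy DataFrame in place of the real file: `df.iterrows()` yields each row as a Series of cell values (indexed by column name), not a column name, so `row != features[i]` is an element-wise comparison that produces a boolean Series, and `if` cannot turn a whole Series into a single True/False.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset
df = pd.DataFrame({"Target": [5704.7, 3200.0], "Predictor 1": [98.013498, 51.224883]})

# iterrows() yields (index, row) pairs; each row is a Series of cell values
# indexed by column name, not a column name itself
index, row = next(df.iterrows())

# Comparing a Series with a string is element-wise and gives a boolean Series
mask = row != "Predictor 1"
print(type(mask).__name__)  # Series

# `if` needs one True/False, so evaluating a whole Series raises ValueError
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Two further problems would surface even without the error: the loop walks over rows although the goal is to remove columns, and `df.drop(..., inplace=True)` returns `None`, so `df_with_selected_features` would be `None`.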
The dataset:
   Target  Predictor 1  Predictor 2  Predictor 3  Predictor 4  Predictor 5  Predictor 6  Predictor 7  Predictor 8  Predictor 9  ...  Predictor 1065  Predictor 1066  Predictor 1067  Predictor 1068  Predictor 1069  Predictor 1070  Predictor 1071  Predictor 1072  Predictor 1073  Predictor 1074
0  5704.7    98.013498    98.380881    66.012913    21.447560          0.0          0.0          0.0    57.549196           12  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
1  3200.0    51.224883    98.380881    70.885204    21.447560          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
2  6487.9    44.563802    98.380881    85.757141    21.447560          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
3  1278.3    65.039616    98.380881    18.380713    87.745614          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
4  1368.5     1.905928    98.380881    96.797313    87.745614          0.0          0.0          0.0    57.549196           13  ...             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0             0.0
5 rows × 1075 columns
The features_selected_with_nan_value array:
['Predictor 387', 'Predictor 381', 'Predictor 383', 'Predictor 376', 'Predictor 28', 'Predictor 35', 'Predictor 4', 'Predictor 37', 'Predictor 34', 'Predictor 19', 'Predictor 16', 'Predictor 17', 'Predictor 25', 'Predictor 880', 'Predictor 856', 'Predictor 849', 'Predictor 851', 'Predictor 852', 'Predictor 857', 'Predictor 853', 'Predictor 855', 'Predictor 850', 'Predictor 854', 'Predictor 40', 'Predictor 881', 'Predictor 882', 'Predictor 883', 'Predictor 884', 'Predictor 1015', 'Predictor 487', 'Predictor 738', 'Predictor 476', 'Predictor 473', 'Predictor 749', 'Predictor 604', 'Predictor 607', 'Predictor 618', 'Predictor 848', 'Predictor 1014', 'Predictor 1007', 'Predictor 1012', 'Predictor 979', 'Predictor 344', 'Predictor 345', 'Predictor 356', 'Predictor 392', 'Predictor 858', 'Predictor 859', 'Predictor 860', 'Predictor 861', 'Predictor 879', 'Predictor 862', 'Predictor 863', 'Predictor 980', 'Predictor 864', 'Predictor 878', 'Predictor 865', 'Predictor 877', 'Predictor 866', 'Predictor 867', 'Predictor 869', 'Predictor 870', 'Predictor 871', 'Predictor 872', 'Predictor 873', 'Predictor 874', 'Predictor 876', 'Predictor 735', 'Predictor 981', 'Predictor 982', 'Predictor 983', 'Predictor 1011', 'Predictor 1010', 'Predictor 1009', 'Predictor 1008', 'Predictor 875', 'Predictor 1006', 'Predictor 1005', 'Predictor 1004', 'Predictor 1003', 'Predictor 1002', 'Predictor 1001', 'Predictor 1000', 'Predictor 342', 'Predictor 998', 'Predictor 997', 'Predictor 996', 'Predictor 995', 'Predictor 994', 'Predictor 992', 'Predictor 991', 'Predictor 990', 'Predictor 989', 'Predictor 988', 'Predictor 987', 'Predictor 986', 'Predictor 985', 'Predictor 984', 'Predictor 1013', 'Predictor 993']
What am I doing wrong?
Question source: StackOverflow, URL: /questions/59380060/drop-set-of-features-from-df-throw-an-error-of-the-truth-value-of-a-series-is-am
If I understand your question correctly, you don't need a loop at all; just select the columns you want to keep:
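import pandas as pd
df = pd.read_excel("Anonymizeddataset.xlsx")
df = df.fillna(0)
# Concatenate the four selections (the parentheses allow the line break)
list_columns = (list(features_selected_with_nan_value) + list(KBest_select_feature) +
                list(features_selected_with_mean_value) + list(laso_selected_features))
# Remove duplicate names, keeping first-seen order, so no column is selected twice
list_columns = list(dict.fromkeys(list_columns))
df = df[list_columns]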
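If you would rather follow the original idea of dropping the unselected columns, and keep the `Target` column (which `df[list_columns]` removes unless one of the arrays contains it), a sketch along these lines may help. `keep_selected_features` is a hypothetical helper, and `target="Target"` is an assumption based on the sample data; the toy DataFrame reuses a few values from the sample rows.

```python
import pandas as pd


def keep_selected_features(df, *selections, target="Target"):
    """Return a copy of df with only `target` and the selected feature columns.

    Hypothetical helper; target="Target" is assumed from the sample data.
    """
    # Union of every selection; a set makes the membership test below cheap
    selected = {name for sel in selections for name in sel}
    # Everything that is neither the target nor selected gets dropped
    to_drop = [c for c in df.columns if c != target and c not in selected]
    # drop() without inplace returns a new DataFrame and leaves df untouched
    return df.drop(columns=to_drop)


# Toy usage on a small frame shaped like the sample data
df = pd.DataFrame({
    "Target": [5704.7, 3200.0],
    "Predictor 1": [98.013498, 51.224883],
    "Predictor 2": [98.380881, 98.380881],
    "Predictor 3": [66.012913, 70.885204],
})
reduced = keep_selected_features(df, ["Predictor 3"], ["Predictor 1", "Predictor 3"])
print(list(reduced.columns))  # ['Target', 'Predictor 1', 'Predictor 3']

# On the real data this would be called as:
# df = keep_selected_features(df, features_selected_with_nan_value, KBest_select_feature,
#                             features_selected_with_mean_value, laso_selected_features)
```

Dropping the complement keeps the original column order, and a name that appears in a selection but not in `df.columns` is simply ignored, whereas `df[list_columns]` raises a `KeyError` for any missing name.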