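Feature concatenation in feature engineering (commonly used to horizontally join the independent-variable features with the dependent-variable feature)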
Output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
   Pregnancies  Glucose  BloodPressure  SkinThickness   BMI  Outcome
0            6      148             72             35  33.6        1
1            1       85             66             29  26.6        0
2            8      183             64              0  23.3        1
3            1       89             66             23  28.1        0
4            0      137             40             35  43.1        1
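Implementation code

# ML/DS: feature concatenation in feature engineering (commonly used to horizontally join the independent-variable features with the dependent-variable feature)
import pandas as pd

# Raw string so the backslashes in the Windows path are not read as escape sequences
data_frame = pd.read_csv(r'data_csv_xls\diabetes\diabetes.csv')
# info() prints the summary itself and returns None, so print() also emits a trailing "None"
print(data_frame.info())

col_label = 'Outcome'  # dependent variable (target)
cols_other = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'BMI']  # selected independent variables
data_X = data_frame[cols_other]
data_y_label_μ = data_frame[col_label]
# axis=1 joins column-wise, aligning rows on the shared index
data_dall = pd.concat([data_X, data_y_label_μ], axis=1)
print(data_dall.head())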
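
The same horizontal join can be tried without the CSV file. The sketch below is only an illustration: the `toy` frame and its values are made up, and the names `X`, `y`, `y_off`, `joined`, `selected` and `misaligned` are hypothetical. It shows that `pd.concat(..., axis=1)` matches rows by index label, and what happens when the labels of the two pieces do not match.

```python
import pandas as pd

# Hypothetical toy data standing in for diabetes.csv (values are made up)
toy = pd.DataFrame({
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
    "Age": [50, 31, 32],
    "Outcome": [1, 0, 1],
})

X = toy[["Glucose", "BMI"]]  # feature columns (DataFrame)
y = toy["Outcome"]           # target column (Series)

# axis=1 places the pieces side by side, matching rows by index label
joined = pd.concat([X, y], axis=1)
print(joined)

# When both pieces come from the same frame, plain column selection
# builds the same three-column table
selected = toy[["Glucose", "BMI", "Outcome"]]
print(joined.equals(selected))

# Pitfall: if the index labels differ, concat keeps the union of the labels
# and fills the gaps with NaN instead of pairing rows by position
y_off = y.copy()
y_off.index = [10, 11, 12]
misaligned = pd.concat([X, y_off], axis=1)
print(misaligned.shape)  # (6, 3)
```

If the two pieces were built separately and their indexes may differ, calling `reset_index(drop=True)` on each before concatenating is one way to force a positional join.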