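Contents
Building the ensemble learner AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms for regression prediction on the June 2020 Shanghai housing price dataset [12+1] from a Chinese listing platform (model evaluation, model inference)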
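Related articles
ML regression prediction: building the ensemble learner AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms for regression prediction on the June 2020 Shanghai housing price dataset [12+1] from a Chinese listing platform (model evaluation, model inference)
ML regression prediction: building the ensemble learner AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms for regression prediction on the June 2020 Shanghai housing price dataset [12+1] from a Chinese listing platform (model evaluation, model inference), implementation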
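Building the ensemble learner AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms for regression prediction on the June 2020 Shanghai housing price dataset [12+1] from a Chinese listing platform (model evaluation, model inference)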
1. Basic dataset information
(3000, 13) 13 3000

total_price         object
unit_price          object
roomtype            object
height              object
direction           object
decorate            object
area                object
age                float64
garden              object
district            object
total_price_Num    float64
unit_price_Num       int64
area_Num           float64
dtype: object

Index(['total_price', 'unit_price', 'roomtype', 'height', 'direction',
       'decorate', 'area', 'age', 'garden', 'district', 'total_price_Num',
       'unit_price_Num', 'area_Num'],
      dtype='object')

  total_price   unit_price roomtype  ...  total_price_Num  unit_price_Num  area_Num
0        290万  46186元/平米     2室1厅  ...            290.0           46186     62.79
1        599万  76924元/平米     2室1厅  ...            599.0           76924     77.87
2        420万  51458元/平米     2室1厅  ...            420.0           51458     81.62
3      269.9万  34831元/平米     2室2厅  ...            269.9           34831     77.49
4        383万  79051元/平米     1室1厅  ...            383.0           79051     48.45

[5 rows x 13 columns]

     total_price   unit_price roomtype  ...  total_price_Num  unit_price_Num  area_Num
2995        230万  43144元/平米     1室1厅  ...            230.0           43144     53.31
2996        372万  75016元/平米     1室1厅  ...            372.0           75016     49.59
2997        366万  49973元/平米     2室1厅  ...            366.0           49973     73.24
2998        365万  69103元/平米     2室1厅  ...            365.0           69103     52.82
2999        420万  49412元/平米     2室2厅  ...            420.0           49412     85.00

[5 rows x 13 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   total_price      3000 non-null   object
 1   unit_price       3000 non-null   object
 2   roomtype         3000 non-null   object
 3   height           3000 non-null   object
 4   direction        3000 non-null   object
 5   decorate         3000 non-null   object
 6   area             3000 non-null   object
 7   age              2888 non-null   float64
 8   garden           3000 non-null   object
 9   district         3000 non-null   object
 10  total_price_Num  3000 non-null   float64
 11  unit_price_Num   3000 non-null   int64
 12  area_Num         3000 non-null   float64
dtypes: float64(3), int64(1), object(9)
memory usage: 304.8+ KB

               age  total_price_Num  unit_price_Num     area_Num
count  2888.000000      3000.000000     3000.000000  3000.000000
mean   2001.453601       631.953450    58939.028333   102.180667
std       9.112425       631.308855    25867.208297    62.211662
min    1911.000000        90.000000    11443.000000    17.050000
25%    1996.000000       300.000000    40267.500000    67.285000
50%    2003.000000       437.000000    54946.000000    89.230000
75%    2008.000000       738.000000    73681.250000   119.035000
max    2018.000000      9800.000000   250813.000000   801.140000
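The article does not show the code that produced this summary. The following is a minimal sketch, assuming the dataset is held in a pandas DataFrame and that the `*_Num` columns are derived by stripping the unit suffix from the raw strings. The three rows are copied from the `head()` output above; the parsing rule is an assumption, and the raw `area` column is left out because its string format is not visible in the output.

```python
import pandas as pd

# Three rows copied from the head() output above (visible columns only).
df = pd.DataFrame({
    "total_price": ["290万", "599万", "269.9万"],
    "unit_price": ["46186元/平米", "76924元/平米", "34831元/平米"],
    "roomtype": ["2室1厅", "2室1厅", "2室2厅"],
})

# Assumed parsing rule: keep the leading number, drop the unit suffix.
df["total_price_Num"] = (
    df["total_price"].str.extract(r"([\d.]+)", expand=False).astype(float)
)
df["unit_price_Num"] = (
    df["unit_price"].str.extract(r"(\d+)", expand=False).astype(int)
)

# The same sequence of summaries as in the output above.
print(df.shape, df.shape[1], df.shape[0])  # shape, column count, row count
print(df.dtypes)
print(df.columns)
print(df.head())
print(df.tail())
df.info()
print(df.describe())
```

In the real data `age` has 2888 non-null values out of 3000, so 112 rows would need imputing or dropping before being passed to the linear models.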
2. Model output
AvgModelsR(models=(Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('lasso',
                                    Lasso(alpha=0.001, random_state=1))]),
                   Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('elasticnet',
                                    ElasticNet(alpha=0.001, l1_ratio=0.9,
                                               random_state=3))]),
                   GradientBoostingRegressor(random_state=5)))
R2_res [0.9944881811696309, 0.000626615309319283, array([0.99470591, 0.99512495, 0.99435729, 0.99491104, 0.99334171])]
MAE_res [-0.004994183753322101, 0.0001083601234287803, array([-0.00493338, -0.005202  , -0.00489054, -0.00498097, -0.00496404])]
RMSE_res [-8.323227156546791e-05, 9.870911328329942e-06, array([-8.14778066e-05, -7.79621763e-05, -7.93078692e-05, -7.49049128e-05,
       -1.02508593e-04])]
AvgModelsR(models=(Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('lasso',
                                    Lasso(alpha=0.001, random_state=1))]),
                   Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('elasticnet',
                                    ElasticNet(alpha=0.001, l1_ratio=0.9,
                                               random_state=3))]),
                   GradientBoostingRegressor(random_state=5)))
Avg_Best_models Score value: 0.9947618159336031
Avg_Best_models R2 value: 0.9947618159336031
Avg_Best_models MAE value: 0.0064209273962331555
Avg_Best_models MSE value: 9.023779248949011e-05

Avg_Best_models elapsed time: 0:06:14.344069
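The source of `AvgModelsR` is not included in the article. The printed repr suggests a scikit-learn estimator with a single `models` parameter, so the sketch below is a hypothetical reconstruction that averages the predictions of its member models. The class body, the synthetic data, the 5-fold `KFold` split and the scorer names are all assumptions. The negative MAE values in the output are consistent with scikit-learn's `neg_*` scorers, and the values printed under `RMSE_res` are of the same order as the reported MSE, so they may be negated MSE rather than RMSE.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


class AvgModelsR(RegressorMixin, BaseEstimator):
    """Hypothetical reconstruction: average the predictions of several regressors."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        # Fit fresh clones so the estimators passed in stay unfitted.
        self.models_ = [clone(m).fit(X, y) for m in self.models]
        return self

    def predict(self, X):
        # One column per member model, then the row-wise mean.
        preds = np.column_stack([m.predict(X) for m in self.models_])
        return preds.mean(axis=1)


# Synthetic stand-in data (an assumption; the article uses the housing dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)

# Member models with the hyperparameters shown in the printed repr.
avg = AvgModelsR(models=(
    make_pipeline(RobustScaler(), Lasso(alpha=0.001, random_state=1)),
    make_pipeline(RobustScaler(), ElasticNet(alpha=0.001, l1_ratio=0.9, random_state=3)),
    GradientBoostingRegressor(random_state=5),
))

# Cross-validated [mean, std, per-fold scores], mirroring the printed layout.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, scoring in [
    ("R2_res", "r2"),
    ("MAE_res", "neg_mean_absolute_error"),
    ("MSE_res", "neg_mean_squared_error"),
]:
    scores = cross_val_score(avg, X, y, scoring=scoring, cv=cv)
    results[name] = [scores.mean(), scores.std(), scores]
    print(name, results[name])

# Held-out evaluation: RegressorMixin.score returns R^2, which would explain
# why "Score value" and "R2 value" are identical in the output above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
avg.fit(X_train, y_train)
pred = avg.predict(X_test)
score_value = avg.score(X_test, y_test)
r2_value = r2_score(y_test, pred)
print("Score value:", score_value)
print("R2 value:", r2_value)
```

Averaging with equal weights is the simplest way to combine the members; a weighted mean or a stacked meta-model would be alternatives, but nothing in the output indicates that the article uses them.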