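Building the ensemble learner AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms to run regression on a June 2020 Shanghai housing-price dataset [12+1] from a Chinese listing platform (model evaluation, model inference)
1. Basic dataset information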
(3000, 13) 13 3000
total_price object
unit_price object
roomtype object
height object
direction object
decorate object
area object
age float64
garden object
district object
total_price_Num float64
unit_price_Num int64
area_Num float64
dtype: object
Index(['total_price', 'unit_price', 'roomtype', 'height', 'direction',
'decorate', 'area', 'age', 'garden', 'district', 'total_price_Num',
'unit_price_Num', 'area_Num'],
dtype='object')
total_price unit_price roomtype ... total_price_Num unit_price_Num area_Num
0 290万 46186元/平米 2室1厅 ... 290.0 46186 62.79
1 599万 76924元/平米 2室1厅 ... 599.0 76924 77.87
2 420万 51458元/平米 2室1厅 ... 420.0 51458 81.62
3 269.9万 34831元/平米 2室2厅 ... 269.9 34831 77.49
4 383万 79051元/平米 1室1厅 ... 383.0 79051 48.45
[5 rows x 13 columns]
total_price unit_price roomtype ... total_price_Num unit_price_Num area_Num
2995 230万 43144元/平米 1室1厅 ... 230.0 43144 53.31
2996 372万 75016元/平米 1室1厅 ... 372.0 75016 49.59
2997 366万 49973元/平米 2室1厅 ... 366.0 49973 73.24
2998 365万 69103元/平米 2室1厅 ... 365.0 69103 52.82
2999 420万 49412元/平米 2室2厅 ... 420.0 49412 85.00
[5 rows x 13 columns]
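The rows above pair each raw string column with a numeric `_Num` counterpart: `290万` (万 = ten thousand yuan) next to `290.0`, and `46186元/平米` (yuan per square meter) next to `46186`. The original conversion code is not shown, so the following is only a minimal sketch of one way to get those numbers, assuming the numeric part is always the first run of digits in the string. The helper name `leading_number` is hypothetical.

```python
import re

# Matches an integer or decimal number such as "290" or "269.9".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def leading_number(text):
    """Return the first number found in a string such as '269.9万' as a float."""
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


# Sample strings copied from the head() output above.
total_price_num = leading_number("269.9万")
unit_price_num = int(leading_number("46186元/平米"))
print(total_price_num, unit_price_num)
```

With pandas, the same helper could be applied column-wise, for example `df['total_price'].map(leading_number)`.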
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 total_price 3000 non-null object
1 unit_price 3000 non-null object
2 roomtype 3000 non-null object
3 height 3000 non-null object
4 direction 3000 non-null object
5 decorate 3000 non-null object
6 area 3000 non-null object
7 age 2888 non-null float64
8 garden 3000 non-null object
9 district 3000 non-null object
10 total_price_Num 3000 non-null float64
11 unit_price_Num 3000 non-null int64
12 area_Num 3000 non-null float64
dtypes: float64(3), int64(1), object(9)
memory usage: 304.8+ KB
               age  total_price_Num  unit_price_Num     area_Num
count  2888.000000      3000.000000     3000.000000  3000.000000
mean   2001.453601       631.953450    58939.028333   102.180667
std       9.112425       631.308855    25867.208297    62.211662
min    1911.000000        90.000000    11443.000000    17.050000
25%    1996.000000       300.000000    40267.500000    67.285000
50%    2003.000000       437.000000    54946.000000    89.230000
75%    2008.000000       738.000000    73681.250000   119.035000
max    2018.000000      9800.000000   250813.000000   801.140000
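The `age` column has 2888 non-null values out of 3000 rows, so 112 entries are missing. scikit-learn's `Lasso`, `ElasticNet` and `GradientBoostingRegressor` all reject NaN inputs, so the gap has to be filled or the rows dropped before training. The post does not show which approach was used. The sketch below assumes median imputation with `SimpleImputer`, and the small `age` array is illustrative only, not taken from the dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative construction years with two missing entries (assumed data).
age = np.array([[1996.0], [2003.0], [np.nan], [2008.0], [np.nan], [2018.0]])

n_missing = int(np.isnan(age).sum())

# Replace each NaN with the median of the observed values in that column.
imputer = SimpleImputer(strategy="median")
age_filled = imputer.fit_transform(age)

print(n_missing)
print(age_filled.ravel())
```

The median is less sensitive than the mean to outliers such as the 1911 minimum reported above, which is one reason it is a common default for a column like this.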
2. Model output
AvgModelsR(models=(Pipeline(steps=[('robustscaler', RobustScaler()),
('lasso',
Lasso(alpha=0.001, random_state=1))]),
Pipeline(steps=[('robustscaler', RobustScaler()),
('elasticnet',
ElasticNet(alpha=0.001, l1_ratio=0.9,
random_state=3))]),
GradientBoostingRegressor(random_state=5)))
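The repr above shows that `AvgModelsR` takes a single `models` tuple holding two scaled linear pipelines and one gradient boosting regressor. The class body is not included in the post, so what follows is a sketch of how such an averaging ensemble is commonly written: clone and fit every base model, then predict with the unweighted mean. Only the constructor argument and the base-model hyperparameters come from the repr; the `fit`/`predict` logic is an assumption.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


class AvgModelsR(RegressorMixin, BaseEstimator):
    """Averaging ensemble: the prediction is the mean of the base models' predictions."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        # Fit clones so the estimators passed to the constructor stay untouched.
        self.models_ = [clone(model).fit(X, y) for model in self.models]
        return self

    def predict(self, X):
        # One column per base model, then average across columns.
        predictions = np.column_stack([m.predict(X) for m in self.models_])
        return predictions.mean(axis=1)


# Base models with the hyperparameters shown in the repr.
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.001, random_state=1))
enet = make_pipeline(RobustScaler(), ElasticNet(alpha=0.001, l1_ratio=0.9, random_state=3))
gbr = GradientBoostingRegressor(random_state=5)

avg_model = AvgModelsR(models=(lasso, enet, gbr))
```

Because the class inherits from `BaseEstimator` and `RegressorMixin`, it can be passed to `cross_val_score` and it gets a default `score` method that returns R².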
R2_res [0.9944881811696309, 0.000626615309319283, array([0.99470591, 0.99512495, 0.99435729, 0.99491104, 0.99334171])]
MAE_res [-0.004994183753322101, 0.0001083601234287803, array([-0.00493338, -0.005202 , -0.00489054, -0.00498097, -0.00496404])]
RMSE_res [-8.323227156546791e-05, 9.870911328329942e-06, array([-8.14778066e-05, -7.79621763e-05, -7.93078692e-05, -7.49049128e-05,
-1.02508593e-04])]
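Each `*_res` list above has the shape `[mean, std, per-fold scores]` over five folds. The MAE and RMSE entries are negative because scikit-learn's error scorers (`neg_mean_absolute_error`, `neg_mean_squared_error`, and so on) return negated errors so that a higher score is always better. The `RMSE_res` mean of about -8.3e-05 is on the same order as the hold-out MSE reported further down rather than its square root, so those values may in fact be negated MSE. The sketch below shows one way to produce lists of this shape. The helper name `cv_summary`, the synthetic data, and the single `Lasso` model are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score


def cv_summary(model, X, y, scoring, n_splits=5, seed=0):
    """Return [mean, std, per-fold scores] for one scoring string."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, scoring=scoring, cv=cv)
    return [scores.mean(), scores.std(), scores]


# Synthetic stand-in for the housing features and target (assumed data).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
model = Lasso(alpha=0.001)

R2_res = cv_summary(model, X, y, "r2")
MAE_res = cv_summary(model, X, y, "neg_mean_absolute_error")
MSE_res = cv_summary(model, X, y, "neg_mean_squared_error")

print("R2_res", R2_res)
print("MAE_res", MAE_res)
print("MSE_res", MSE_res)
```

To report a positive error, negate the score; for a true RMSE, scikit-learn also provides the `neg_root_mean_squared_error` scoring string.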
AvgModelsR(models=(Pipeline(steps=[('robustscaler', RobustScaler()),
('lasso',
Lasso(alpha=0.001, random_state=1))]),
Pipeline(steps=[('robustscaler', RobustScaler()),
('elasticnet',
ElasticNet(alpha=0.001, l1_ratio=0.9,
random_state=3))]),
GradientBoostingRegressor(random_state=5)))
Avg_Best_models Score value: 0.9947618159336031
Avg_Best_models R2 value: 0.9947618159336031
Avg_Best_models MAE value: 0.0064209273962331555
Avg_Best_models MSE value: 9.023779248949011e-05
Avg_Best_models elapsed time: 0:06:14.344069
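The Score and R2 lines above are identical because the default `score` method of a scikit-learn regressor returns the coefficient of determination R². The elapsed time has the format that a `datetime.timedelta` prints. The evaluation code is not shown in the post, so this is a sketch of how such a report could be produced on a hold-out split. The synthetic data, the `LinearRegression` stand-in model and the 80/20 split are assumptions.

```python
from datetime import datetime

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumed, not the housing dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.05, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

start = datetime.now()
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
elapsed = datetime.now() - start  # timedelta, prints as H:MM:SS.ffffff

score = model.score(X_test, y_test)  # R^2 for regressors
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("Score value:", score)
print("R2 value:", r2)
print("MAE value:", mae)
print("MSE value:", mse)
print("elapsed time:", elapsed)
```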