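Quick start
Open "Classic Data Analysis Case: Kaggle House Price Prediction" and click "Open in DSW" in the upper-right corner.
Predicting house prices with a machine learning regression model
This article shows how to take a data set that contains both numeric and non-numeric features through data loading, data exploration, data processing, and feature engineering, and finally how to build a regression model that performs reasonably well. The data and the case study come from this Kaggle competition.
DSW also provides another sample notebook that runs a house-price regression analysis on the same data set. If you are interested, you can compare how the two algorithms perform on the same data. Link
The final result is that the XGBoost algorithm achieves better accuracy (92% vs 86%).
Preparation
All packages this article depends on are preinstalled in the DSW image. If your environment is missing any of them, install it with pip install xxx.
First, import the Python libraries we need.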
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import math
import datetime
from scipy import stats
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, OrdinalEncoder, LabelEncoder
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score, mean_squared_error
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
sns.set_style('darkgrid')
# Suppress warning output
import warnings
warnings.filterwarnings('ignore')
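Loading the data
Use pandas to read the data and look at the raw records. The train.csv file was downloaded from the web and prepared in advance. This article does not use the test samples; the corresponding test.csv file can be downloaded online.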
df = pd.read_csv('train.csv')
df.head()
 | Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
0 | 1 | 60 | RL | 65.0 | 8450 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2008 | WD | Normal | 208500 |
1 | 2 | 20 | RL | 80.0 | 9600 | Pave | NaN | Reg | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 5 | 2007 | WD | Normal | 181500 |
2 | 3 | 60 | RL | 68.0 | 11250 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 9 | 2008 | WD | Normal | 223500 |
3 | 4 | 70 | RL | 60.0 | 9550 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 2 | 2006 | WD | Abnorml | 140000 |
4 | 5 | 60 | RL | 84.0 | 14260 | Pave | NaN | IR1 | Lvl | AllPub | ... | 0 | NaN | NaN | NaN | 0 | 12 | 2008 | WD | Normal | 250000 |
5 rows × 81 columns |
df.shape
(1460, 81)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1460 non-null int64
1 MSSubClass 1460 non-null int64
2 MSZoning 1460 non-null object
3 LotFrontage 1201 non-null float64
4 LotArea 1460 non-null int64
5 Street 1460 non-null object
6 Alley 91 non-null object
7 LotShape 1460 non-null object
8 LandContour 1460 non-null object
9 Utilities 1460 non-null object
10 LotConfig 1460 non-null object
11 LandSlope 1460 non-null object
12 Neighborhood 1460 non-null object
13 Condition1 1460 non-null object
14 Condition2 1460 non-null object
15 BldgType 1460 non-null object
16 HouseStyle 1460 non-null object
17 OverallQual 1460 non-null int64
18 OverallCond 1460 non-null int64
19 YearBuilt 1460 non-null int64
20 YearRemodAdd 1460 non-null int64
21 RoofStyle 1460 non-null object
22 RoofMatl 1460 non-null object
23 Exterior1st 1460 non-null object
24 Exterior2nd 1460 non-null object
25 MasVnrType 1452 non-null object
26 MasVnrArea 1452 non-null float64
27 ExterQual 1460 non-null object
28 ExterCond 1460 non-null object
29 Foundation 1460 non-null object
30 BsmtQual 1423 non-null object
31 BsmtCond 1423 non-null object
32 BsmtExposure 1422 non-null object
33 BsmtFinType1 1423 non-null object
34 BsmtFinSF1 1460 non-null int64
35 BsmtFinType2 1422 non-null object
36 BsmtFinSF2 1460 non-null int64
37 BsmtUnfSF 1460 non-null int64
38 TotalBsmtSF 1460 non-null int64
39 Heating 1460 non-null object
40 HeatingQC 1460 non-null object
41 CentralAir 1460 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1460 non-null int64
44 2ndFlrSF 1460 non-null int64
45 LowQualFinSF 1460 non-null int64
46 GrLivArea 1460 non-null int64
47 BsmtFullBath 1460 non-null int64
48 BsmtHalfBath 1460 non-null int64
49 FullBath 1460 non-null int64
50 HalfBath 1460 non-null int64
51 BedroomAbvGr 1460 non-null int64
52 KitchenAbvGr 1460 non-null int64
53 KitchenQual 1460 non-null object
54 TotRmsAbvGrd 1460 non-null int64
55 Functional 1460 non-null object
56 Fireplaces 1460 non-null int64
57 FireplaceQu 770 non-null object
58 GarageType 1379 non-null object
59 GarageYrBlt 1379 non-null float64
60 GarageFinish 1379 non-null object
61 GarageCars 1460 non-null int64
62 GarageArea 1460 non-null int64
63 GarageQual 1379 non-null object
64 GarageCond 1379 non-null object
65 PavedDrive 1460 non-null object
66 WoodDeckSF 1460 non-null int64
67 OpenPorchSF 1460 non-null int64
68 EnclosedPorch 1460 non-null int64
69 3SsnPorch 1460 non-null int64
70 ScreenPorch 1460 non-null int64
71 PoolArea 1460 non-null int64
72 PoolQC 7 non-null object
73 Fence 281 non-null object
74 MiscFeature 54 non-null object
75 MiscVal 1460 non-null int64
76 MoSold 1460 non-null int64
77 YrSold 1460 non-null int64
78 SaleType 1460 non-null object
79 SaleCondition 1460 non-null object
80 SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
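Data cleaning and preprocessing
Raw data usually has all kinds of problems that get in the way of analysis and training, so it needs a cleaning and preprocessing stage that deals with duplicates, missing values, outliers, and so on.
We saw above that the raw data has 81 columns and 1,460 records. The Id column carries no information for training, so drop it first: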
df.drop('Id',axis=1,inplace=True)
Handling null (missing) values
Some columns in the CSV file have no value in some rows, so we need to handle them consistently.
# Define a function that returns the percentage of null values in each column of a DataFrame
def check_null_percentage(df):
    missing_info = pd.DataFrame(
        np.array(df.isnull().sum().sort_values(ascending=False).reset_index()),
        columns=['Columns','Missing_Percentage']
    ).query("Missing_Percentage > 0").set_index('Columns')
    return 100*missing_info/df.shape[0]
check_null_percentage(df)
Columns | Missing_Percentage |
PoolQC | 99.5205 |
MiscFeature | 96.3014 |
Alley | 93.7671 |
Fence | 80.7534 |
FireplaceQu | 47.2603 |
LotFrontage | 17.7397 |
GarageType | 5.54795 |
GarageCond | 5.54795 |
GarageFinish | 5.54795 |
GarageQual | 5.54795 |
GarageYrBlt | 5.54795 |
BsmtFinType2 | 2.60274 |
BsmtExposure | 2.60274 |
BsmtQual | 2.53425 |
BsmtCond | 2.53425 |
BsmtFinType1 | 2.53425 |
MasVnrArea | 0.547945 |
MasVnrType | 0.547945 |
Electrical | 0.068493 |
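The same missing-value percentages can also be obtained more directly. The following is a minimal sketch on a small made-up frame (the `toy` data is hypothetical, not the housing data); it relies on the fact that the mean of a boolean mask is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (hypothetical values).
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "LotArea": [8450, 9600, 11250, 9550],
})

# isnull().mean() is the fraction of missing values per column.
missing_pct = 100 * toy.isnull().mean()
# Keep only columns that actually have missing values, largest first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

Columns without any missing value, such as `LotArea` here, do not appear in the result.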
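The result above gives us the names of the columns that have missing values. We pick the non-numeric columns among them where missing values are a real problem and fill them with 'NA', so that later calculations and processing steps do not raise errors.
In terms of meaning, a missing value here indicates that the house does not have the attribute in question, for example that it has no garage.
The columns in NA_columns below are all non-numeric features. The numeric features are handled separately later.
NA_columns = ['Alley','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1','BsmtFinType2','FireplaceQu',
              'GarageType','GarageFinish','GarageQual','GarageCond','PoolQC','Fence','MiscFeature']
df[NA_columns] = df[NA_columns].fillna('NA')
Make sure that no record has more than 5 missing values: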
df[df.isnull().sum(axis=1) > 5]
MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | ... | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
0 rows × 80 columns |
Checking for duplicate samples
After the processing above, check whether the data set now contains any completely identical rows.
df.duplicated(keep=False).sum()
0
Next, look at how the values of each feature are distributed and rank the features by "purity". The goal is to list the features that are dominated by a single value.
# For each feature, return its most frequent value, that value's share in percent, and its count
def top_unique_count(x):
    top = x.value_counts(ascending=False, dropna=False).head(1)
    unq_cnt = (top.index.values[0], 100 * top.values[0]/df.shape[0], top.values[0])
    return unq_cnt
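Handling missing values in numeric features
Earlier we filled the non-numeric columns that check_null_percentage reported as having missing values with 'NA'. Here we list the features that still contain missing values.
Then we fill the missing positions with the median.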
check_null_percentage(df)
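The LotFrontage column has missing values that need to be handled.
# Fill with the median of the same (Neighborhood, LotConfig) group.
# Groups without any observed value have a NaN median and therefore stay NaN.
df['LotFrontage'] = df.groupby(['Neighborhood','LotConfig'])['LotFrontage'].transform(lambda x: x.fillna(x.median()))
df['LotFrontage'].isnull().sum()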
5
Fill the remaining missing values with the median of each LotConfig group:
df['LotFrontage'] = df.groupby(['LotConfig'])['LotFrontage'].transform(lambda x: x.fillna(x.median()))
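The two-step fill used for LotFrontage can be illustrated on a small made-up frame. This is a sketch under the assumption that `GroupBy.transform` is used for the group-wise fill; the `toy` values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one (Neighborhood, LotConfig) group has no observed value at all.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "LotConfig": ["Inside", "Inside", "Corner", "Inside", "Corner"],
    "LotFrontage": [60.0, np.nan, np.nan, 80.0, 70.0],
})

# Step 1: fill with the median of the fine-grained group.
toy["LotFrontage"] = toy.groupby(["Neighborhood", "LotConfig"])["LotFrontage"].transform(
    lambda s: s.fillna(s.median())
)
# The (A, Corner) group has no observed value, so its row is still missing.
remaining = int(toy["LotFrontage"].isnull().sum())

# Step 2: fall back to the median of the coarser LotConfig group.
toy["LotFrontage"] = toy.groupby("LotConfig")["LotFrontage"].transform(
    lambda s: s.fillna(s.median())
)
print(remaining, toy["LotFrontage"].tolist())
```

The fine-grained group is tried first because houses in the same neighborhood with the same lot configuration tend to be more alike; the coarser group only covers what the first step could not fill.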
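Next, look at the missing values of the garage year together with the missing values of the other garage-related features.
Wherever the garage year is missing, the other garage-related features are missing too. This makes sense: for a house without a garage, none of the garage attributes exist.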
df.loc[df.GarageYrBlt.isnull(),['GarageType','GarageCars','GarageArea','GarageFinish','GarageYrBlt','GarageQual','GarageCond']]
Check how many null values the year-built column has:
df.YearBuilt.isnull().sum()
0
Use the year the house was built to fill in the missing garage year:
df.loc[df.GarageYrBlt.isnull(),'GarageYrBlt'] = df.loc[df.GarageYrBlt.isnull(),'YearBuilt']
Of the next two columns, one is numeric and one is categorical. Fill them with 0 and 'Not present' respectively.
df['MasVnrArea'] = df['MasVnrArea'].fillna(0)
df['MasVnrType'] = df['MasVnrType'].fillna('Not present')
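Feature engineering
Creating new features
Feature engineering in machine learning depends on domain knowledge. In this case, the data set does not directly contain the total above-ground floor area of a house, its total number of bathrooms, or its total porch area.
Domain knowledge tells us that buyers care a lot about these attributes, so we create them here from the existing features.
# Total above-ground floor area
df['TotalFlrSFAbvGrd'] = df[['1stFlrSF','2ndFlrSF']].sum(axis=1)
# Total number of bathrooms
df['TotalBath'] = df[['BsmtFullBath','BsmtHalfBath','FullBath','HalfBath']].sum(axis=1)
# Total porch area
df['TotalPorchSF'] = df[['OpenPorchSF','EnclosedPorch','3SsnPorch','ScreenPorch','WoodDeckSF']].sum(axis=1)
Because new features were added, check again whether any feature in the current data set has null values.
# Check again for null values after adding the new features
check_null_percentage(df)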
Adjusting feature categories
Drop the features in which more than 90% of the values are identical. Their entropy is very low, so they carry very little useful information for the model.
unique_df = df.apply(top_unique_count).rename(index={0:"Value",1:'Percentage',2:'Count'})\
    .T.sort_values(by='Count',ascending=False)
unique_df.head(25)
# See which features have almost the same value everywhere
drop_columns = unique_df.query('Percentage > 90.0').index.values
drop_columns
array(['Utilities', 'Street', 'PoolArea', 'PoolQC', 'Condition2', '3SsnPorch', 'LowQualFinSF', 'RoofMatl', 'Heating', 'MiscVal', 'MiscFeature', 'KitchenAbvGr', 'LandSlope', 'BsmtHalfBath', 'Alley', 'CentralAir', 'Functional', 'ScreenPorch', 'PavedDrive', 'Electrical', 'GarageCond'], dtype=object)
# Drop the features found above, since they carry little useful information
df.drop(columns=drop_columns,inplace=True)
del drop_columns
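The 90% rule above can be sketched in a few lines on made-up data. This is an illustrative variant, not the notebook's own helper: it uses `value_counts(normalize=True)` to get the share of the most frequent value, and the `toy` frame and the names `top_share` and `low_info_columns` are hypothetical.

```python
import pandas as pd

# Hypothetical frame: Street is almost constant, LotShape is not.
toy = pd.DataFrame({
    "Street": ["Pave"] * 19 + ["Grvl"],
    "LotShape": ["Reg"] * 12 + ["IR1"] * 8,
})

# Share (in percent) of the most frequent value in each column; NaN counts as a value.
top_share = toy.apply(
    lambda s: 100 * s.value_counts(normalize=True, dropna=False).iloc[0]
)
# Columns dominated by a single value carry little information.
low_info_columns = top_share[top_share > 90.0].index.tolist()
reduced = toy.drop(columns=low_info_columns)
print(low_info_columns, list(reduced.columns))
```

The threshold of 90 is the one used in this article; a different data set may call for a different cut-off.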
Based on the data type of each feature in the current data set, split the features into the following groups:
- Numeric
- Categorical
- Time series
# List the numeric features
numerical_features = list(df.select_dtypes(include=[np.number]).columns.values)
# List the categorical features
categorical_features = list(df.select_dtypes(include=[object]).columns.values)
# Time-series features
timeseries_features = ['YearBuilt', 'YearRemodAdd', 'YrSold', 'MoSold', 'GarageYrBlt']
# Remove the time-series features from the numeric features so they can be handled separately
for col in timeseries_features:
    numerical_features.remove(col)
# A numeric feature with 10 or fewer distinct values is treated as a categorical feature
cat_feature = pd.Series(df[numerical_features].nunique().sort_values(),name='Count').to_frame().query('Count <= 10').index.values
categorical_features.extend(cat_feature)
# Remove the features that were just moved to the categorical list from the numeric list
for col in cat_feature:
    numerical_features.remove(col)
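Analyzing the numeric features
Let's look at the relationship between the numeric features and the sale price, in particular the linear relationship.
fig,ax = plt.subplots(math.ceil(len(numerical_features)/3),3,figsize=(15,30),sharey=True)
i, j = 0, 0
for col in sorted(numerical_features):
    sns.regplot(x=col,y='SalePrice',data=df,ax=ax[i][j])
    if j == 2:
        j = 0
        i += 1
    else:
        j += 1
ax[6][1].set_visible(False)
ax[6][2].set_visible(False)
Conclusions for the numeric features
1stFlrSF : one outlier, a house with more than 4000 square feet
2ndFlrSF : no obvious linear relationship with the sale price
BsmtFinSF1 : one outlier with an area above 5000 but a low price; it can be treated as dirty data
BsmtFinSF2 : no obvious linear relationship with the sale price
BsmtUnfSF : no obvious linear relationship with the sale price
EnclosedPorch : one outlier
GarageArea : a few outliers
GrLivArea : some houses have a large area but a low price and can be treated as outliers
LotArea : a few outliers above 100000
LotFrontage : a few outliers
MasVnrArea : outliers above 1500 square feet
MSSubClass : no obvious linear relationship with the sale price
OpenPorchSF : outliers above 400
TotalBsmtSF : one outlier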
Handling outliers
Based on the results above, we record for each numeric feature how many outliers it currently has in the data set.
feature_outlier_count = {'1stFlrSF':1, 'BsmtFinSF1':1, 'BsmtFinSF2':1, 'EnclosedPorch':2, 'GarageArea':4,
                         'GrLivArea':4, 'LotArea':4, 'LotFrontage':2, 'MasVnrArea':1, 'OpenPorchSF':3,
                         'TotalBsmtSF':4, 'TotRmsAbvGrd':1, 'TotalFlrSFAbvGrd':2, 'TotalPorchSF':1, 'WoodDeckSF':3}
Next, define functions to print these outlier points.
# Show the v largest values of every feature together with the sale price
def print_outliers(feature_list):
    for k,v in feature_list.items():
        if v:
            display(df.loc[df[k].isin(sorted(df[k])[-v:]),[k,'SalePrice']])

# Return the rows whose feature value equals the index-th value in ascending order (default: the largest)
def get_outliers(feature,index=-1):
    return df.loc[df[feature] == sorted(df[feature])[index],[feature,'SalePrice']].sort_values(by=feature,ascending=False)
print_outliers(feature_outlier_count)
The output above shows that row 1298 appears several times, which means this record contains several outlying values, so we need to remove it.
outlier_index = get_outliers('1stFlrSF').index.values[0]
outlier_index
1298
df.iloc[1298]
MSSubClass 60
MSZoning RL
LotFrontage 313
LotArea 63887
LotShape IR3
...
SaleCondition Partial
SalePrice 160000
TotalFlrSFAbvGrd 5642
TotalBath 5
TotalPorchSF 506
Name: 1298, Length: 62, dtype: object
Remove the record at the outlier index obtained above:
def remove_outlier_features_count_for_index(outlier_idx):
    # Lower the expected outlier count of every feature whose largest value sits in this row
    for col in feature_outlier_count.keys():
        if (feature_outlier_count[col] > 0) & (outlier_idx in get_outliers(col).index.values):
            feature_outlier_count[col] = feature_outlier_count[col]-1
    df.drop(outlier_idx,inplace=True)
    df.reset_index(drop=True,inplace=True)
remove_outlier_features_count_for_index(outlier_index)
Replace the remaining outliers with values taken from houses in a similar price range. For TotRmsAbvGrd we use the mode among houses with the same sale price; for the other features we use a mean:
df.loc[df.index[get_outliers('TotRmsAbvGrd').index.values[0]],'TotRmsAbvGrd'] = df.loc[df['SalePrice'] == get_outliers('TotRmsAbvGrd').SalePrice.values[0],'TotRmsAbvGrd'].mode()[0]
feature_outlier_count['TotRmsAbvGrd'] = 0
def fix_outliers(outlier_features_list):
    for k,v in outlier_features_list.items():
        while v > 0:
            # Replace the outlier with the mean of four feature values from rows close to it in sale price
            replace_with = df.loc[(df['SalePrice']-get_outliers(k)['SalePrice'].values[0]).abs().argsort()[v:v+4],k].mean()
            if (df[k].dtypes == np.int64) | (df[k].dtypes == np.int32):
                df.loc[df.index[get_outliers(k).index.values[0]],k] = int(replace_with)
            else:
                df.loc[df.index[get_outliers(k).index.values[0]],k] = round(replace_with,1)
            v = v-1
        feature_outlier_count[k] = v
fix_outliers(feature_outlier_count)
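The idea behind this replacement can be shown on a small made-up frame. The sketch below is a simplified variant, not the function above: it always takes the four rows nearest in SalePrice (excluding the outlier itself), whereas `fix_outliers` selects its neighbours by position with an offset. All values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one house has an implausibly large living area for its price.
toy = pd.DataFrame({
    "GrLivArea": [1500, 1600, 1550, 5600, 1700, 2500],
    "SalePrice": [150000, 160000, 155000, 158000, 170000, 250000],
})

outlier_label = toy["GrLivArea"].idxmax()  # row with the largest feature value
price_gap = (toy["SalePrice"] - toy.loc[outlier_label, "SalePrice"]).abs()
# Four rows closest in sale price, excluding the outlier itself.
neighbours = price_gap.drop(outlier_label).nsmallest(4).index
replacement = toy.loc[neighbours, "GrLivArea"].mean()
# Truncate to an integer because the column is integer-typed.
toy.loc[outlier_label, "GrLivArea"] = int(replacement)
print(replacement, toy.loc[outlier_label, "GrLivArea"])
```

Replacing rather than deleting keeps the record's other, valid features available for training.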
Continue checking the data set for abnormal values:
df[['1stFlrSF','SalePrice']].sort_values(by='1stFlrSF',ascending=False)[:3]
df.loc[df.index[get_outliers('1stFlrSF',-2).index.values[0]],'1stFlrSF'] = df.loc[(df['SalePrice']-get_outliers('1stFlrSF',-2)['SalePrice'].values[0]).abs().argsort()[1:1+4],'1stFlrSF'].mean()
df[['BsmtFinSF1','SalePrice']].sort_values(by='BsmtFinSF1',ascending=False)[:3]
df[['LotArea','SalePrice']].sort_values(by='LotArea',ascending=False)[:7]
Fix the values that are clearly abnormal:
feature_outlier_count['LotArea']=3
feature_outlier_count['BsmtFinSF1']=1
fix_outliers(feature_outlier_count)
del feature_outlier_count
Further feature analysis
Visualize the numeric features again to see whether the processing had the desired effect.
fig,ax = plt.subplots(math.ceil(len(numerical_features)/3),3,figsize=(15,30),sharey=True)
i, j = 0, 0
for col in sorted(numerical_features):
    sns.regplot(x=col,y='SalePrice',data=df,ax=ax[i][j])
    if j == 2:
        j = 0
        i += 1
    else:
        j += 1
ax[6][1].set_visible(False)
ax[6][2].set_visible(False)
Conclusions:
After the processing above, the following four features still need further work:
BsmtFinSF2 : almost no linear relationship with the sale price; it can be dropped
BsmtUnfSF : almost no linear relationship with the sale price; it can be dropped
EnclosedPorch : almost no linear relationship with the sale price; it can be dropped
MSSubClass : this looks more like a categorical feature and should be moved back to the categorical features
Drop the features that are unrelated to the sale price:
df.drop(['BsmtFinSF2','BsmtUnfSF','EnclosedPorch'],axis=1,inplace=True)
for col in ['BsmtFinSF2','BsmtUnfSF','EnclosedPorch']:
    numerical_features.remove(col)
Convert MSSubClass into a categorical feature:
df.MSSubClass = df.MSSubClass.astype(str)
df.MSSubClass.replace({'20':'1story', '30':'1story', '40':'1story', '45':'1story', '50':'1story',
                       '60':'2story', '70':'2story', '75':'2story',
                       '80':'nstory', '85':'nstory', '90':'nstory',
                       '120':'1story', '150':'1story', '160':'2story', '180':'nstory', '190':'nstory'}, inplace=True)
categorical_features.append('MSSubClass')
# Remove it from the numeric feature list
numerical_features.remove('MSSubClass')
Handling categorical features
Here we use box plots, overlaid with strip plots, to show the categorical features.
fig,ax = plt.subplots(math.ceil(len(categorical_features)/3),3,figsize=(20,60),sharey=True)
i, j = 0, 0
PROPS = {
    'boxprops':{'facecolor':'none', 'edgecolor':'black','linewidth':0.3},
}
for col in sorted(categorical_features):
    sns.boxplot(x=col,y='SalePrice',data=df,ax=ax[i][j],showfliers=False,**PROPS)
    sns.stripplot(x=col,y='SalePrice',data=df,ax=ax[i][j],alpha=0.5)
    if df[col].nunique() > 8:
        ax[i][j].tick_params(axis='x',rotation=45)
    if j == 2:
        j = 0
        i += 1
    else:
        j += 1
Conclusions
Categories to merge:
BedroomAbvGr : 0, 5, 6, and 8 are minority classes and need to be merged
BldgType : merge 2fmCon, Twnhs, and Duplex
BsmtCond : merge NA, Fa, and Poor
BsmtExposure : merge Mn and Av
BsmtFinType1 : merge ALQ, Rec, BLQ, and LwQ
BsmtFinType2 : merge BLQ, Rec, and LwQ
BsmtFullBath : merge 2 and 3
BsmtQual : merge NA and Fa
Condition1 : merge RRNn and RRAn, PosN and PosA, RRNe and RRAe, and Feedr and Artery
Exterior2nd : merge MetalSd, Wd Shng, HbBoard, Plywood, Wd Sdng, and Stucco; and merge CBlock, Other, Stone, AsphShn, ImStucc, Brk Cmn, and BrkFace
FireplaceQu : merge No Fireplace, Po, and Fa
Foundation : merge Wood, Slab, and Stone
FullBath : merge 0 and 1
GarageType : merge Detchd, CarPort, No Garage, Basment, and 2Types
GarageQual : merge Ex and Gd; merge Po, Fa, and No Garage
HeatingQC : merge Fa and Po
House Style : merge 2Story and 2.5Fin, SFoyer and 1.5Fin, SLvl and 1Story, 1.5Unf and 2.5Unf
LotShape : merge IR2 and IR3
MSZoning : merge RM and RH into one category
MasVnrType : merge None, Not present, and BrkCmn
Neighborhood : merge MeadowV, BrDale, and IDOTRR; Sawyer, NAmes, NPkVill, Mitchel, SWISU, and Blueste; Gilbert, Blmngtn, SawyerW, and NWAmes; ClearCr, CollgCr, and Crawfor; Veenker, Timber, and Somerst; OldTown, Edwards, and BrkSide; StoneBr, NridgHt, and NoRidge
OverallCond : merge 1, 2, and 3; merge 6, 7, and 8
OverallQual : merge 1 and 2
SaleCondition : merge AdjLand, Alloca, Family, and Abnorml
SaleType : merge COD, ConLD, ConLI, CwD, ConLw, Con, and Oth
Features to drop:
ExterCond : too few values; treated as a feature with too many missing values, drop it
Exterior1st : appears to have no linear correlation with the sale price
Fence : almost all values are the same, drop it
LotConfig : almost all values are the same, drop it
RoofStyle : only two classes with the same count, so it carries little useful information, drop it
Highly correlated features:
Fireplaces, GarageCars, HeatingQC, KitchenQual
Based on the analysis above, merge the category labels:
df.BldgType.replace({'2fmCon':'Twnhs','Duplex':'Twnhs'},inplace=True)
df.BsmtExposure.replace({'Mn':'Av'},inplace=True)
df.Condition1.replace({'RRNn':'RRAn', 'PosN':'PosA', 'RRNe':'RRAe', 'Feedr':'Artery'},inplace=True)
df.Exterior2nd.replace({'MetalSd':'Wd Sdng', 'Wd Shng':'Wd Sdng', 'HbBoard':'Wd Sdng', 'Plywood':'Wd Sdng',
                        'Stucco':'Wd Sdng', 'CBlock':'BrkFace', 'Other':'BrkFace', 'Stone':'BrkFace',
                        'AsphShn':'BrkFace', 'ImStucc':'BrkFace', 'Brk Cmn':'BrkFace'},inplace=True)
df.Foundation.replace({'Wood':'Stone','Slab':'Stone'},inplace=True)
df.GarageType.replace({'CarPort':'Detchd', 'No Garage':'Detchd', 'Basment':'Detchd', '2Types':'Detchd'},inplace=True)
df.LotShape.replace({'IR3':'IR2'},inplace=True)
df.MSZoning.replace({'RH':'RM'},inplace=True)
df.MasVnrType.replace({'None':'BrkCmn', 'Not present':'BrkCmn'},inplace=True)
df.Neighborhood.replace({'BrDale':'MeadowV', 'IDOTRR':'MeadowV',
                         'NAmes':'Sawyer', 'NPkVill':'Sawyer', 'Mitchel':'Sawyer', 'SWISU':'Sawyer', 'Blueste':'Sawyer',
                         'Blmngtn':'Gilbert', 'SawyerW':'Gilbert', 'NWAmes':'Gilbert',
                         'ClearCr':'Crawfor', 'CollgCr':'Crawfor',
                         'Timber':'Veenker', 'Somerst':'Veenker',
                         'Edwards':'OldTown', 'BrkSide':'OldTown',
                         'StoneBr':'NridgHt', 'NoRidge':'NridgHt'},inplace=True)
df.SaleCondition.replace({'AdjLand':'Abnorml', 'Alloca':'Abnorml', 'Family':'Abnorml'},inplace=True)
df.SaleType.replace({'ConLD':'COD', 'ConLI':'COD', 'CwD':'COD', 'ConLw':'COD', 'Con':'COD', 'Oth':'COD'},inplace=True)
Drop the features that are not needed:
drop_columns = ['ExterCond', 'Fence', 'LotConfig', 'RoofStyle', 'Exterior1st']
df.drop(columns=drop_columns,inplace=True)
for cat in drop_columns[:]:
    categorical_features.remove(cat)
Handling time-series features
timeseries_features
['YearBuilt', 'YearRemodAdd', 'YrSold', 'MoSold', 'GarageYrBlt']
df[timeseries_features].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 YearBuilt 1459 non-null int64
1 YearRemodAdd 1459 non-null int64
2 YrSold 1459 non-null int64
3 MoSold 1459 non-null int64
4 GarageYrBlt 1459 non-null float64
dtypes: float64(1), int64(4)
memory usage: 57.1 KB
Convert the time-series columns to int type, then build a sale date from the month and year of sale:
df.YrSold = df.YrSold.astype(int)
df.GarageYrBlt = df.GarageYrBlt.astype(int)
df['dateSold'] = df['MoSold'].astype(str)+'-1-'+df['YrSold'].astype(str)
df['dateSold'] = pd.to_datetime(df['dateSold'])
timeseries_features.append('dateSold')
df['dateSold'].head()
0 2008-02-01
1 2007-05-01
2 2008-09-01
3 2006-02-01
4 2008-12-01
Name: dateSold, dtype: datetime64[ns]
df.loc[df.GarageYrBlt < 1900,['GarageYrBlt','YearBuilt']]
Visualizing the data
fig,ax = plt.subplots(math.ceil(len(timeseries_features)/2),2,figsize=(15,15),sharey=True)
i, j = 0, 0
for col in sorted(timeseries_features):
    if col == 'GarageYrBlt':
        sns.lineplot(x=df.loc[df[col] >= 1880,col],y=df.loc[df[col] != 0,'SalePrice'],ax=ax[i][j])
    else:
        sns.lineplot(x=col,y='SalePrice',data=df,ax=ax[i][j])
    if df[col].nunique() > 8:
        ax[i][j].tick_params(axis='x',rotation=45)
    if col == "YrSold":
        ax[i][j].xaxis.set_ticks([2006,2007,2008,2009,2010])
    if j == 1:
        j = 0
        i += 1
    else:
        j += 1
# Move SalePrice to the last column
df_dummy = df.pop('SalePrice')
df.insert(df.shape[1],'SalePrice',df_dummy)
del df_dummy
Use the heatmap below to analyze the degree of linear correlation among the time-series features:
plt.figure(figsize=(15,12))
# Restrict the correlation matrix to numeric columns
sns.heatmap(df.select_dtypes(include=[np.number]).corr(),annot=True);
MoSold and YrSold show almost no strong correlation with any other column, so they can be dropped:
df.drop(['MoSold','YrSold'],axis=1,inplace=True)
for col in ['MoSold','YrSold']:
    timeseries_features.remove(col)
Training
Encode the categorical features to prepare for training.
int_columns = ['HalfBath','Fireplaces','FullBath','BsmtFullBath','GarageCars','BedroomAbvGr','OverallCond','OverallQual']
df[int_columns] = df[int_columns].astype(int)
categorical_columns = ['ExterQual','BsmtQual','BsmtCond','HeatingQC','KitchenQual','FireplaceQu','GarageQual','HouseStyle','BsmtFinType2','BsmtFinType1','GarageFinish']
df['HouseStyle'] = pd.Categorical(df['HouseStyle'],ordered=True,categories=[
    'SFoyer','1.5Unf','1Story','1.5Fin','SLvl','2.5Unf','2Story','2.5Fin'])
Explanation: the operation above converts the values of a categorical column into the ordered Categorical data type.
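An ordered categorical column can then be turned into integer codes that follow the declared order. The following is a minimal sketch; the `quality_order` list is an assumed worst-to-best ordering chosen for illustration, and the `toy` values are hypothetical.

```python
import pandas as pd

# Assumed worst-to-best ordering of the quality labels (illustrative only).
quality_order = ["NA", "Po", "Fa", "TA", "Gd", "Ex"]
toy = pd.DataFrame({"KitchenQual": ["TA", "Gd", "Ex", "Fa", "Gd"]})

# Declare the column as an ordered categorical with an explicit category order.
toy["KitchenQual"] = pd.Categorical(
    toy["KitchenQual"], categories=quality_order, ordered=True
)
# .cat.codes maps each label to its position in the category list.
toy["KitchenQual_code"] = toy["KitchenQual"].cat.codes
print(toy)
```

Because the codes follow the category order, a better quality label always gets a larger integer, which a linear model can use directly.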
One-hot encode the categorical features
 | MSSubClass | MSZoning | LotFrontage | LotArea | LotShape | LandContour | Neighborhood | Condition1 | BldgType | HouseStyle | ... | BsmtExposure_NA | BsmtExposure_No | GarageType_BuiltIn | GarageType_Detchd | GarageType_NA | SaleType_CWD | SaleType_New | SaleType_WD | SaleCondition_Normal | SaleCondition_Partial |
0 | 2story | RL | 65 | 8450 | Reg | Lvl | Crawfor | Norm | 1Fam | 6 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
1 | 1story | RL | 80 | 9600 | Reg | Lvl | Veenker | Artery | 1Fam | 2 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
2 | 2story | RL | 68 | 11250 | IR1 | Lvl | Crawfor | Norm | 1Fam | 6 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
3 | 2story | RL | 60 | 9550 | IR1 | Lvl | Crawfor | Norm | 1Fam | 6 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
4 | 2story | RL | 84 | 14260 | IR1 | Lvl | NridgHt | Norm | 1Fam | 6 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
1459 rows × 82 columns |
house_price.reset_index(drop=True,inplace=True)
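The one-hot encoding step itself is not shown above. A minimal sketch with `pd.get_dummies` follows; the use of `drop_first=True` is an assumption based on the dummy column names in the table, and the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame with one nominal feature and one numeric feature.
toy = pd.DataFrame({
    "GarageType": ["Attchd", "Detchd", "BuiltIn", "Attchd"],
    "LotArea": [8450, 9600, 11250, 9550],
})

# drop_first=True drops one level per feature to avoid perfectly collinear dummies.
dummies = pd.get_dummies(toy["GarageType"], prefix="GarageType", drop_first=True, dtype=int)
# Replace the original column with its dummy columns.
encoded = pd.concat([toy.drop(columns="GarageType"), dummies], axis=1)
print(encoded)
```

Dropping one level matters for ordinary least squares, where a full set of dummies plus an intercept would be linearly dependent.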
Handling the date column
The general approach is to convert it to integer values.
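One possible way to turn a date column into integers is to count the days since the earliest date in the data. This is an assumed encoding for illustration; the article does not show which conversion it uses, and the `toy` dates and the column name `dateSold_days` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "dateSold": pd.to_datetime(["2008-02-01", "2007-05-01", "2006-02-01"])
})

# Days elapsed since the earliest sale in the data.
toy["dateSold_days"] = (toy["dateSold"] - toy["dateSold"].min()).dt.days
print(toy)
```

Any monotonic integer encoding would preserve the ordering of the dates; the choice of origin only shifts the values.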
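Splitting the data
Here we use train_test_split from sklearn to split the data set into a training set and a test set.
Because the split is random, the distribution of each feature in the training set and the test set stays approximately consistent with its distribution in the original data set.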
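The split can be sketched as follows. The 70/30 ratio is inferred from the 1021 training observations reported later out of 1459 records; `random_state=42` and the synthetic `toy` data are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared data set.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 300, 100),
    "OverallQual": rng.integers(1, 11, 100),
})
toy["SalePrice"] = 100 * toy["GrLivArea"] + 10000 * toy["OverallQual"]

X = toy.drop(columns="SalePrice")
y = toy["SalePrice"]
# 70% of the rows go to training; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` is useful in a tutorial so that later numbers can be reproduced.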
Normalization
 | LotFrontage | LotArea | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | MasVnrArea | ExterQual | BsmtQual | ... | BsmtExposure_NA | BsmtExposure_No | GarageType_BuiltIn | GarageType_Detchd | GarageType_NA | SaleType_CWD | SaleType_New | SaleType_WD | SaleCondition_Normal | SaleCondition_Partial |
1453 | -0.39373 | -0.52222 | -0.74866 | 0.661837 | -0.48597 | 1.084801 | 0.986677 | -0.57304 | 1.064789 | 0.61969 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
1099 | 0.551285 | 0.340015 | -0.74866 | 0.661837 | -0.48597 | 0.216686 | -0.33434 | 0.640253 | -0.69435 | 0.61969 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
416 | 0.173278 | -0.4545 | 1.401497 | -0.06242 | 1.344947 | 0.216686 | -0.33434 | 0.622583 | -0.69435 | -0.66891 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
1168 | 2.346818 | 0.703806 | 1.401497 | -0.06242 | 1.344947 | -1.21904 | 0.057073 | -0.57304 | -0.69435 | -0.66891 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
670 | -0.29923 | -0.29918 | 1.401497 | -0.06242 | -0.48597 | 1.11819 | 0.986677 | -0.57304 | 1.064789 | 0.61969 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
5 rows × 81 columns |
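The scaled values in the table are consistent with standardization to zero mean and unit variance. A minimal sketch with `StandardScaler` follows, under the assumption that only the continuous columns are scaled and the dummy columns are left as 0/1; the data and the `scale_columns` list are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

X_train = pd.DataFrame({
    "LotArea": [8000.0, 9000.0, 10000.0, 13000.0],
    "HasGarage": [1, 0, 1, 1],
})
X_test = pd.DataFrame({"LotArea": [9500.0], "HasGarage": [1]})

scale_columns = ["LotArea"]  # dummy columns are left untouched
scaler = StandardScaler()
# Fit on the training set only, then apply the same transformation to the test set.
X_train[scale_columns] = scaler.fit_transform(X_train[scale_columns])
X_test[scale_columns] = scaler.transform(X_test[scale_columns])
print(X_train)
print(X_test)
```

Fitting the scaler on the training set alone avoids leaking information from the test set into the model.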
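Building the regression model
Here we fit a linear regression.
Measuring model performance (evaluation)
Conclusions:
- The R2 on the training set is higher than on the test set, which suggests the model may be overfitting. Below we try to improve this by reworking the data set.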
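The fit-and-evaluate step can be sketched as follows on synthetic data. The data, the split, and the variable names are assumptions made for the example; only `LinearRegression`, `r2_score`, and `mean_squared_error` come from the imports used in this article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem with known coefficients and some noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
# Compare R2 on both sets: a large gap hints at overfitting.
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(round(r2_train, 3), round(r2_test, 3), round(rmse_test, 3))
```

Reporting the training and test scores side by side is what makes the overfitting check in the conclusion possible.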
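Feature selection
- The main idea here is to find groups of features that are correlated with each other, keep the one that carries the most useful information, and drop the rest, somewhat like clustering.
This is computed and measured mainly with the correlation matrix.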
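A correlation-based selection of this kind can be sketched as follows. The 0.8 threshold, the rule of keeping the feature more strongly correlated with the target, and the synthetic data are all assumptions made for illustration; the article does not show its own implementation.

```python
import numpy as np
import pandas as pd

# Synthetic data: GarageCars is a noisy copy of GarageArea, Fireplaces is independent.
rng = np.random.default_rng(0)
area = rng.normal(size=300)
toy = pd.DataFrame({
    "GarageArea": area,
    "GarageCars": area + rng.normal(scale=0.5, size=300),
    "Fireplaces": rng.normal(size=300),
})
toy["SalePrice"] = 2 * toy["GarageArea"] + toy["Fireplaces"] + rng.normal(scale=0.1, size=300)

features = toy.drop(columns="SalePrice")
corr = features.corr().abs()                                  # feature-to-feature correlation
target_corr = features.corrwith(toy["SalePrice"]).abs()       # feature-to-target correlation

to_drop = set()
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a in to_drop or b in to_drop:
            continue
        if corr.loc[a, b] > 0.8:
            # Of two highly correlated features, drop the one less related to the target.
            to_drop.add(a if target_corr[a] < target_corr[b] else b)

selected = [c for c in cols if c not in to_drop]
print(selected)
```

Keeping one representative per correlated group reduces multicollinearity without discarding the information the group carries.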
Feature selection with RFE
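Recursive feature elimination (RFE) repeatedly fits an estimator and removes the least important feature until the requested number remains. The sketch below uses synthetic data and `n_features_to_select=2`, both of which are assumptions for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two columns influence the target.
y = 5.0 * X[:, 0] - 4.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Eliminate one feature per round until two remain.
selector = RFE(estimator=LinearRegression(), n_features_to_select=2)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the kept features
print(selector.ranking_)   # 1 marks a selected feature
```

With a linear estimator, RFE ranks features by coefficient magnitude, so it works best on features that have been brought to a common scale.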
Define a function that builds an ordinary least squares model
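Next, define a function that computes the variance inflation factor (VIF)
A VIF below 5 is the desirable case. Otherwise the model suffers from fairly serious multicollinearity: many of its features are closely correlated with one another, and some of them need to be removed.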
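The body of the VIF function is not shown in the article either. A hypothetical sketch using `variance_inflation_factor` from statsmodels follows; the column names `Features` and `VIF` match how the result is used later, while the function body and the synthetic data are assumptions. The features are assumed to be centered already, since no constant is added here.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def VIF(X):
    """Return a table with the VIF of every column of X (hypothetical helper)."""
    vif = pd.DataFrame({
        "Features": X.columns,
        "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    })
    return vif.sort_values(by="VIF", ascending=False).reset_index(drop=True)

# Synthetic data: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=500),
    "x3": rng.normal(size=500),
})

vif = VIF(X)
print(vif)
```

The VIF of a feature equals 1 / (1 - R2), where R2 comes from regressing that feature on all the others, so near-duplicates such as `x1` and `x2` get very large values.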
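Building on the least squares model and the variance inflation factor above, we construct an automatic feature-pruning procedure with the following logic:
- If adding a feature raises both the p-value and the VIF of the regression model, the feature does not help the model and should be dropped.
- If adding features raises the p-value but lowers the VIF, drop these features one at a time and check after each drop.
- If adding a feature lowers the p-value but raises the VIF, remove the features whose VIF is above 5.
- If adding a feature lowers both the p-value and the VIF, the feature is useful and should be kept.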
def perform_feature_selection(train_data,rfe=False,corr=False):
    X_train_rfe = pd.DataFrame()
    vif = pd.DataFrame()
    count = 1
    stop = False
    # R2 value of the previous regression model
    r2score = 0.0
    if rfe:
        cols = rfe_selected_columns
    elif corr:
        cols = corr_selected_columns.Corr_feature
    else:
        cols = corr_rfe_features.RFE.values
    for col in cols:
        if col in train_data.columns.values:
            X_train_rfe[col] = train_data[col]
            while True:
                lm = build_model(X_train_rfe)
                # If R2 did not increase at all, drop the feature that was just added
                if round(r2score,3) == round(lm.rsquared,3):
                    print("\n\n Dropping "+X_train_rfe.columns.values[-1]+" and rebuilding the model as it did not add any info to model \n\n")
                    X_train_rfe.drop(X_train_rfe.columns.values[-1],axis=1, inplace=True)
                    # Rebuild the model with the remaining features
                    lm = build_model(X_train_rfe)
                r2score = lm.rsquared
                if count != 1:
                    vif = VIF(X_train_rfe)
                # If R2 exceeds 90%, stop and consider the model good enough; this threshold is configurable
                if lm.rsquared >= 0.90:
                    stop = True
                    break
                # Check whether any p-value has risen above 0.05
                if (lm.pvalues > 0.05).sum() > 0:
                    feature = lm.pvalues[lm.pvalues > 0.05].index
                    if feature[0] != 'const':
                        # If the VIF is high as well, drop the feature
                        if feature[0] in vif.loc[vif.VIF > 5,'Features']:
                            X_train_rfe.drop(feature[0],axis=1,inplace=True)
                        else:
                            # If only the p-value is high, drop the feature
                            X_train_rfe.drop(feature[0],axis=1,inplace=True)
                    elif (feature[0] == 'const') & (len(feature) > 1):
                        X_train_rfe.drop(feature[1],axis=1,inplace=True)
                    if ((vif.VIF > 5).sum() > 0) & (col in X_train_rfe.columns.values):
                        X_train_rfe.drop(col,axis=1,inplace=True) # order 3
                    else:
                        break
                else:
                    break
            if stop:
                break
        count = count + 1
    return X_train_rfe
X_train_rfe = perform_feature_selection(X_train,corr=True)
OLS Regression Results
==============================================================================
Dep. Variable: SalePrice    R-squared: 0.717
Model: OLS    Adj. R-squared: 0.714
Method: Least Squares    F-statistic: 284.3
Date: Mon, 27 Jun 2022    Prob (F-statistic): 8.49e-270
Time: 19:38:14    Log-Likelihood: -12352.
No. Observations: 1021    AIC: 2.472e+04
Df Residuals: 1011    BIC: 2.477e+04
Df Model: 9
Covariance Type: nonrobust
=====================================================================================
                   coef  std err  t  P>|t|  [0.025  0.975]
-------------------------------------------------------------------------------------
const  1.85e+05  4368.649  42.355  0.000  1.76e+05  1.94e+05
1stFlrSF  2.644e+04  1669.004  15.841  0.000  2.32e+04  2.97e+04
BldgType_Twnhs  -1.101e+04  5464.297  -2.014  0.044  -2.17e+04  -282.459
BsmtExposure_NA  2.679e+04  1.11e+04  2.416  0.016  5033.785  4.85e+04
BsmtFinSF1  1.059e+04  1581.639  6.694  0.000  7484.070  1.37e+04
BsmtQual  2.001e+04  2362.866  8.468  0.000  1.54e+04  2.46e+04
ExterQual  2.335e+04  1962.316  11.899  0.000  1.95e+04  2.72e+04
Fireplaces  1.436e+04  1532.145  9.372  0.000  1.14e+04  1.74e+04
Foundation_CBlock  -1.08e+04  4703.225  -2.296  0.022  -2e+04  -1571.394
Foundation_PConc  4150.7686  5422.286  0.766  0.444  -6489.454  1.48e+04
==============================================================================
Omnibus: 459.342    Durbin-Watson: 1.924
Prob(Omnibus): 0.000    Jarque-Bera (JB): 4837.327
Skew: 1.781    Prob(JB): 0.00
Kurtosis: 13.051    Cond. No. 12.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping Foundation_PConc and rebuilding the model as it did not add any info to model

OLS Regression Results
==============================================================================
Dep. Variable: SalePrice    R-squared: 0.717
Model: OLS    Adj. R-squared: 0.714
Method: Least Squares    F-statistic: 319.9
Date: Mon, 27 Jun 2022    Prob (F-statistic): 6.16e-271
Time: 19:38:14    Log-Likelihood: -12352.
No. Observations: 1021    AIC: 2.472e+04
Df Residuals: 1012    BIC: 2.477e+04
Df Model: 8
Covariance Type: nonrobust
=====================================================================================
                   coef  std err  t  P>|t|  [0.025  0.975]
-------------------------------------------------------------------------------------
const  1.88e+05  2053.300  91.553  0.000  1.84e+05  1.92e+05
1stFlrSF  2.643e+04  1668.635  15.840  0.000  2.32e+04  2.97e+04
BldgType_Twnhs  -1.102e+04  5463.152  -2.017  0.044  -2.17e+04  -297.840
BsmtExposure_NA  2.674e+04  1.11e+04  2.412  0.016  4987.135  4.85e+04
BsmtFinSF1  1.063e+04  1580.214  6.729  0.000  7532.044  1.37e+04
BsmtQual  2.065e+04  2207.730  9.354  0.000  1.63e+04  2.5e+04
ExterQual  2.363e+04  1928.489  12.251  0.000  1.98e+04  2.74e+04
Fireplaces  1.43e+04  1529.991  9.347  0.000  1.13e+04  1.73e+04
Foundation_CBlock  -1.333e+04  3343.343  -3.988  0.000  -1.99e+04  -6771.611
==============================================================================
Omnibus: 454.593    Durbin-Watson: 1.926
Prob(Omnibus): 0.000    Jarque-Bera (JB): 4727.204
Skew: 1.762    Prob(JB): 0.00
Kurtosis: 12.935    Cond. No. 12.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable: SalePrice    R-squared: 0.717
Model: OLS    Adj. R-squared: 0.714
Method: Least Squares    F-statistic: 284.1
Date: Mon, 27 Jun 2022    Prob (F-statistic): 1.10e-269
Time: 19:38:14    Log-Likelihood: -12352.
No. Observations: 1021    AIC: 2.472e+04
Df Residuals: 1011    BIC: 2.477e+04
Df Model: 9
Covariance Type: nonrobust
=====================================================================================
                   coef  std err  t  P>|t|  [0.025  0.975]
-------------------------------------------------------------------------------------
const  1.879e+05  2080.552  90.316  0.000  1.84e+05  1.92e+05
1stFlrSF  2.641e+04  1671.162  15.805  0.000  2.31e+04  2.97e+04
BldgType_Twnhs  -1.107e+04  5470.533  -2.024  0.043  -2.18e+04  -338.744
BsmtExposure_NA  2.437e+04  1.48e+04  1.644  0.100  -4717.249  5.35e+04
BsmtFinSF1  1.062e+04  1581.462  6.717  0.000  7519.903  1.37e+04
BsmtQual  2.072e+04  2225.665  9.308  0.000  1.63e+04  2.51e+04
ExterQual  2.363e+04  1929.399  12.246  0.000  1.98e+04  2.74e+04
Fireplaces  1.431e+04  1531.068  9.346  0.000  1.13e+04  1.73e+04
Foundation_CBlock  -1.319e+04  3396.063  -3.884  0.000  -1.99e+04  -6526.682
Foundation_Stone  3473.8182  1.44e+04  0.241  0.810  -2.48e+04  3.18e+04
==============================================================================
Omnibus: 454.564    Durbin-Watson: 1.925
Prob(Omnibus): 0.000    Jarque-Bera (JB): 4730.579
Skew: 1.761    Prob(JB): 0.00
Kurtosis: 12.939    Cond. No. 21.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping Foundation_Stone and rebuilding the model as it did not add any info to model

OLS Regression Results
==============================================================================
Dep. Variable: SalePrice    R-squared: 0.717
Model: OLS    Adj. R-squared: 0.714
Method: Least Squares    F-statistic: 319.9
Date: Mon, 27 Jun 2022    Prob (F-statistic): 6.16e-271
Time: 19:38:14    Log-Likelihood: -12352.
No. Observations: 1021    AIC: 2.472e+04
Df Residuals: 1012    BIC: 2.477e+04
Df Model: 8
Covariance Type: nonrobust
=====================================================================================
                   coef  std err  t  P>|t|  [0.025  0.975]
-------------------------------------------------------------------------------------
const  1.88e+05  2053.300  91.553  0.000  1.84e+05  1.92e+05
1stFlrSF  2.643e+04  1668.635  15.840  0.000  2.32e+04  2.97e+04
BldgType_Twnhs  -1.102e+04  5463.152  -2.017  0.044  -2.17e+04  -297.840
BsmtExposure_NA  2.674e+04  1.11e+04  2.412  0.016  4987.135  4.85e+04
BsmtFinSF1  1.063e+04  1580.214  6.729  0.000  7532.044  1.37e+04
BsmtQual  2.065e+04  2207.730  9.354  0.000  1.63e+04  2.5e+04
ExterQual  2.363e+04  1928.489  12.251  0.000  1.98e+04  2.74e+04
Fireplaces  1.43e+04  1529.991  9.347  0.000  1.13e+04  1.73e+04
Foundation_CBlock  -1.333e+04  3343.343  -3.988  0.000  -1.99e+04  -6771.611
==============================================================================
Omnibus: 454.593    Durbin-Watson: 1.926
Prob(Omnibus): 0.000    Jarque-Bera (JB): 4727.204
Skew: 1.762    Prob(JB): 0.00
Kurtosis: 12.935    Cond. No. 12.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.741 Model: OLS Adj. R-squared: 0.739 Method: Least Squares F-statistic: 321.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 3.27e-289 Time: 19:38:14 Log-Likelihood: -12307. No. Observations: 1021 AIC: 2.463e+04 Df Residuals: 1011 BIC: 2.468e+04 Df Model: 9 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.871e+05 1966.956 95.103 0.000 1.83e+05 1.91e+05 1stFlrSF 2.27e+04 1642.064 13.827 0.000 1.95e+04 2.59e+04 BldgType_Twnhs -2.369e+04 5387.593 -4.397 0.000 -3.43e+04 -1.31e+04 BsmtExposure_NA 1.419e+04 1.07e+04 1.328 0.185 -6782.532 3.52e+04 BsmtFinSF1 1.297e+04 1531.055 8.473 0.000 9967.629 1.6e+04 BsmtQual 1.552e+04 2177.549 7.126 0.000 1.12e+04 1.98e+04 ExterQual 2.072e+04 1869.252 11.087 0.000 1.71e+04 2.44e+04 Fireplaces 1.254e+04 1475.068 8.505 0.000 9650.354 1.54e+04 Foundation_CBlock -8227.4604 3241.896 -2.538 0.011 -1.46e+04 -1865.845 FullBath 1.641e+04 1689.171 9.714 0.000 1.31e+04 1.97e+04 ============================================================================== Omnibus: 430.694 Durbin-Watson: 1.946 Prob(Omnibus): 0.000 Jarque-Bera (JB): 4257.081 Skew: 1.661 Prob(JB): 0.00 Kurtosis: 12.436 Cond. No. 13.6 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.760 Model: OLS Adj. R-squared: 0.758 Method: Least Squares F-statistic: 320.4 Date: Mon, 27 Jun 2022 Prob (F-statistic): 4.98e-305 Time: 19:38:14 Log-Likelihood: -12267. No. Observations: 1021 AIC: 2.456e+04 Df Residuals: 1010 BIC: 2.461e+04 Df Model: 10 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.872e+05 1821.622 102.744 0.000 1.84e+05 1.91e+05 1stFlrSF 1.902e+04 1629.925 11.670 0.000 1.58e+04 2.22e+04 BldgType_Twnhs -2.128e+04 5162.883 -4.122 0.000 -3.14e+04 -1.11e+04 BsmtFinSF1 1.147e+04 1486.941 7.711 0.000 8548.411 1.44e+04 BsmtQual 1.227e+04 1758.908 6.976 0.000 8818.862 1.57e+04 ExterQual 1.757e+04 1824.971 9.629 0.000 1.4e+04 2.12e+04 Fireplaces 1.205e+04 1433.674 8.408 0.000 9240.387 1.49e+04 Foundation_CBlock -8052.8506 3040.517 -2.649 0.008 -1.4e+04 -2086.397 FullBath 1.457e+04 1641.700 8.874 0.000 1.13e+04 1.78e+04 GarageArea 1.513e+04 2809.559 5.384 0.000 9612.150 2.06e+04 GarageCars -202.8822 2857.603 -0.071 0.943 -5810.402 5404.637 ============================================================================== Omnibus: 491.859 Durbin-Watson: 1.934 Prob(Omnibus): 0.000 Jarque-Bera (JB): 6508.560 Skew: 1.865 Prob(JB): 0.00 Kurtosis: 14.793 Cond. No. 8.13 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping GarageCars and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.760 Model: OLS Adj. 
R-squared: 0.758 Method: Least Squares F-statistic: 356.3 Date: Mon, 27 Jun 2022 Prob (F-statistic): 2.57e-306 Time: 19:38:14 Log-Likelihood: -12267. No. Observations: 1021 AIC: 2.455e+04 Df Residuals: 1011 BIC: 2.460e+04 Df Model: 9 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.872e+05 1819.709 102.850 0.000 1.84e+05 1.91e+05 1stFlrSF 1.902e+04 1628.786 11.679 0.000 1.58e+04 2.22e+04 BldgType_Twnhs -2.129e+04 5156.368 -4.130 0.000 -3.14e+04 -1.12e+04 BsmtFinSF1 1.147e+04 1481.601 7.745 0.000 8567.200 1.44e+04 BsmtQual 1.226e+04 1747.577 7.014 0.000 8827.495 1.57e+04 ExterQual 1.757e+04 1821.466 9.644 0.000 1.4e+04 2.11e+04 Fireplaces 1.204e+04 1419.676 8.481 0.000 9254.028 1.48e+04 Foundation_CBlock -8040.1455 3033.752 -2.650 0.008 -1.4e+04 -2086.974 FullBath 1.455e+04 1629.164 8.934 0.000 1.14e+04 1.78e+04 GarageArea 1.496e+04 1632.428 9.166 0.000 1.18e+04 1.82e+04 ============================================================================== Omnibus: 491.958 Durbin-Watson: 1.934 Prob(Omnibus): 0.000 Jarque-Bera (JB): 6510.144 Skew: 1.866 Prob(JB): 0.00 Kurtosis: 14.794 Cond. No. 7.40 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.760 Model: OLS Adj. R-squared: 0.758 Method: Least Squares F-statistic: 320.4 Date: Mon, 27 Jun 2022 Prob (F-statistic): 4.91e-305 Time: 19:38:14 Log-Likelihood: -12267. No. Observations: 1021 AIC: 2.456e+04 Df Residuals: 1010 BIC: 2.461e+04 Df Model: 10 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.871e+05 1821.423 102.747 0.000 1.84e+05 1.91e+05 1stFlrSF 1.904e+04 1631.010 11.671 0.000 1.58e+04 2.22e+04 BldgType_Twnhs -2.113e+04 5236.321 -4.034 0.000 -3.14e+04 -1.09e+04 BsmtFinSF1 1.146e+04 1483.410 7.728 0.000 8552.888 1.44e+04 BsmtQual 1.218e+04 1790.831 6.803 0.000 8669.676 1.57e+04 ExterQual 1.752e+04 1836.192 9.543 0.000 1.39e+04 2.11e+04 Fireplaces 1.2e+04 1435.173 8.362 0.000 9184.890 1.48e+04 Foundation_CBlock -8044.7975 3035.301 -2.650 0.008 -1.4e+04 -2088.579 FullBath 1.451e+04 1645.821 8.817 0.000 1.13e+04 1.77e+04 GarageArea 1.489e+04 1675.374 8.889 0.000 1.16e+04 1.82e+04 GarageFinish 312.0124 1657.567 0.188 0.851 -2940.656 3564.681 ============================================================================== Omnibus: 492.929 Durbin-Watson: 1.934 Prob(Omnibus): 0.000 Jarque-Bera (JB): 6534.498 Skew: 1.870 Prob(JB): 0.00 Kurtosis: 14.816 Cond. No. 8.03 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping GarageFinish and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.760 Model: OLS Adj. 
R-squared: 0.758 Method: Least Squares F-statistic: 356.3 Date: Mon, 27 Jun 2022 Prob (F-statistic): 2.57e-306 Time: 19:38:14 Log-Likelihood: -12267. No. Observations: 1021 AIC: 2.455e+04 Df Residuals: 1011 BIC: 2.460e+04 Df Model: 9 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.872e+05 1819.709 102.850 0.000 1.84e+05 1.91e+05 1stFlrSF 1.902e+04 1628.786 11.679 0.000 1.58e+04 2.22e+04 BldgType_Twnhs -2.129e+04 5156.368 -4.130 0.000 -3.14e+04 -1.12e+04 BsmtFinSF1 1.147e+04 1481.601 7.745 0.000 8567.200 1.44e+04 BsmtQual 1.226e+04 1747.577 7.014 0.000 8827.495 1.57e+04 ExterQual 1.757e+04 1821.466 9.644 0.000 1.4e+04 2.11e+04 Fireplaces 1.204e+04 1419.676 8.481 0.000 9254.028 1.48e+04 Foundation_CBlock -8040.1455 3033.752 -2.650 0.008 -1.4e+04 -2086.974 FullBath 1.455e+04 1629.164 8.934 0.000 1.14e+04 1.78e+04 GarageArea 1.496e+04 1632.428 9.166 0.000 1.18e+04 1.82e+04 ============================================================================== Omnibus: 491.958 Durbin-Watson: 1.934 Prob(Omnibus): 0.000 Jarque-Bera (JB): 6510.144 Skew: 1.866 Prob(JB): 0.00 Kurtosis: 14.794 Cond. No. 7.40 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.760 Model: OLS Adj. R-squared: 0.758 Method: Least Squares F-statistic: 320.5 Date: Mon, 27 Jun 2022 Prob (F-statistic): 4.07e-305 Time: 19:38:14 Log-Likelihood: -12266. No. Observations: 1021 AIC: 2.455e+04 Df Residuals: 1010 BIC: 2.461e+04 Df Model: 10 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.878e+05 2106.657 89.164 0.000 1.84e+05 1.92e+05 1stFlrSF 1.888e+04 1643.702 11.489 0.000 1.57e+04 2.21e+04 BldgType_Twnhs -2.092e+04 5191.666 -4.029 0.000 -3.11e+04 -1.07e+04 BsmtFinSF1 1.142e+04 1484.888 7.688 0.000 8501.694 1.43e+04 BsmtQual 1.199e+04 1796.235 6.676 0.000 8466.998 1.55e+04 ExterQual 1.742e+04 1835.265 9.494 0.000 1.38e+04 2.1e+04 Fireplaces 1.198e+04 1423.240 8.417 0.000 9186.305 1.48e+04 Foundation_CBlock -8304.2324 3062.431 -2.712 0.007 -1.43e+04 -2294.777 FullBath 1.435e+04 1658.973 8.653 0.000 1.11e+04 1.76e+04 GarageArea 1.519e+04 1671.275 9.090 0.000 1.19e+04 1.85e+04 GarageType_Detchd -2092.7692 3262.275 -0.642 0.521 -8494.382 4308.843 ============================================================================== Omnibus: 497.557 Durbin-Watson: 1.935 Prob(Omnibus): 0.000 Jarque-Bera (JB): 6683.851 Skew: 1.889 Prob(JB): 0.00 Kurtosis: 14.952 Cond. No. 7.52 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping GarageType_Detchd and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.760 Model: OLS Adj. 
R-squared: 0.758 Method: Least Squares F-statistic: 356.3 Date: Mon, 27 Jun 2022 Prob (F-statistic): 2.57e-306 Time: 19:38:14 Log-Likelihood: -12267. No. Observations: 1021 AIC: 2.455e+04 Df Residuals: 1011 BIC: 2.460e+04 Df Model: 9 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.872e+05 1819.709 102.850 0.000 1.84e+05 1.91e+05 1stFlrSF 1.902e+04 1628.786 11.679 0.000 1.58e+04 2.22e+04 BldgType_Twnhs -2.129e+04 5156.368 -4.130 0.000 -3.14e+04 -1.12e+04 BsmtFinSF1 1.147e+04 1481.601 7.745 0.000 8567.200 1.44e+04 BsmtQual 1.226e+04 1747.577 7.014 0.000 8827.495 1.57e+04 ExterQual 1.757e+04 1821.466 9.644 0.000 1.4e+04 2.11e+04 Fireplaces 1.204e+04 1419.676 8.481 0.000 9254.028 1.48e+04 Foundation_CBlock -8040.1455 3033.752 -2.650 0.008 -1.4e+04 -2086.974 FullBath 1.455e+04 1629.164 8.934 0.000 1.14e+04 1.78e+04 GarageArea 1.496e+04 1632.428 9.166 0.000 1.18e+04 1.82e+04 ============================================================================== Omnibus: 491.958 Durbin-Watson: 1.934 Prob(Omnibus): 0.000 Jarque-Bera (JB): 6510.144 Skew: 1.866 Prob(JB): 0.00 Kurtosis: 14.794 Cond. No. 7.40 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.815 Model: OLS Adj. R-squared: 0.813 Method: Least Squares F-statistic: 370.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12134. No. Observations: 1021 AIC: 2.429e+04 Df Residuals: 1008 BIC: 2.436e+04 Df Model: 12 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.848e+05 1640.059 112.701 0.000 1.82e+05 1.88e+05 1stFlrSF 1.226e+04 1492.986 8.211 0.000 9328.575 1.52e+04 BldgType_Twnhs -2.514e+04 4586.437 -5.481 0.000 -3.41e+04 -1.61e+04 BsmtFinSF1 1.304e+04 1309.221 9.958 0.000 1.05e+04 1.56e+04 BsmtQual 1.156e+04 1663.251 6.949 0.000 8293.985 1.48e+04 ExterQual 1.726e+04 1632.938 10.567 0.000 1.41e+04 2.05e+04 Fireplaces 5440.3950 1325.188 4.105 0.000 2839.951 8040.839 Foundation_CBlock -3801.6485 2688.584 -1.414 0.158 -9077.512 1474.214 FullBath 1259.5279 1673.974 0.752 0.452 -2025.344 4544.400 GarageArea 1.201e+04 1830.204 6.562 0.000 8418.298 1.56e+04 GarageType_NA 1.723e+04 6361.690 2.708 0.007 4743.491 2.97e+04 GarageYrBlt 2893.5852 1771.498 1.633 0.103 -582.660 6369.831 GrLivArea 2.936e+04 1780.188 16.491 0.000 2.59e+04 3.29e+04 ============================================================================== Omnibus: 530.944 Durbin-Watson: 1.907 Prob(Omnibus): 0.000 Jarque-Bera (JB): 10691.051 Skew: 1.920 Prob(JB): 0.00 Kurtosis: 18.380 Cond. No. 11.9 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.816 Model: OLS Adj. R-squared: 0.814 Method: Least Squares F-statistic: 372.8 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12131. No. Observations: 1021 AIC: 2.429e+04 Df Residuals: 1008 BIC: 2.435e+04 Df Model: 12 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.831e+05 1179.345 155.279 0.000 1.81e+05 1.85e+05 1stFlrSF 1.208e+04 1481.295 8.154 0.000 9171.233 1.5e+04 BldgType_Twnhs -2.404e+04 4594.339 -5.233 0.000 -3.31e+04 -1.5e+04 BsmtFinSF1 1.282e+04 1288.946 9.948 0.000 1.03e+04 1.54e+04 BsmtQual 1.17e+04 1652.529 7.083 0.000 8462.040 1.49e+04 ExterQual 1.66e+04 1635.857 10.148 0.000 1.34e+04 1.98e+04 Fireplaces 5276.4977 1317.607 4.005 0.000 2690.931 7862.064 FullBath 1333.7615 1659.828 0.804 0.422 -1923.353 4590.876 GarageArea 1.256e+04 1830.055 6.866 0.000 8973.594 1.62e+04 GarageType_NA 1.764e+04 6323.109 2.790 0.005 5231.647 3e+04 GarageYrBlt 1856.8163 1811.485 1.025 0.306 -1697.897 5411.530 GrLivArea 2.918e+04 1774.282 16.444 0.000 2.57e+04 3.27e+04 HeatingQC 3841.1963 1357.881 2.829 0.005 1176.599 6505.794 ============================================================================== Omnibus: 543.016 Durbin-Watson: 1.907 Prob(Omnibus): 0.000 Jarque-Bera (JB): 11212.459 Skew: 1.974 Prob(JB): 0.00 Kurtosis: 18.747 Cond. No. 12.2 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.823 Model: OLS Adj. R-squared: 0.821 Method: Least Squares F-statistic: 390.9 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12111. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1008 BIC: 2.431e+04 Df Model: 12 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.828e+05 1155.016 158.291 0.000 1.81e+05 1.85e+05 1stFlrSF 1.167e+04 1452.788 8.030 0.000 8815.484 1.45e+04 BldgType_Twnhs -2.091e+04 4441.758 -4.707 0.000 -2.96e+04 -1.22e+04 BsmtFinSF1 1.215e+04 1257.408 9.662 0.000 9681.473 1.46e+04 BsmtQual 1.135e+04 1618.144 7.016 0.000 8177.842 1.45e+04 ExterQual 1.164e+04 1785.197 6.523 0.000 8141.863 1.51e+04 Fireplaces 5298.9064 1291.188 4.104 0.000 2765.183 7832.630 GarageArea 1.181e+04 1795.779 6.575 0.000 8283.341 1.53e+04 GarageType_NA 1.909e+04 6201.302 3.078 0.002 6919.318 3.13e+04 GarageYrBlt 1348.1313 1720.113 0.784 0.433 -2027.282 4723.544 GrLivArea 2.889e+04 1508.571 19.150 0.000 2.59e+04 3.18e+04 HeatingQC 2224.0926 1355.931 1.640 0.101 -436.679 4884.864 KitchenQual 1.088e+04 1706.221 6.374 0.000 7526.962 1.42e+04 ============================================================================== Omnibus: 566.645 Durbin-Watson: 1.909 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12320.064 Skew: 2.079 Prob(JB): 0.00 Kurtosis: 19.502 Cond. No. 12.4 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.823 Model: OLS Adj. R-squared: 0.821 Method: Least Squares F-statistic: 391.0 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12111. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1008 BIC: 2.431e+04 Df Model: 12 Covariance Type: nonrobust =================================================================================== coef std err t P>|t| [0.025 0.975] ----------------------------------------------------------------------------------- const 1.826e+05 1167.634 156.427 0.000 1.8e+05 1.85e+05 1stFlrSF 1.161e+04 1452.283 7.997 0.000 8764.102 1.45e+04 BldgType_Twnhs -2.034e+04 4392.653 -4.631 0.000 -2.9e+04 -1.17e+04 BsmtFinSF1 1.196e+04 1259.989 9.494 0.000 9489.451 1.44e+04 BsmtQual 1.189e+04 1485.640 8.006 0.000 8978.934 1.48e+04 ExterQual 1.192e+04 1779.074 6.703 0.000 8433.137 1.54e+04 Fireplaces 5202.4712 1290.361 4.032 0.000 2670.369 7734.574 GarageArea 1.227e+04 1665.929 7.365 0.000 9000.717 1.55e+04 GarageType_NA 1.866e+04 6202.724 3.009 0.003 6490.813 3.08e+04 GrLivArea 2.869e+04 1477.181 19.421 0.000 2.58e+04 3.16e+04 HeatingQC 2511.1145 1321.773 1.900 0.058 -82.628 5104.857 KitchenQual 1.091e+04 1702.586 6.409 0.000 7570.148 1.43e+04 LandContour_Low 5776.3072 6735.792 0.858 0.391 -7441.473 1.9e+04 ============================================================================== Omnibus: 564.056 Durbin-Watson: 1.911 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12299.304 Skew: 2.065 Prob(JB): 0.00 Kurtosis: 19.494 Cond. No. 12.6 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. 
Dropping LandContour_Low and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.823 Model: OLS Adj. R-squared: 0.821 Method: Least Squares F-statistic: 426.6 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12112. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1009 BIC: 2.431e+04 Df Model: 11 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.828e+05 1154.193 158.379 0.000 1.81e+05 1.85e+05 1stFlrSF 1.163e+04 1451.911 8.013 0.000 8784.516 1.45e+04 BldgType_Twnhs -2.039e+04 4391.723 -4.643 0.000 -2.9e+04 -1.18e+04 BsmtFinSF1 1.207e+04 1253.306 9.632 0.000 9612.336 1.45e+04 BsmtQual 1.186e+04 1484.803 7.985 0.000 8943.121 1.48e+04 ExterQual 1.18e+04 1773.301 6.656 0.000 8324.156 1.53e+04 Fireplaces 5246.1628 1289.186 4.069 0.000 2716.370 7775.956 GarageArea 1.234e+04 1663.923 7.414 0.000 9070.826 1.56e+04 GarageType_NA 1.89e+04 6195.591 3.051 0.002 6744.886 3.11e+04 GrLivArea 2.865e+04 1476.152 19.406 0.000 2.57e+04 3.15e+04 HeatingQC 2464.6328 1320.488 1.866 0.062 -126.585 5055.850 KitchenQual 1.098e+04 1700.260 6.460 0.000 7647.272 1.43e+04 ============================================================================== Omnibus: 562.418 Durbin-Watson: 1.910 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12156.778 Skew: 2.059 Prob(JB): 0.00 Kurtosis: 19.395 Cond. No. 11.8 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.823 Model: OLS Adj. R-squared: 0.821 Method: Least Squares F-statistic: 425.0 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12113. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1009 BIC: 2.431e+04 Df Model: 11 Covariance Type: nonrobust =================================================================================== coef std err t P>|t| [0.025 0.975] ----------------------------------------------------------------------------------- const 1.851e+05 3536.273 52.341 0.000 1.78e+05 1.92e+05 1stFlrSF 1.156e+04 1453.851 7.954 0.000 8710.631 1.44e+04 BldgType_Twnhs -2.104e+04 4391.370 -4.790 0.000 -2.97e+04 -1.24e+04 BsmtFinSF1 1.191e+04 1253.222 9.507 0.000 9455.558 1.44e+04 BsmtQual 1.216e+04 1478.582 8.221 0.000 9254.430 1.51e+04 ExterQual 1.255e+04 1758.022 7.140 0.000 9101.715 1.6e+04 Fireplaces 5174.6946 1293.904 3.999 0.000 2635.643 7713.746 GarageArea 1.229e+04 1666.245 7.374 0.000 9016.493 1.56e+04 GarageType_NA 1.878e+04 6220.943 3.020 0.003 6576.901 3.1e+04 GrLivArea 2.876e+04 1476.951 19.471 0.000 2.59e+04 3.17e+04 KitchenQual 1.163e+04 1663.418 6.992 0.000 8365.761 1.49e+04 LandContour_Lvl -2487.6851 3676.620 -0.677 0.499 -9702.383 4727.013 ============================================================================== Omnibus: 552.749 Durbin-Watson: 1.907 Prob(Omnibus): 0.000 Jarque-Bera (JB): 11824.863 Skew: 2.012 Prob(JB): 0.00 Kurtosis: 19.179 Cond. No. 11.5 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping LandContour_Lvl and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.822 Model: OLS Adj. 
R-squared: 0.821 Method: Least Squares F-statistic: 467.8 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:14 Log-Likelihood: -12114. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1010 BIC: 2.430e+04 Df Model: 10 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.828e+05 1155.495 158.227 0.000 1.81e+05 1.85e+05 1stFlrSF 1.158e+04 1453.358 7.964 0.000 8723.290 1.44e+04 BldgType_Twnhs -2.093e+04 4387.555 -4.771 0.000 -2.95e+04 -1.23e+04 BsmtFinSF1 1.193e+04 1252.616 9.526 0.000 9474.314 1.44e+04 BsmtQual 1.215e+04 1478.174 8.221 0.000 9251.256 1.51e+04 ExterQual 1.241e+04 1745.318 7.111 0.000 8986.552 1.58e+04 Fireplaces 5232.3481 1290.749 4.054 0.000 2699.491 7765.205 GarageArea 1.229e+04 1665.787 7.378 0.000 9021.448 1.56e+04 GarageType_NA 1.909e+04 6202.348 3.079 0.002 6923.724 3.13e+04 GrLivArea 2.877e+04 1476.374 19.490 0.000 2.59e+04 3.17e+04 KitchenQual 1.167e+04 1661.791 7.024 0.000 8411.351 1.49e+04 ============================================================================== Omnibus: 552.339 Durbin-Watson: 1.908 Prob(Omnibus): 0.000 Jarque-Bera (JB): 11742.839 Skew: 2.012 Prob(JB): 0.00 Kurtosis: 19.119 Cond. No. 11.3 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.823 Model: OLS Adj. R-squared: 0.821 Method: Least Squares F-statistic: 425.5 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12113. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1009 BIC: 2.431e+04 Df Model: 11 Covariance Type: nonrobust ===================================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 1.831e+05 1176.600 155.609 0.000 1.81e+05 1.85e+05 1stFlrSF 1.173e+04 1458.983 8.038 0.000 8864.126 1.46e+04 BldgType_Twnhs -1.755e+04 5264.740 -3.334 0.001 -2.79e+04 -7221.682 BsmtFinSF1 1.194e+04 1252.427 9.535 0.000 9484.156 1.44e+04 BsmtQual 1.22e+04 1478.435 8.250 0.000 9296.091 1.51e+04 ExterQual 1.218e+04 1755.883 6.940 0.000 8739.368 1.56e+04 Fireplaces 5258.0746 1290.717 4.074 0.000 2725.278 7790.871 GarageArea 1.243e+04 1669.787 7.444 0.000 9152.437 1.57e+04 GarageType_NA 1.954e+04 6213.037 3.145 0.002 7346.307 3.17e+04 GrLivArea 2.866e+04 1479.146 19.379 0.000 2.58e+04 3.16e+04 KitchenQual 1.157e+04 1663.967 6.952 0.000 8302.023 1.48e+04 MSSubClass_nstory -5284.1947 4551.241 -1.161 0.246 -1.42e+04 3646.787 ============================================================================== Omnibus: 552.108 Durbin-Watson: 1.908 Prob(Omnibus): 0.000 Jarque-Bera (JB): 11753.833 Skew: 2.011 Prob(JB): 0.00 Kurtosis: 19.128 Cond. No. 11.4 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping MSSubClass_nstory and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. 
Variable: SalePrice R-squared: 0.822 Model: OLS Adj. R-squared: 0.821 Method: Least Squares F-statistic: 467.8 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12114. No. Observations: 1021 AIC: 2.425e+04 Df Residuals: 1010 BIC: 2.430e+04 Df Model: 10 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.828e+05 1155.495 158.227 0.000 1.81e+05 1.85e+05 1stFlrSF 1.158e+04 1453.358 7.964 0.000 8723.290 1.44e+04 BldgType_Twnhs -2.093e+04 4387.555 -4.771 0.000 -2.95e+04 -1.23e+04 BsmtFinSF1 1.193e+04 1252.616 9.526 0.000 9474.314 1.44e+04 BsmtQual 1.215e+04 1478.174 8.221 0.000 9251.256 1.51e+04 ExterQual 1.241e+04 1745.318 7.111 0.000 8986.552 1.58e+04 Fireplaces 5232.3481 1290.749 4.054 0.000 2699.491 7765.205 GarageArea 1.229e+04 1665.787 7.378 0.000 9021.448 1.56e+04 GarageType_NA 1.909e+04 6202.348 3.079 0.002 6923.724 3.13e+04 GrLivArea 2.877e+04 1476.374 19.490 0.000 2.59e+04 3.17e+04 KitchenQual 1.167e+04 1661.791 7.024 0.000 8411.351 1.49e+04 ============================================================================== Omnibus: 552.339 Durbin-Watson: 1.908 Prob(Omnibus): 0.000 Jarque-Bera (JB): 11742.839 Skew: 2.012 Prob(JB): 0.00 Kurtosis: 19.119 Cond. No. 11.3 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.831 Model: OLS Adj. R-squared: 0.829 Method: Least Squares F-statistic: 413.7 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12088. No. Observations: 1021 AIC: 2.420e+04 Df Residuals: 1008 BIC: 2.427e+04 Df Model: 12 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.842e+05 1214.413 151.664 0.000 1.82e+05 1.87e+05 1stFlrSF 1.049e+04 1436.688 7.299 0.000 7666.570 1.33e+04 BldgType_Twnhs -2.059e+04 4306.482 -4.780 0.000 -2.9e+04 -1.21e+04 BsmtFinSF1 1.074e+04 1233.568 8.708 0.000 8321.547 1.32e+04 BsmtQual 1.113e+04 1449.648 7.677 0.000 8283.843 1.4e+04 ExterQual 1.079e+04 1717.956 6.279 0.000 7416.062 1.42e+04 Fireplaces 4590.8153 1263.466 3.634 0.000 2111.491 7070.140 GarageArea 1.055e+04 1643.302 6.421 0.000 7327.118 1.38e+04 GarageType_NA 1.648e+04 6063.698 2.717 0.007 4577.616 2.84e+04 GrLivArea 2.791e+04 1450.986 19.235 0.000 2.51e+04 3.08e+04 KitchenQual 1.233e+04 1624.851 7.588 0.000 9140.322 1.55e+04 MSZoning_RM -8266.9289 3145.256 -2.628 0.009 -1.44e+04 -2094.929 MasVnrArea 8321.3261 1222.519 6.807 0.000 5922.353 1.07e+04 ============================================================================== Omnibus: 548.666 Durbin-Watson: 1.913 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12102.892 Skew: 1.981 Prob(JB): 0.00 Kurtosis: 19.395 Cond. No. 11.8 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.831 Model: OLS Adj. R-squared: 0.829 Method: Least Squares F-statistic: 381.5 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12088. No. Observations: 1021 AIC: 2.420e+04 Df Residuals: 1007 BIC: 2.427e+04 Df Model: 13 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.842e+05 1215.819 151.500 0.000 1.82e+05 1.87e+05 1stFlrSF 1.045e+04 1441.819 7.250 0.000 7623.761 1.33e+04 BldgType_Twnhs -2.056e+04 4309.498 -4.770 0.000 -2.9e+04 -1.21e+04 BsmtFinSF1 1.074e+04 1234.155 8.702 0.000 8318.119 1.32e+04 BsmtQual 1.115e+04 1451.919 7.678 0.000 8299.103 1.4e+04 ExterQual 1.077e+04 1720.226 6.259 0.000 7390.981 1.41e+04 Fireplaces 4603.0699 1264.755 3.639 0.000 2121.213 7084.927 GarageArea 1.054e+04 1644.471 6.410 0.000 7314.086 1.38e+04 GarageType_NA 1.652e+04 6068.637 2.723 0.007 4614.807 2.84e+04 GrLivArea 2.789e+04 1452.752 19.200 0.000 2.5e+04 3.07e+04 KitchenQual 1.232e+04 1626.018 7.576 0.000 9127.270 1.55e+04 MSZoning_RM -7926.2417 3361.189 -2.358 0.019 -1.45e+04 -1330.506 MasVnrArea 8348.5114 1226.703 6.806 0.000 5941.324 1.08e+04 Neighborhood_MeadowV -1720.3806 5966.275 -0.288 0.773 -1.34e+04 9987.376 ============================================================================== Omnibus: 549.424 Durbin-Watson: 1.914 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12140.657 Skew: 1.984 Prob(JB): 0.00 Kurtosis: 19.421 Cond. No. 11.9 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. 
Dropping Neighborhood_MeadowV and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.831 Model: OLS Adj. R-squared: 0.829 Method: Least Squares F-statistic: 413.7 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12088. No. Observations: 1021 AIC: 2.420e+04 Df Residuals: 1008 BIC: 2.427e+04 Df Model: 12 Covariance Type: nonrobust ================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------- const 1.842e+05 1214.413 151.664 0.000 1.82e+05 1.87e+05 1stFlrSF 1.049e+04 1436.688 7.299 0.000 7666.570 1.33e+04 BldgType_Twnhs -2.059e+04 4306.482 -4.780 0.000 -2.9e+04 -1.21e+04 BsmtFinSF1 1.074e+04 1233.568 8.708 0.000 8321.547 1.32e+04 BsmtQual 1.113e+04 1449.648 7.677 0.000 8283.843 1.4e+04 ExterQual 1.079e+04 1717.956 6.279 0.000 7416.062 1.42e+04 Fireplaces 4590.8153 1263.466 3.634 0.000 2111.491 7070.140 GarageArea 1.055e+04 1643.302 6.421 0.000 7327.118 1.38e+04 GarageType_NA 1.648e+04 6063.698 2.717 0.007 4577.616 2.84e+04 GrLivArea 2.791e+04 1450.986 19.235 0.000 2.51e+04 3.08e+04 KitchenQual 1.233e+04 1624.851 7.588 0.000 9140.322 1.55e+04 MSZoning_RM -8266.9289 3145.256 -2.628 0.009 -1.44e+04 -2094.929 MasVnrArea 8321.3261 1222.519 6.807 0.000 5922.353 1.07e+04 ============================================================================== Omnibus: 548.666 Durbin-Watson: 1.913 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12102.892 Skew: 1.981 Prob(JB): 0.00 Kurtosis: 19.395 Cond. No. 11.8 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.845 Model: OLS Adj. R-squared: 0.843 Method: Least Squares F-statistic: 422.7 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12044. No. Observations: 1021 AIC: 2.412e+04 Df Residuals: 1007 BIC: 2.418e+04 Df Model: 13 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.806e+05 1222.382 147.769 0.000 1.78e+05 1.83e+05 1stFlrSF 9881.8312 1378.378 7.169 0.000 7177.008 1.26e+04 BldgType_Twnhs -2.155e+04 4128.566 -5.221 0.000 -2.97e+04 -1.35e+04 BsmtFinSF1 9574.9772 1188.600 8.056 0.000 7242.560 1.19e+04 BsmtQual 1.055e+04 1390.655 7.588 0.000 7823.465 1.33e+04 ExterQual 7906.6368 1674.117 4.723 0.000 4621.479 1.12e+04 Fireplaces 4744.9680 1211.007 3.918 0.000 2368.582 7121.354 GarageArea 9527.1043 1578.614 6.035 0.000 6429.354 1.26e+04 GarageType_NA 1.171e+04 5832.972 2.008 0.045 266.134 2.32e+04 GrLivArea 2.686e+04 1395.024 19.252 0.000 2.41e+04 2.96e+04 KitchenQual 1.168e+04 1558.728 7.495 0.000 8624.346 1.47e+04 MSZoning_RM -8820.3963 3014.958 -2.926 0.004 -1.47e+04 -2904.077 MasVnrArea 5725.4837 1203.038 4.759 0.000 3364.735 8086.232 Neighborhood_NridgHt 4.014e+04 4221.716 9.509 0.000 3.19e+04 4.84e+04 ============================================================================== Omnibus: 525.962 Durbin-Watson: 1.891 Prob(Omnibus): 0.000 Jarque-Bera (JB): 10975.683 Skew: 1.883 Prob(JB): 0.00 Kurtosis: 18.615 Cond. No. 11.9 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.845 Model: OLS Adj. R-squared: 0.843 Method: Least Squares F-statistic: 392.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12044. No. Observations: 1021 AIC: 2.412e+04 Df Residuals: 1006 BIC: 2.419e+04 Df Model: 14 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.806e+05 1265.162 142.785 0.000 1.78e+05 1.83e+05 1stFlrSF 9881.2547 1379.115 7.165 0.000 7174.983 1.26e+04 BldgType_Twnhs -2.156e+04 4136.110 -5.214 0.000 -2.97e+04 -1.34e+04 BsmtFinSF1 9573.2151 1189.767 8.046 0.000 7238.506 1.19e+04 BsmtQual 1.054e+04 1412.453 7.463 0.000 7769.120 1.33e+04 ExterQual 7906.8522 1674.953 4.721 0.000 4620.050 1.12e+04 Fireplaces 4740.5740 1215.126 3.901 0.000 2356.103 7125.045 GarageArea 9527.0936 1579.397 6.032 0.000 6427.804 1.26e+04 GarageType_NA 1.176e+04 5904.833 1.991 0.047 167.897 2.33e+04 GrLivArea 2.686e+04 1396.517 19.233 0.000 2.41e+04 2.96e+04 KitchenQual 1.168e+04 1559.841 7.489 0.000 8620.607 1.47e+04 MSZoning_RM -8756.1191 3305.466 -2.649 0.008 -1.52e+04 -2269.721 MasVnrArea 5721.3783 1206.727 4.741 0.000 3353.388 8089.368 Neighborhood_NridgHt 4.015e+04 4227.505 9.498 0.000 3.19e+04 4.84e+04 Neighborhood_OldTown -152.0459 3197.432 -0.048 0.962 -6426.446 6122.354 ============================================================================== Omnibus: 526.126 Durbin-Watson: 1.891 Prob(Omnibus): 0.000 Jarque-Bera (JB): 10982.360 Skew: 1.884 Prob(JB): 0.00 Kurtosis: 18.619 Cond. No. 12.2 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. 
Dropping Neighborhood_OldTown and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.845 Model: OLS Adj. R-squared: 0.843 Method: Least Squares F-statistic: 422.7 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12044. No. Observations: 1021 AIC: 2.412e+04 Df Residuals: 1007 BIC: 2.418e+04 Df Model: 13 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.806e+05 1222.382 147.769 0.000 1.78e+05 1.83e+05 1stFlrSF 9881.8312 1378.378 7.169 0.000 7177.008 1.26e+04 BldgType_Twnhs -2.155e+04 4128.566 -5.221 0.000 -2.97e+04 -1.35e+04 BsmtFinSF1 9574.9772 1188.600 8.056 0.000 7242.560 1.19e+04 BsmtQual 1.055e+04 1390.655 7.588 0.000 7823.465 1.33e+04 ExterQual 7906.6368 1674.117 4.723 0.000 4621.479 1.12e+04 Fireplaces 4744.9680 1211.007 3.918 0.000 2368.582 7121.354 GarageArea 9527.1043 1578.614 6.035 0.000 6429.354 1.26e+04 GarageType_NA 1.171e+04 5832.972 2.008 0.045 266.134 2.32e+04 GrLivArea 2.686e+04 1395.024 19.252 0.000 2.41e+04 2.96e+04 KitchenQual 1.168e+04 1558.728 7.495 0.000 8624.346 1.47e+04 MSZoning_RM -8820.3963 3014.958 -2.926 0.004 -1.47e+04 -2904.077 MasVnrArea 5725.4837 1203.038 4.759 0.000 3364.735 8086.232 Neighborhood_NridgHt 4.014e+04 4221.716 9.509 0.000 3.19e+04 4.84e+04 ============================================================================== Omnibus: 525.962 Durbin-Watson: 1.891 Prob(Omnibus): 0.000 Jarque-Bera (JB): 10975.683 Skew: 1.883 Prob(JB): 0.00 Kurtosis: 18.615 Cond. No. 11.9 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
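The "Dropping ... and rebuilding the model" messages suggest a forward pass in which candidate features are added one at a time and removed again when their p-value is too high. The following is a minimal sketch of that idea, not the notebook's actual code: the function names, the 0.05 threshold, the synthetic data, and the normal approximation for p-values are all assumptions made for illustration. The log also shows some features disappearing without a message, which this sketch does not model.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Fit OLS with an intercept and return (coefficients, two-sided p-values).

    p-values use a normal approximation to the t distribution.
    """
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = A.shape[0] - A.shape[1]
    sigma2 = resid @ resid / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t_stats = beta / std_err
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    return beta, p_values

def forward_select(X, y, names, alpha=0.05):
    """Add candidates in order; keep one only if its own p-value <= alpha."""
    kept = []
    for j, name in enumerate(names):
        trial = kept + [j]
        _, p_values = ols_pvalues(X[:, trial], y)
        if p_values[-1] <= alpha:  # last entry belongs to the new candidate
            kept = trial
        else:
            print(f"Dropping {name} and rebuilding the model")
    return [names[j] for j in kept]

# Synthetic demo: two informative columns and one pure-noise column.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=400)
selected = forward_select(X, y, ["signal_a", "noise", "signal_b"])
print(selected)
```

With real data the same loop would run over the standardized training features, and statsmodels' `OLS(...).fit().pvalues` could replace the hand-rolled p-value computation.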
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.846 Model: OLS Adj. R-squared: 0.844 Method: Least Squares F-statistic: 395.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12040. No. Observations: 1021 AIC: 2.411e+04 Df Residuals: 1006 BIC: 2.418e+04 Df Model: 14 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.829e+05 1513.711 120.850 0.000 1.8e+05 1.86e+05 1stFlrSF 1.029e+04 1383.847 7.437 0.000 7576.255 1.3e+04 BldgType_Twnhs -2.107e+04 4121.552 -5.112 0.000 -2.92e+04 -1.3e+04 BsmtFinSF1 9951.9509 1194.407 8.332 0.000 7608.136 1.23e+04 BsmtQual 9916.1021 1408.849 7.038 0.000 7151.484 1.27e+04 ExterQual 7337.8281 1684.176 4.357 0.000 4032.927 1.06e+04 Fireplaces 4668.1640 1208.040 3.864 0.000 2297.597 7038.732 GarageArea 9294.9449 1576.864 5.895 0.000 6200.625 1.24e+04 GarageType_NA 1.015e+04 5848.615 1.736 0.083 -1324.329 2.16e+04 GrLivArea 2.643e+04 1401.140 18.862 0.000 2.37e+04 2.92e+04 KitchenQual 1.141e+04 1558.182 7.320 0.000 8348.381 1.45e+04 MSZoning_RM -1.116e+04 3142.068 -3.552 0.000 -1.73e+04 -4994.366 MasVnrArea 5811.4161 1200.190 4.842 0.000 3456.253 8166.579 Neighborhood_NridgHt 3.969e+04 4213.738 9.420 0.000 3.14e+04 4.8e+04 Neighborhood_Sawyer -6846.2544 2670.071 -2.564 0.010 -1.21e+04 -1606.708 ============================================================================== Omnibus: 533.433 Durbin-Watson: 1.888 Prob(Omnibus): 0.000 Jarque-Bera (JB): 11307.461 Skew: 1.916 Prob(JB): 0.00 Kurtosis: 18.847 Cond. No. 12.0 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
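The header statistics that drive the comparison between successive models can be recomputed from a few numbers in each summary. The sketch below does this for the summary directly above (R-squared 0.846, 14 predictors, 1021 observations, log-likelihood -12040), using the standard definitions for a model with an intercept. Because the inputs are the rounded values printed in the summary, the results are only expected to agree to the displayed precision.

```python
import math

# Values read from the summary above (already rounded by statsmodels).
n_obs = 1021
df_model = 14          # predictors, excluding the intercept
r_squared = 0.846
log_likelihood = -12040.0

k_params = df_model + 1  # predictors plus the intercept

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k_params)

# Information criteria: lower is better; BIC penalizes parameters more.
aic = 2 * k_params - 2 * log_likelihood
bic = k_params * math.log(n_obs) - 2 * log_likelihood

print(f"Adj. R-squared = {adj_r_squared:.3f}")
print(f"AIC = {aic:.4g}, BIC = {bic:.4g}")
```

Adding a useless feature raises R-squared slightly or not at all while increasing the penalty terms, so Adj. R-squared, AIC, and BIC are the more informative numbers to watch as features are added and removed.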
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.851 Model: OLS Adj. R-squared: 0.849 Method: Least Squares F-statistic: 410.6 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -12024. No. Observations: 1021 AIC: 2.408e+04 Df Residuals: 1006 BIC: 2.415e+04 Df Model: 14 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.839e+05 1461.041 125.845 0.000 1.81e+05 1.87e+05 1stFlrSF 1.131e+04 1357.945 8.326 0.000 8641.543 1.4e+04 BldgType_Twnhs -1.643e+04 4066.064 -4.041 0.000 -2.44e+04 -8450.831 BsmtFinSF1 9593.0812 1176.603 8.153 0.000 7284.205 1.19e+04 BsmtQual 1.149e+04 1403.250 8.186 0.000 8732.765 1.42e+04 ExterQual 7568.3900 1657.339 4.567 0.000 4316.153 1.08e+04 Fireplaces 4257.3628 1179.779 3.609 0.000 1942.252 6572.473 GarageArea 7987.8508 1330.353 6.004 0.000 5377.266 1.06e+04 GrLivArea 2.604e+04 1377.421 18.904 0.000 2.33e+04 2.87e+04 KitchenQual 9963.1232 1547.538 6.438 0.000 6926.351 1.3e+04 MSZoning_RM -1.356e+04 3107.452 -4.363 0.000 -1.97e+04 -7460.780 MasVnrArea 6115.6579 1180.424 5.181 0.000 3799.282 8432.034 Neighborhood_NridgHt 4.175e+04 4138.830 10.086 0.000 3.36e+04 4.99e+04 Neighborhood_Sawyer -9172.8702 2630.289 -3.487 0.001 -1.43e+04 -4011.388 OverallCond 6370.9537 1053.863 6.045 0.000 4302.933 8438.975 ============================================================================== Omnibus: 554.023 Durbin-Watson: 1.916 Prob(Omnibus): 0.000 Jarque-Bera (JB): 12432.846 Skew: 2.003 Prob(JB): 0.00 Kurtosis: 19.620 Cond. No. 8.64 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.859 Model: OLS Adj. R-squared: 0.857 Method: Least Squares F-statistic: 408.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11996. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1005 BIC: 2.410e+04 Df Model: 15 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.839e+05 1422.438 129.299 0.000 1.81e+05 1.87e+05 1stFlrSF 1.074e+04 1324.195 8.111 0.000 8141.768 1.33e+04 BldgType_Twnhs -1.634e+04 3958.597 -4.127 0.000 -2.41e+04 -8568.302 BsmtFinSF1 1.061e+04 1153.541 9.201 0.000 8350.347 1.29e+04 BsmtQual 7590.3193 1461.385 5.194 0.000 4722.603 1.05e+04 ExterQual 4529.7459 1663.509 2.723 0.007 1265.398 7794.094 Fireplaces 2869.0131 1163.381 2.466 0.014 586.078 5151.948 GarageArea 6728.2548 1306.005 5.152 0.000 4165.445 9291.064 GrLivArea 2.386e+04 1372.160 17.386 0.000 2.12e+04 2.65e+04 KitchenQual 7764.5543 1534.821 5.059 0.000 4752.733 1.08e+04 MSZoning_RM -1.327e+04 3025.556 -4.385 0.000 -1.92e+04 -7329.794 MasVnrArea 5247.4047 1155.023 4.543 0.000 2980.871 7513.939 Neighborhood_NridgHt 3.811e+04 4058.436 9.390 0.000 3.01e+04 4.61e+04 Neighborhood_Sawyer -8254.2575 2563.679 -3.220 0.001 -1.33e+04 -3223.480 OverallCond 5545.1589 1031.882 5.374 0.000 3520.268 7570.050 OverallQual 1.372e+04 1827.178 7.508 0.000 1.01e+04 1.73e+04 ============================================================================== Omnibus: 578.918 Durbin-Watson: 1.925 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13924.298 Skew: 2.109 Prob(JB): 0.00 Kurtosis: 20.593 Cond. No. 
9.39 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.860 Model: OLS Adj. R-squared: 0.858 Method: Least Squares F-statistic: 385.6 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11992. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1004 BIC: 2.410e+04 Df Model: 16 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.9e+05 2610.116 72.786 0.000 1.85e+05 1.95e+05 1stFlrSF 1.037e+04 1326.717 7.814 0.000 7763.565 1.3e+04 BldgType_Twnhs -1.686e+04 3950.067 -4.268 0.000 -2.46e+04 -9106.475 BsmtFinSF1 1.075e+04 1150.781 9.341 0.000 8490.918 1.3e+04 BsmtQual 7394.1167 1458.301 5.070 0.000 4532.450 1.03e+04 ExterQual 4461.8919 1658.214 2.691 0.007 1207.930 7715.853 Fireplaces 3182.4202 1165.074 2.732 0.006 896.160 5468.680 GarageArea 6741.8245 1301.715 5.179 0.000 4187.431 9296.218 GrLivArea 2.402e+04 1368.845 17.544 0.000 2.13e+04 2.67e+04 KitchenQual 7325.0898 1537.998 4.763 0.000 4307.031 1.03e+04 MSZoning_RM -1.323e+04 3015.633 -4.386 0.000 -1.91e+04 -7307.567 MasVnrArea 5233.8638 1151.231 4.546 0.000 2974.769 7492.959 Neighborhood_NridgHt 3.79e+04 4045.780 9.368 0.000 3e+04 4.58e+04 Neighborhood_Sawyer -8268.5869 2555.244 -3.236 0.001 -1.33e+04 -3254.355 OverallCond 5927.3213 1037.725 5.712 0.000 3890.963 7963.679 OverallQual 1.368e+04 1821.206 7.514 0.000 1.01e+04 1.73e+04 SaleCondition_Normal -7316.4511 2645.254 -2.766 0.006 -1.25e+04 -2125.590 ============================================================================== Omnibus: 562.479 Durbin-Watson: 1.925 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13592.318 Skew: 2.022 Prob(JB): 0.00 Kurtosis: 20.411 Cond. No. 
9.48 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.862 Model: OLS Adj. R-squared: 0.860 Method: Least Squares F-statistic: 368.6 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11985. No. Observations: 1021 AIC: 2.401e+04 Df Residuals: 1003 BIC: 2.409e+04 Df Model: 17 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.96e+05 3040.078 64.474 0.000 1.9e+05 2.02e+05 1stFlrSF 1.003e+04 1320.868 7.596 0.000 7441.776 1.26e+04 BldgType_Twnhs -1.731e+04 3925.793 -4.410 0.000 -2.5e+04 -9610.000 BsmtFinSF1 1.112e+04 1147.441 9.695 0.000 8872.698 1.34e+04 BsmtQual 6980.6942 1452.750 4.805 0.000 4129.917 9831.471 ExterQual 3917.5315 1653.484 2.369 0.018 672.848 7162.216 Fireplaces 3116.8400 1157.502 2.693 0.007 845.437 5388.243 GarageArea 6393.2036 1296.368 4.932 0.000 3849.300 8937.108 GrLivArea 2.445e+04 1364.598 17.917 0.000 2.18e+04 2.71e+04 KitchenQual 7401.9047 1527.965 4.844 0.000 4403.530 1.04e+04 MSZoning_RM -1.302e+04 2996.196 -4.345 0.000 -1.89e+04 -7138.534 MasVnrArea 5157.6435 1143.797 4.509 0.000 2913.133 7402.154 Neighborhood_NridgHt 3.776e+04 4019.204 9.395 0.000 2.99e+04 4.56e+04 Neighborhood_Sawyer -7983.7098 2539.462 -3.144 0.002 -1.3e+04 -3000.441 OverallCond 5997.5051 1031.031 5.817 0.000 3974.280 8020.730 OverallQual 1.378e+04 1809.346 7.616 0.000 1.02e+04 1.73e+04 SaleCondition_Normal 585.7543 3352.560 0.175 0.861 -5993.081 7164.589 SaleType_WD -1.455e+04 3834.265 -3.796 0.000 -2.21e+04 -7029.197 ============================================================================== Omnibus: 585.151 Durbin-Watson: 1.932 Prob(Omnibus): 0.000 Jarque-Bera (JB): 16057.205 Skew: 2.097 Prob(JB): 0.00 Kurtosis: 21.970 Cond. No. 
10.9 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.859 Model: OLS Adj. R-squared: 0.857 Method: Least Squares F-statistic: 408.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11996. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1005 BIC: 2.410e+04 Df Model: 15 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.839e+05 1422.438 129.299 0.000 1.81e+05 1.87e+05 1stFlrSF 1.074e+04 1324.195 8.111 0.000 8141.768 1.33e+04 BldgType_Twnhs -1.634e+04 3958.597 -4.127 0.000 -2.41e+04 -8568.302 BsmtFinSF1 1.061e+04 1153.541 9.201 0.000 8350.347 1.29e+04 BsmtQual 7590.3193 1461.385 5.194 0.000 4722.603 1.05e+04 ExterQual 4529.7459 1663.509 2.723 0.007 1265.398 7794.094 Fireplaces 2869.0131 1163.381 2.466 0.014 586.078 5151.948 GarageArea 6728.2548 1306.005 5.152 0.000 4165.445 9291.064 GrLivArea 2.386e+04 1372.160 17.386 0.000 2.12e+04 2.65e+04 KitchenQual 7764.5543 1534.821 5.059 0.000 4752.733 1.08e+04 MSZoning_RM -1.327e+04 3025.556 -4.385 0.000 -1.92e+04 -7329.794 MasVnrArea 5247.4047 1155.023 4.543 0.000 2980.871 7513.939 Neighborhood_NridgHt 3.811e+04 4058.436 9.390 0.000 3.01e+04 4.61e+04 Neighborhood_Sawyer -8254.2575 2563.679 -3.220 0.001 -1.33e+04 -3223.480 OverallCond 5545.1589 1031.882 5.374 0.000 3520.268 7570.050 OverallQual 1.372e+04 1827.178 7.508 0.000 1.01e+04 1.73e+04 ============================================================================== Omnibus: 578.918 Durbin-Watson: 1.925 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13924.298 Skew: 2.109 Prob(JB): 0.00 Kurtosis: 20.593 Cond. No. 
9.39 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.859 Model: OLS Adj. R-squared: 0.857 Method: Least Squares F-statistic: 383.2 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11995. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1004 BIC: 2.411e+04 Df Model: 16 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.839e+05 1421.752 129.337 0.000 1.81e+05 1.87e+05 1stFlrSF 1.079e+04 1323.764 8.149 0.000 8189.912 1.34e+04 BldgType_Twnhs -1.715e+04 3993.759 -4.295 0.000 -2.5e+04 -9315.794 BsmtFinSF1 1.079e+04 1159.120 9.312 0.000 8519.460 1.31e+04 BsmtQual 7692.7568 1462.100 5.261 0.000 4823.634 1.06e+04 ExterQual 4482.7411 1662.788 2.696 0.007 1219.802 7745.680 Fireplaces 2931.0803 1163.412 2.519 0.012 648.083 5214.077 GarageArea 6744.6653 1305.252 5.167 0.000 4183.330 9306.000 GrLivArea 2.15e+04 2091.712 10.277 0.000 1.74e+04 2.56e+04 KitchenQual 7692.8974 1534.632 5.013 0.000 4681.444 1.07e+04 MSZoning_RM -1.278e+04 3041.027 -4.203 0.000 -1.88e+04 -6815.270 MasVnrArea 5222.6380 1154.436 4.524 0.000 2957.255 7488.021 Neighborhood_NridgHt 3.85e+04 4064.228 9.472 0.000 3.05e+04 4.65e+04 Neighborhood_Sawyer -8317.5382 2562.460 -3.246 0.001 -1.33e+04 -3289.146 OverallCond 5562.6231 1031.317 5.394 0.000 3538.839 7586.407 OverallQual 1.377e+04 1826.438 7.542 0.000 1.02e+04 1.74e+04 TotRmsAbvGrd 2691.9311 1802.156 1.494 0.136 -844.494 6228.356 ============================================================================== Omnibus: 574.225 Durbin-Watson: 1.926 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13648.741 Skew: 2.089 Prob(JB): 0.00 Kurtosis: 20.418 Cond. No. 
9.82 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping TotRmsAbvGrd and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.859 Model: OLS Adj. R-squared: 0.857 Method: Least Squares F-statistic: 408.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11996. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1005 BIC: 2.410e+04 Df Model: 15 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.839e+05 1422.438 129.299 0.000 1.81e+05 1.87e+05 1stFlrSF 1.074e+04 1324.195 8.111 0.000 8141.768 1.33e+04 BldgType_Twnhs -1.634e+04 3958.597 -4.127 0.000 -2.41e+04 -8568.302 BsmtFinSF1 1.061e+04 1153.541 9.201 0.000 8350.347 1.29e+04 BsmtQual 7590.3193 1461.385 5.194 0.000 4722.603 1.05e+04 ExterQual 4529.7459 1663.509 2.723 0.007 1265.398 7794.094 Fireplaces 2869.0131 1163.381 2.466 0.014 586.078 5151.948 GarageArea 6728.2548 1306.005 5.152 0.000 4165.445 9291.064 GrLivArea 2.386e+04 1372.160 17.386 0.000 2.12e+04 2.65e+04 KitchenQual 7764.5543 1534.821 5.059 0.000 4752.733 1.08e+04 MSZoning_RM -1.327e+04 3025.556 -4.385 0.000 -1.92e+04 -7329.794 MasVnrArea 5247.4047 1155.023 4.543 0.000 2980.871 7513.939 Neighborhood_NridgHt 3.811e+04 4058.436 9.390 0.000 3.01e+04 4.61e+04 Neighborhood_Sawyer -8254.2575 2563.679 -3.220 0.001 -1.33e+04 -3223.480 OverallCond 5545.1589 1031.882 5.374 0.000 3520.268 7570.050 OverallQual 1.372e+04 1827.178 7.508 0.000 1.01e+04 1.73e+04 ============================================================================== Omnibus: 578.918 Durbin-Watson: 
1.925 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13924.298 Skew: 2.109 Prob(JB): 0.00 Kurtosis: 20.593 Cond. No. 9.39 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.859 Model: OLS Adj. R-squared: 0.857 Method: Least Squares F-statistic: 383.7 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11994. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1004 BIC: 2.411e+04 Df Model: 16 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.836e+05 1432.315 128.178 0.000 1.81e+05 1.86e+05 1stFlrSF 1.172e+04 1429.863 8.199 0.000 8917.509 1.45e+04 BldgType_Twnhs -1.777e+04 4032.706 -4.407 0.000 -2.57e+04 -9856.894 BsmtFinSF1 9609.7527 1278.842 7.514 0.000 7100.242 1.21e+04 BsmtQual 7111.6102 1483.497 4.794 0.000 4200.500 1e+04 ExterQual 4421.0278 1662.713 2.659 0.008 1158.237 7683.819 Fireplaces 2718.6584 1165.031 2.334 0.020 432.484 5004.833 GarageArea 6501.1295 1310.549 4.961 0.000 3929.400 9072.859 GrLivArea 2.224e+04 1635.183 13.602 0.000 1.9e+04 2.55e+04 KitchenQual 7542.2667 1537.997 4.904 0.000 4524.211 1.06e+04 MSZoning_RM -1.163e+04 3154.868 -3.686 0.000 -1.78e+04 -5437.028 MasVnrArea 5167.7796 1154.556 4.476 0.000 2902.161 7433.399 Neighborhood_NridgHt 3.913e+04 4092.975 9.560 0.000 3.11e+04 4.72e+04 Neighborhood_Sawyer -7945.3967 2566.459 -3.096 0.002 -1.3e+04 -2909.158 OverallCond 5660.8189 1032.694 5.482 0.000 3634.332 7687.306 OverallQual 1.358e+04 1826.780 7.432 0.000 9992.753 1.72e+04 TotalBath 2979.5942 1646.096 1.810 0.071 -250.588 6209.776 ============================================================================== Omnibus: 575.417 Durbin-Watson: 1.936 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13540.596 Skew: 2.098 Prob(JB): 0.00 Kurtosis: 20.340 Cond. No. 
10.3 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Dropping TotalBath and rebuilding the model as it did not add any info to model OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.859 Model: OLS Adj. R-squared: 0.857 Method: Least Squares F-statistic: 408.1 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11996. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1005 BIC: 2.410e+04 Df Model: 15 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.839e+05 1422.438 129.299 0.000 1.81e+05 1.87e+05 1stFlrSF 1.074e+04 1324.195 8.111 0.000 8141.768 1.33e+04 BldgType_Twnhs -1.634e+04 3958.597 -4.127 0.000 -2.41e+04 -8568.302 BsmtFinSF1 1.061e+04 1153.541 9.201 0.000 8350.347 1.29e+04 BsmtQual 7590.3193 1461.385 5.194 0.000 4722.603 1.05e+04 ExterQual 4529.7459 1663.509 2.723 0.007 1265.398 7794.094 Fireplaces 2869.0131 1163.381 2.466 0.014 586.078 5151.948 GarageArea 6728.2548 1306.005 5.152 0.000 4165.445 9291.064 GrLivArea 2.386e+04 1372.160 17.386 0.000 2.12e+04 2.65e+04 KitchenQual 7764.5543 1534.821 5.059 0.000 4752.733 1.08e+04 MSZoning_RM -1.327e+04 3025.556 -4.385 0.000 -1.92e+04 -7329.794 MasVnrArea 5247.4047 1155.023 4.543 0.000 2980.871 7513.939 Neighborhood_NridgHt 3.811e+04 4058.436 9.390 0.000 3.01e+04 4.61e+04 Neighborhood_Sawyer -8254.2575 2563.679 -3.220 0.001 -1.33e+04 -3223.480 OverallCond 5545.1589 1031.882 5.374 0.000 3520.268 7570.050 OverallQual 1.372e+04 1827.178 7.508 0.000 1.01e+04 1.73e+04 ============================================================================== Omnibus: 578.918 Durbin-Watson: 1.925 
Prob(Omnibus): 0.000 Jarque-Bera (JB): 13924.298 Skew: 2.109 Prob(JB): 0.00 Kurtosis: 20.593 Cond. No. 9.39 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results ============================================================================== Dep. Variable: SalePrice R-squared: 0.860 Model: OLS Adj. R-squared: 0.858 Method: Least Squares F-statistic: 385.2 Date: Mon, 27 Jun 2022 Prob (F-statistic): 0.00 Time: 19:38:15 Log-Likelihood: -11992. No. Observations: 1021 AIC: 2.402e+04 Df Residuals: 1004 BIC: 2.410e+04 Df Model: 16 Covariance Type: nonrobust ======================================================================================== coef std err t P>|t| [0.025 0.975] ---------------------------------------------------------------------------------------- const 1.841e+05 1420.133 129.634 0.000 1.81e+05 1.87e+05 1stFlrSF 7119.1079 1928.432 3.692 0.000 3334.890 1.09e+04 BldgType_Twnhs -1.582e+04 3952.662 -4.002 0.000 -2.36e+04 -8061.543 BsmtFinSF1 1.009e+04 1167.899 8.643 0.000 7802.003 1.24e+04 BsmtQual 5886.5168 1600.310 3.678 0.000 2746.182 9026.852 ExterQual 4572.1359 1658.943 2.756 0.006 1316.743 7827.529 Fireplaces 3062.7983 1162.566 2.635 0.009 781.460 5344.137 GarageArea 6739.2785 1302.364 5.175 0.000 4183.612 9294.945 GrLivArea 2.447e+04 1388.612 17.619 0.000 2.17e+04 2.72e+04 KitchenQual 7754.6047 1530.538 5.067 0.000 4751.184 1.08e+04 MSZoning_RM -1.374e+04 3022.703 -4.546 0.000 -1.97e+04 -7809.232 MasVnrArea 5176.2763 1152.127 4.493 0.000 2915.423 7437.130 Neighborhood_NridgHt 3.795e+04 4047.587 9.375 0.000 3e+04 4.59e+04 Neighborhood_Sawyer -8722.9667 2562.981 -3.403 0.001 -1.38e+04 -3693.554 OverallCond 5515.7566 1029.063 5.360 0.000 3496.396 7535.117 OverallQual 1.345e+04 1825.122 7.368 0.000 9865.607 1.7e+04 TotalBsmtSF 5092.1573 1976.298 2.577 0.010 1214.009 8970.305 ============================================================================== Omnibus: 576.902 Durbin-Watson: 1.931 Prob(Omnibus): 0.000 Jarque-Bera (JB): 13710.878 Skew: 2.103 Prob(JB): 0.00 Kurtosis: 20.453 Cond. No. 
9.90 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.876
Model:                            OLS   Adj. R-squared:                  0.874
Method:                 Least Squares   F-statistic:                     418.5
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11928.
No. Observations:                1021   AIC:                         2.389e+04
Df Residuals:                    1003   BIC:                         2.398e+04
Df Model:                          17
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                   1.838e+05   1334.695    137.678      0.000   1.81e+05   1.86e+05
1stFlrSF                6101.1879   1814.106      3.363      0.001   2541.310   9661.066
BldgType_Twnhs         -1.685e+04   3715.025     -4.534      0.000  -2.41e+04  -9555.004
BsmtFinSF1              9314.6296   1099.430      8.472      0.000   7157.182   1.15e+04
BsmtQual                5818.2517   1503.681      3.869      0.000   2867.530   8768.973
ExterQual               4700.2431   1558.801      3.015      0.003   1641.358   7759.128
Fireplaces              2512.9366   1093.392      2.298      0.022    367.339   4658.534
GarageArea              6649.8565   1223.740      5.434      0.000   4248.472   9051.241
GrLivArea              -3.791e+04   5540.112     -6.842      0.000  -4.88e+04   -2.7e+04
KitchenQual             7473.6554   1438.316      5.196      0.000   4651.202   1.03e+04
MSZoning_RM            -1.223e+04   2843.144     -4.303      0.000  -1.78e+04  -6654.673
MasVnrArea              4260.8861   1085.432      3.926      0.000   2130.908   6390.865
Neighborhood_NridgHt    3.592e+04   3807.159      9.436      0.000   2.85e+04   4.34e+04
Neighborhood_Sawyer    -7312.3464   2411.283     -3.033      0.002   -1.2e+04  -2580.609
OverallCond             5420.8369    966.954      5.606      0.000   3523.352   7318.322
OverallQual             1.274e+04   1715.990      7.425      0.000   9373.074   1.61e+04
TotalBsmtSF             5259.0805   1857.008      2.832      0.005   1615.014   8903.147
TotalFlrSFAbvGrd        6.544e+04   5649.153     11.584      0.000   5.44e+04   7.65e+04
==============================================================================
Omnibus:                      319.291   Durbin-Watson:                   1.923
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2726.205
Skew:                           1.184   Prob(JB):                         0.00
Kurtosis:                      10.647   Cond. No.                         21.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
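In the summary above, adding TotalFlrSFAbvGrd flips the GrLivArea coefficient from positive to -3.791e+04, roughly quadruples its standard error, and raises Cond. No. from 9.90 to 21.1. That pattern is typical of two strongly correlated predictors. The log does not state why the next model reverts to the previous feature set; one plausible reading is a multicollinearity check such as the variance inflation factor (VIF), but that is an assumption. The sketch below computes VIF by hand on synthetic data; the column names and data are hypothetical stand-ins, and unlike statsmodels' `variance_inflation_factor` this version always adds an intercept to the auxiliary regression.

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the others."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ beta
    r2 = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

# Synthetic stand-ins: the second column is almost a copy of the first,
# the third is independent of both.
rng = np.random.default_rng(0)
living_area = rng.normal(size=500)
total_floor_area = living_area + 0.1 * rng.normal(size=500)
garage_area = rng.normal(size=500)
X = np.column_stack([living_area, total_floor_area, garage_area])

vifs = [vif(X, j) for j in range(X.shape[1])]
for name, value in zip(["living_area", "total_floor_area", "garage_area"], vifs):
    print(f"{name}: VIF = {value:.1f}")
```

A common rule of thumb treats a VIF above 5 or 10 as a sign that a feature is largely explained by the others; in this sketch the two near-duplicate columns exceed that by a wide margin while the independent column stays near 1.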
OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.860
Model:                            OLS   Adj. R-squared:                  0.858
Method:                 Least Squares   F-statistic:                     385.2
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11992.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1004   BIC:                         2.410e+04
Df Model:                          16
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                   1.841e+05   1420.133    129.634      0.000   1.81e+05   1.87e+05
1stFlrSF                7119.1079   1928.432      3.692      0.000   3334.890   1.09e+04
BldgType_Twnhs         -1.582e+04   3952.662     -4.002      0.000  -2.36e+04  -8061.543
BsmtFinSF1              1.009e+04   1167.899      8.643      0.000   7802.003   1.24e+04
BsmtQual                5886.5168   1600.310      3.678      0.000   2746.182   9026.852
ExterQual               4572.1359   1658.943      2.756      0.006   1316.743   7827.529
Fireplaces              3062.7983   1162.566      2.635      0.009    781.460   5344.137
GarageArea              6739.2785   1302.364      5.175      0.000   4183.612   9294.945
GrLivArea               2.447e+04   1388.612     17.619      0.000   2.17e+04   2.72e+04
KitchenQual             7754.6047   1530.538      5.067      0.000   4751.184   1.08e+04
MSZoning_RM            -1.374e+04   3022.703     -4.546      0.000  -1.97e+04  -7809.232
MasVnrArea              5176.2763   1152.127      4.493      0.000   2915.423   7437.130
Neighborhood_NridgHt    3.795e+04   4047.587      9.375      0.000      3e+04   4.59e+04
Neighborhood_Sawyer    -8722.9667   2562.981     -3.403      0.001  -1.38e+04  -3693.554
OverallCond             5515.7566   1029.063      5.360      0.000   3496.396   7535.117
OverallQual             1.345e+04   1825.122      7.368      0.000   9865.607    1.7e+04
TotalBsmtSF             5092.1573   1976.298      2.577      0.010   1214.009   8970.305
==============================================================================
Omnibus:                      576.902   Durbin-Watson:                   1.931
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13710.878
Skew:                           2.103   Prob(JB):                         0.00
Kurtosis:                      20.453   Cond. No.                         9.90
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                   1.836e+05   1424.129    128.951      0.000   1.81e+05   1.86e+05
1stFlrSF                6811.7268   1924.694      3.539      0.000   3034.839   1.06e+04
BldgType_Twnhs         -1.744e+04   3980.010     -4.383      0.000  -2.53e+04  -9633.225
BsmtFinSF1              1.003e+04   1164.004      8.618      0.000   7747.122   1.23e+04
BsmtQual                3883.5141   1743.243      2.228      0.026    462.692   7304.336
ExterQual               3842.6332   1672.891      2.297      0.022    559.865   7125.401
Fireplaces              3189.1619   1159.334      2.751      0.006    914.164   5464.160
GarageArea              5976.8080   1325.183      4.510      0.000   3376.360   8577.256
GrLivArea               2.569e+04   1448.901     17.729      0.000   2.28e+04   2.85e+04
KitchenQual             7079.8608   1543.499      4.587      0.000   4051.003   1.01e+04
MSZoning_RM            -1.034e+04   3241.011     -3.190      0.001  -1.67e+04  -3977.579
MasVnrArea              4865.2770   1153.275      4.219      0.000   2602.169   7128.385
Neighborhood_NridgHt    3.895e+04   4048.651      9.619      0.000    3.1e+04   4.69e+04
Neighborhood_Sawyer    -8873.0140   2554.522     -3.473      0.001  -1.39e+04  -3860.193
OverallCond             6499.6521   1082.215      6.006      0.000   4375.986   8623.318
OverallQual             1.265e+04   1840.137      6.875      0.000   9039.713   1.63e+04
TotalBsmtSF             5844.3070   1987.029      2.941      0.003   1945.097   9743.517
YearBuilt               4872.1360   1712.853      2.844      0.005   1510.950   8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     344.9
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1002   BIC:                         2.411e+04
Df Model:                          18
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                   1.836e+05   1425.428    128.832      0.000   1.81e+05   1.86e+05
1stFlrSF                6803.6492   1931.440      3.523      0.000   3013.518   1.06e+04
BldgType_Twnhs         -1.745e+04   3983.702     -4.380      0.000  -2.53e+04  -9632.284
BsmtFinSF1              1.003e+04   1166.340      8.604      0.000   7745.994   1.23e+04
BsmtQual                3869.4578   1763.389      2.194      0.028    409.100   7329.816
ExterQual               3835.4991   1678.919      2.285      0.023    540.899   7130.099
Fireplaces              3195.4764   1165.778      2.741      0.006    907.830   5483.123
GarageArea              5978.2593   1326.114      4.508      0.000   3375.981   8580.538
GrLivArea               2.568e+04   1455.332     17.646      0.000   2.28e+04   2.85e+04
KitchenQual             7056.8286   1601.970      4.405      0.000   3913.228   1.02e+04
MSZoning_RM            -1.034e+04   3243.073     -3.188      0.001  -1.67e+04  -3976.445
MasVnrArea              4869.2821   1156.225      4.211      0.000   2600.382   7138.182
Neighborhood_NridgHt    3.895e+04   4053.827      9.609      0.000    3.1e+04   4.69e+04
Neighborhood_Sawyer    -8864.6489   2560.474     -3.462      0.001  -1.39e+04  -3840.144
OverallCond             6476.9715   1161.208      5.578      0.000   4198.294   8755.649
OverallQual             1.265e+04   1841.068      6.871      0.000   9037.465   1.63e+04
TotalBsmtSF             5853.9114   1995.941      2.933      0.003   1937.207   9770.615
YearBuilt               4835.9464   1839.837      2.628      0.009   1225.570   8446.323
YearRemodAdd              82.3336   1523.132      0.054      0.957  -2906.560   3071.227
==============================================================================
Omnibus:                      600.579   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14873.896
Skew:                           2.215   Prob(JB):                         0.00
Kurtosis:                      21.166   Cond. No.                         11.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping YearRemodAdd and rebuilding the model as it did not add any info to model

OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                   1.836e+05   1424.129    128.951      0.000   1.81e+05   1.86e+05
1stFlrSF                6811.7268   1924.694      3.539      0.000   3034.839   1.06e+04
BldgType_Twnhs         -1.744e+04   3980.010     -4.383      0.000  -2.53e+04  -9633.225
BsmtFinSF1              1.003e+04   1164.004      8.618      0.000   7747.122   1.23e+04
BsmtQual                3883.5141   1743.243      2.228      0.026    462.692   7304.336
ExterQual               3842.6332   1672.891      2.297      0.022    559.865   7125.401
Fireplaces              3189.1619   1159.334      2.751      0.006    914.164   5464.160
GarageArea              5976.8080   1325.183      4.510      0.000   3376.360   8577.256
GrLivArea               2.569e+04   1448.901     17.729      0.000   2.28e+04   2.85e+04
KitchenQual             7079.8608   1543.499      4.587      0.000   4051.003   1.01e+04
MSZoning_RM            -1.034e+04   3241.011     -3.190      0.001  -1.67e+04  -3977.579
MasVnrArea              4865.2770   1153.275      4.219      0.000   2602.169   7128.385
Neighborhood_NridgHt    3.895e+04   4048.651      9.619      0.000    3.1e+04   4.69e+04
Neighborhood_Sawyer    -8873.0140   2554.522     -3.473      0.001  -1.39e+04  -3860.193
OverallCond             6499.6521   1082.215      6.006      0.000   4375.986   8623.318
OverallQual             1.265e+04   1840.137      6.875      0.000   9039.713   1.63e+04
TotalBsmtSF             5844.3070   1987.029      2.941      0.003   1945.097   9743.517
YearBuilt               4872.1360   1712.853      2.844      0.005   1510.950   8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     345.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11987.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1002   BIC:                         2.411e+04
Df Model:                          18
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                    2.22e+05   2.81e+04      7.897      0.000   1.67e+05   2.77e+05
1stFlrSF                6867.7438   1924.304      3.569      0.000   3091.617   1.06e+04
BldgType_Twnhs         -1.759e+04   3979.838     -4.421      0.000  -2.54e+04  -9784.632
BsmtFinSF1              1.004e+04   1163.525      8.630      0.000   7757.741   1.23e+04
BsmtQual                3841.6039   1742.764      2.204      0.028    421.718   7261.490
ExterQual               3814.1383   1672.302      2.281      0.023    532.522   7095.754
Fireplaces              3207.9632   1158.917      2.768      0.006    933.780   5482.147
GarageArea              5972.0252   1324.617      4.508      0.000   3372.683   8571.367
GrLivArea               2.561e+04   1449.275     17.674      0.000   2.28e+04   2.85e+04
KitchenQual             7105.3660   1542.948      4.605      0.000   4077.585   1.01e+04
MSZoning_RM            -1.036e+04   3239.674     -3.199      0.001  -1.67e+04  -4006.334
MasVnrArea              4870.6232   1152.785      4.225      0.000   2608.473   7132.774
Neighborhood_NridgHt    3.915e+04   4049.555      9.667      0.000   3.12e+04   4.71e+04
Neighborhood_Sawyer    -8806.7138   2553.886     -3.448      0.001  -1.38e+04  -3795.136
OverallCond             6567.2728   1082.884      6.065      0.000   4442.293   8692.253
OverallQual             1.263e+04   1839.382      6.869      0.000   9025.504   1.62e+04
TotalBsmtSF             5780.2080   1986.729      2.909      0.004   1881.581   9678.835
YearBuilt               4904.6437   1712.282      2.864      0.004   1544.574   8264.713
dateSold               -3.174e-05   2.33e-05     -1.365      0.173  -7.74e-05   1.39e-05
==============================================================================
Omnibus:                      598.524   Durbin-Watson:                   1.939
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14660.003
Skew:                           2.208   Prob(JB):                         0.00
Kurtosis:                      21.031   Cond. No.                     3.54e+10
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.54e+10. This might indicate that there are strong multicollinearity or other numerical problems.

Dropping dateSold and rebuilding the model as it did not add any info to model

OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17
Covariance Type:            nonrobust
========================================================================================
                             coef    std err          t      P>|t|     [0.025     0.975]
----------------------------------------------------------------------------------------
const                   1.836e+05   1424.129    128.951      0.000   1.81e+05   1.86e+05
1stFlrSF                6811.7268   1924.694      3.539      0.000   3034.839   1.06e+04
BldgType_Twnhs         -1.744e+04   3980.010     -4.383      0.000  -2.53e+04  -9633.225
BsmtFinSF1              1.003e+04   1164.004      8.618      0.000   7747.122   1.23e+04
BsmtQual                3883.5141   1743.243      2.228      0.026    462.692   7304.336
ExterQual               3842.6332   1672.891      2.297      0.022    559.865   7125.401
Fireplaces              3189.1619   1159.334      2.751      0.006    914.164   5464.160
GarageArea              5976.8080   1325.183      4.510      0.000   3376.360   8577.256
GrLivArea               2.569e+04   1448.901     17.729      0.000   2.28e+04   2.85e+04
KitchenQual             7079.8608   1543.499      4.587      0.000   4051.003   1.01e+04
MSZoning_RM            -1.034e+04   3241.011     -3.190      0.001  -1.67e+04  -3977.579
MasVnrArea              4865.2770   1153.275      4.219      0.000   2602.169   7128.385
Neighborhood_NridgHt    3.895e+04   4048.651      9.619      0.000    3.1e+04   4.69e+04
Neighborhood_Sawyer    -8873.0140   2554.522     -3.473      0.001  -1.39e+04  -3860.193
OverallCond             6499.6521   1082.215      6.006      0.000   4375.986   8623.318
OverallQual             1.265e+04   1840.137      6.875      0.000   9039.713   1.63e+04
TotalBsmtSF             5844.3070   1987.029      2.941      0.003   1945.097   9743.517
YearBuilt               4872.1360   1712.853      2.844      0.005   1510.950   8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
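The results above show that, after this series of feature-selection steps, the R² on the test set is higher than the R² on the training set, which suggests that we have successfully dealt with the overfitting problem.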
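To make the train/test comparison concrete, here is a small sketch of how the two R² values can be computed. It uses synthetic data and a plain NumPy least-squares fit rather than the notebook's model; the 70/30 split is an assumption, chosen because it is consistent with the 1021 training observations reported in the summaries. scikit-learn's `r2_score` computes the same quantity as the `r2` helper defined here.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data: four features, the last one is irrelevant.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=1000)

# Assumed 70/30 random split into training and test rows.
idx = rng.permutation(len(X))
train, test = idx[:700], idx[700:]

# Add an intercept column and fit least squares on the training rows only.
A_train = np.column_stack([np.ones(len(train)), X[train]])
A_test = np.column_stack([np.ones(len(test)), X[test]])
coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

r2_train = r2(y[train], A_train @ coef)
r2_test = r2(y[test], A_test @ coef)
gap = r2_train - r2_test
print(f"train R2 = {r2_train:.3f}, test R2 = {r2_test:.3f}, gap = {gap:.3f}")
```

A training R² far above the test R² is the usual sign of overfitting. A small gap, or a test score slightly above the training score, is consistent with a model that generalizes, although a single split can be noisy.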
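Conclusion
- This article showed how to use numerical analysis tools to analyze and process features, and demonstrated a linear regression model.
- Building on this article, further feature engineering could improve the robustness of the model; for reasons of space, we do not go into it here.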