[DSW Gallery] Classic Data Analysis Case: House Price Prediction from a Kaggle Competition

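Summary: Python is currently the leading language for data analysis, and many data scientists use it for all kinds of data science tasks. Using house price prediction from a Kaggle competition as an example, this article works in a JupyterLab Notebook through data loading, data exploration, data visualization, data cleaning, feature analysis, feature processing, machine learning, and regression prediction. The main Python tools are Pandas and scikit-learn. Only linear regression, the most basic machine learning model, is used here; readers can try more complex models such as random forests, support vector machines, or XGBoost.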

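Use It Directly

Open "Classic Data Analysis Case: House Price Prediction from a Kaggle Competition" and click "Open in DSW" in the upper-right corner.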

image.png


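Predicting House Prices with a Machine Learning Regression Model

This article shows how to take a dataset that mixes numeric and non-numeric columns through data loading, data exploration, data processing, and feature engineering, and finally fit a regression model that performs reasonably well. The data and the case come from the Kaggle house price prediction competition.

DSW also provides another Sample Notebook that runs a regression analysis of house prices on the same dataset. Interested readers can compare how the two algorithms perform on the same data. (Link)

The final result is that the XGBoost algorithm achieves better accuracy (92% vs 86%).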

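Preparation

All packages this article depends on are pre-installed in the DSW image. If your environment lacks one of them, install it with pip install xxx.

First, import the Python libraries we need.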

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import math
import datetime
from scipy import stats
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler,OrdinalEncoder,LabelEncoder
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import r2_score,mean_squared_error
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
sns.set_style('darkgrid')
# Suppress warning output
import warnings
warnings.filterwarnings('ignore')

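Data Loading

Read the data with Pandas and look at the raw records. The train.csv file was downloaded and prepared in advance. This article does not use the test samples; the corresponding test.csv file can be downloaded online.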

df = pd.read_csv('train.csv')
df.head()
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
0 1 60 RL 65.0 8450 Pave NaN Reg Lvl AllPub ... 0 NaN NaN NaN 0 2 2008 WD Normal 208500
1 2 20 RL 80.0 9600 Pave NaN Reg Lvl AllPub ... 0 NaN NaN NaN 0 5 2007 WD Normal 181500
2 3 60 RL 68.0 11250 Pave NaN IR1 Lvl AllPub ... 0 NaN NaN NaN 0 9 2008 WD Normal 223500
3 4 70 RL 60.0 9550 Pave NaN IR1 Lvl AllPub ... 0 NaN NaN NaN 0 2 2006 WD Abnorml 140000
4 5 60 RL 84.0 14260 Pave NaN IR1 Lvl AllPub ... 0 NaN NaN NaN 0 12 2008 WD Normal 250000

5 rows × 81 columns

df.shape
(1460, 81)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
 20  YearRemodAdd   1460 non-null   int64  
 21  RoofStyle      1460 non-null   object 
 22  RoofMatl       1460 non-null   object 
 23  Exterior1st    1460 non-null   object 
 24  Exterior2nd    1460 non-null   object 
 25  MasVnrType     1452 non-null   object 
 26  MasVnrArea     1452 non-null   float64
 27  ExterQual      1460 non-null   object 
 28  ExterCond      1460 non-null   object 
 29  Foundation     1460 non-null   object 
 30  BsmtQual       1423 non-null   object 
 31  BsmtCond       1423 non-null   object 
 32  BsmtExposure   1422 non-null   object 
 33  BsmtFinType1   1423 non-null   object 
 34  BsmtFinSF1     1460 non-null   int64  
 35  BsmtFinType2   1422 non-null   object 
 36  BsmtFinSF2     1460 non-null   int64  
 37  BsmtUnfSF      1460 non-null   int64  
 38  TotalBsmtSF    1460 non-null   int64  
 39  Heating        1460 non-null   object 
 40  HeatingQC      1460 non-null   object 
 41  CentralAir     1460 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1460 non-null   int64  
 44  2ndFlrSF       1460 non-null   int64  
 45  LowQualFinSF   1460 non-null   int64  
 46  GrLivArea      1460 non-null   int64  
 47  BsmtFullBath   1460 non-null   int64  
 48  BsmtHalfBath   1460 non-null   int64  
 49  FullBath       1460 non-null   int64  
 50  HalfBath       1460 non-null   int64  
 51  BedroomAbvGr   1460 non-null   int64  
 52  KitchenAbvGr   1460 non-null   int64  
 53  KitchenQual    1460 non-null   object 
 54  TotRmsAbvGrd   1460 non-null   int64  
 55  Functional     1460 non-null   object 
 56  Fireplaces     1460 non-null   int64  
 57  FireplaceQu    770 non-null    object 
 58  GarageType     1379 non-null   object 
 59  GarageYrBlt    1379 non-null   float64
 60  GarageFinish   1379 non-null   object 
 61  GarageCars     1460 non-null   int64  
 62  GarageArea     1460 non-null   int64  
 63  GarageQual     1379 non-null   object 
 64  GarageCond     1379 non-null   object 
 65  PavedDrive     1460 non-null   object 
 66  WoodDeckSF     1460 non-null   int64  
 67  OpenPorchSF    1460 non-null   int64  
 68  EnclosedPorch  1460 non-null   int64  
 69  3SsnPorch      1460 non-null   int64  
 70  ScreenPorch    1460 non-null   int64  
 71  PoolArea       1460 non-null   int64  
 72  PoolQC         7 non-null      object 
 73  Fence          281 non-null    object 
 74  MiscFeature    54 non-null     object 
 75  MiscVal        1460 non-null   int64  
 76  MoSold         1460 non-null   int64  
 77  YrSold         1460 non-null   int64  
 78  SaleType       1460 non-null   object 
 79  SaleCondition  1460 non-null   object 
 80  SalePrice      1460 non-null   int64  
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB

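Data Cleaning and Preprocessing

Raw data usually has all kinds of problems that get in the way of analysis and training, so it first goes through a cleaning and preprocessing stage that handles duplicates, missing values, outliers, and so on.

As seen above, the raw data has 81 columns and 1460 records. The Id column carries no information for training, so drop it first: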

df.drop('Id',axis=1,inplace=True)

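Handling Null (Missing) Values

Some columns in the csv file have no value in some rows, so we handle them in a uniform way.

# Define a function that returns the percentage of null values in each column of a DataFrame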
def check_null_percentage(df):
    missing_info = pd.DataFrame(np.array(df.isnull().sum().sort_values(ascending=False).reset_index()),
                                columns=['Columns','Missing_Percentage']).query("Missing_Percentage > 0").set_index('Columns')
    return 100*missing_info/df.shape[0]
check_null_percentage(df)

Missing_Percentage

Columns
PoolQC 99.5205
MiscFeature 96.3014
Alley 93.7671
Fence 80.7534
FireplaceQu 47.2603
LotFrontage 17.7397
GarageType 5.54795
GarageCond 5.54795
GarageFinish 5.54795
GarageQual 5.54795
GarageYrBlt 5.54795
BsmtFinType2 2.60274
BsmtExposure 2.60274
BsmtQual 2.53425
BsmtCond 2.53425
BsmtFinType1 2.53425
MasVnrArea 0.547945
MasVnrType 0.547945
Electrical 0.068493
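
The same table can be produced more compactly. The sketch below is an alternative to check_null_percentage, not the notebook's own code; the helper name missing_percentage and the toy DataFrame are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def missing_percentage(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    pct = frame.isnull().mean() * 100      # mean of a boolean mask = share of True
    return pct[pct > 0].sort_values(ascending=False)

# Toy data standing in for train.csv (hypothetical values)
toy = pd.DataFrame({
    'PoolQC': [np.nan, np.nan, np.nan, 'Gd'],
    'LotFrontage': [65.0, np.nan, 68.0, 60.0],
    'LotArea': [8450, 9600, 11250, 9550],
})
result = missing_percentage(toy)
print(result)
```

Columns without any missing value (LotArea in the toy data) are filtered out, as in the original function.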

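The result above gives the names of the columns that contain missing values. We pick the non-numeric ones, including those with the most missing values, and fill them with 'NA' so that later calculations and processing do not raise errors.

Semantically, a missing value here means that the house does not have the attribute in question, for example no garage.

All columns in NA_columns below are non-numeric features; the numeric features are handled separately later.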

NA_columns = ['Alley','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1','BsmtFinType2','FireplaceQu',\
          'GarageType','GarageFinish','GarageQual','GarageCond','PoolQC','Fence','MiscFeature']
df[NA_columns] = df[NA_columns].fillna('NA')

Make sure that no record has more than 5 missing values:

df[df.isnull().sum(axis=1) > 5]
MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig ... PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice

0 rows × 80 columns

Checking for Duplicate Samples

After the processing above, check whether the dataset now contains any fully identical rows.

df.duplicated(keep=False).sum()
0

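Next, look at how values are distributed within each feature and rank the features by "purity", so that features dominated by a single value can be listed.

# For each feature: its most frequent value, that value's share in percent, and its count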
def top_unique_count(x):
    unq_cnt = ( x.value_counts(ascending=False,dropna=False).head(1).index.values[0],
               100 * x.value_counts(ascending=False,dropna=False).head(1).values[0]/df.shape[0],
               x.value_counts(ascending=False,dropna=False).head(1).values[0])
    return unq_cnt

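Handling Missing Values in Numeric Features

Earlier, the non-numeric columns flagged by check_null_percentage were filled with 'NA'. Here we list the features that still contain missing values; the numeric ones among them are handled next.

The missing positions are then filled with the median.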

check_null_percentage(df)

image.png

The LotFrontage column has missing values. First fill them with the median of each (Neighborhood, LotConfig) group:

df['LotFrontage'] = df.groupby(['Neighborhood','LotConfig'])['LotFrontage'].\
                        transform(lambda x: x.fillna(x.median()))
df['LotFrontage'].isnull().sum()
5

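Some groups contain no observed LotFrontage at all, so their median is NaN and a few values remain missing. Fill these with the median of each LotConfig group:

df['LotFrontage'] = df.groupby(['LotConfig'])['LotFrontage'].transform(lambda x: x.fillna(x.median()))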
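
The two-pass imputation can be illustrated on a small example. This is a minimal sketch on made-up rows (the toy DataFrame is an assumption, not data from train.csv): a fine-grained group median first, then a coarser fallback for groups that have no observed value.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'LotConfig':    ['Inside', 'Inside', 'Corner', 'Corner', 'Corner'],
    'LotFrontage':  [60.0, np.nan, np.nan, 80.0, np.nan],
})

# Pass 1: median within each (Neighborhood, LotConfig) group.
# Groups with no observed value keep NaN because their median is NaN.
toy['LotFrontage'] = (toy.groupby(['Neighborhood', 'LotConfig'])['LotFrontage']
                         .transform(lambda s: s.fillna(s.median())))
remaining = int(toy['LotFrontage'].isnull().sum())

# Pass 2: fall back to the coarser LotConfig-level median.
toy['LotFrontage'] = (toy.groupby('LotConfig')['LotFrontage']
                         .transform(lambda s: s.fillna(s.median())))
print(remaining, toy['LotFrontage'].tolist())
```

Here the group (A, Corner) has no observed value, so one gap survives the first pass and is closed by the second.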

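Next, look at the missing values of the garage year (GarageYrBlt) together with the other garage-related features.

Wherever the garage year is missing, the other garage-related features are empty as well. This makes sense: a house without a garage has no garage attributes at all.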

df.loc[df.GarageYrBlt.isnull(),['GarageType','GarageCars','GarageArea','GarageFinish','GarageYrBlt','GarageQual','GarageCond']]

image.png

Check how many null values the construction year (YearBuilt) has

df.YearBuilt.isnull().sum()
0

Fill the missing garage years with the year in which the house was built

df.loc[df.GarageYrBlt.isnull(),'GarageYrBlt'] = df.loc[df.GarageYrBlt.isnull(),'YearBuilt']

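Of the next two features, one is numeric and one is categorical. Fill the numeric MasVnrArea with 0 and the categorical MasVnrType with 'Not present':

df['MasVnrArea'] = df['MasVnrArea'].fillna(0)
df['MasVnrType'] = df['MasVnrType'].fillna('Not present')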

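Feature Engineering

Creating New Features

Feature engineering in machine learning depends on the relevant domain knowledge. In this case, the dataset does not directly contain the total above-ground floor area of a house, its total number of bathrooms, or its total porch area.

Domain knowledge tells us that buyers care a great deal about these attributes, so we derive them from the existing features.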

# Total above-ground floor area (1st floor + 2nd floor)
df['TotalFlrSFAbvGrd'] = df[['1stFlrSF','2ndFlrSF']].sum(axis=1)
# Total number of bathrooms (full and half baths each count as 1)
df['TotalBath'] = df[['BsmtFullBath','BsmtHalfBath','FullBath','HalfBath']].sum(axis=1)
# Total porch and deck area
df['TotalPorchSF'] = df[['OpenPorchSF','EnclosedPorch','3SsnPorch','ScreenPorch','WoodDeckSF']].sum(axis=1)

Since new features were added, check once more whether any feature in the dataset has null values.

# Re-check for null values after adding the new features
check_null_percentage(df)

image.png

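Adjusting Feature Types

Drop the features in which more than 90% of the values are identical. Their entropy is very low, so they carry very little useful information for the model.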

unique_df = df.apply(top_unique_count).rename(index={0:"Value",1:'Percentage',2:'Count'})\
    .T.sort_values(by='Count',ascending=False)
unique_df.head(25)

image.png

# Find the features whose values are almost all the same
drop_columns = unique_df.query('Percentage > 90.0').index.values
drop_columns
array(['Utilities', 'Street', 'PoolArea', 'PoolQC', 'Condition2',
       '3SsnPorch', 'LowQualFinSF', 'RoofMatl', 'Heating', 'MiscVal',
       'MiscFeature', 'KitchenAbvGr', 'LandSlope', 'BsmtHalfBath',
       'Alley', 'CentralAir', 'Functional', 'ScreenPorch', 'PavedDrive',
       'Electrical', 'GarageCond'], dtype=object)
# Drop these features because they carry little useful information
df.drop(columns=drop_columns,inplace=True)
del drop_columns
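
The 90% rule can also be written with value_counts(normalize=True). The following is a minimal sketch, not the notebook's own implementation; the helper name dominant_share and the toy columns are assumptions made for illustration.

```python
import pandas as pd

def dominant_share(frame: pd.DataFrame) -> pd.Series:
    """Percentage of rows taken by each column's most frequent value (NaN counts as a value)."""
    return frame.apply(
        lambda col: col.value_counts(normalize=True, dropna=False).iloc[0] * 100
    )

toy = pd.DataFrame({
    'Street':   ['Pave'] * 19 + ['Grvl'],          # 95% identical
    'LotShape': ['Reg'] * 12 + ['IR1'] * 8,        # 60% identical
})
share = dominant_share(toy)
low_information = share[share > 90.0].index.tolist()
print(low_information)
```

Only Street crosses the 90% threshold in the toy data, so it is the only column flagged for removal.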

Based on the data type of each feature in the current dataset, split the features into the following groups:

  1. Numeric
  2. Categorical
  3. Time series

# Numeric features
numerical_features = list(df.select_dtypes(include=[np.number]).columns.values)
# Categorical features (np.object was removed in NumPy 1.24; the builtin object works in all versions)
categorical_features = list(df.select_dtypes(include=[object]).columns.values)
# Time-series features
timeseries_features = ['YearBuilt', 'YearRemodAdd', 'YrSold', 'MoSold', 'GarageYrBlt']
# Take the time-series features out of the numeric list; they are handled separately
for col in timeseries_features:
    numerical_features.remove(col) 
# Numeric features with at most 10 distinct values are treated as categorical
cat_feature = pd.Series(df[numerical_features].nunique().sort_values(),name='Count').to_frame().query('Count <= 10').index.values
categorical_features.extend(cat_feature)
# Remove the reclassified features from the numeric list
for col in cat_feature:
    numerical_features.remove(col)

Analyzing Numeric Features

Look at how the numeric features relate to the sale price, in particular whether the relationship is linear.

fig,ax = plt.subplots(math.ceil(len(numerical_features)/3),3,figsize=(15,30),sharey=True)
i ,j = 0, 0
for col in sorted(numerical_features):
    sns.regplot(x=col,y='SalePrice',data=df,ax=ax[i][j])
    if j == 2:
        j=0
        i +=1
    else:
        j +=1
ax[6][1].set_visible(False)
ax[6][2].set_visible(False)

3-1.png

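Findings for the Numeric Features

1stFlrSF : one outlier, a house with more than 4000 square feet
2ndFlrSF : no obvious linear relationship with the sale price
BsmtFinSF1 : one outlier with an area above 5000 but a low price; it can be treated as dirty data
BsmtFinSF2 : no obvious linear relationship with the sale price
BsmtUnfSF : no obvious linear relationship with the sale price
EnclosedPorch : one outlier
GarageArea : several outliers
GrLivArea : some houses with a large area but a low price, which can be treated as outliers
LotArea : a few outliers above 100000
LotFrontage : a few outliers
MasVnrArea : outliers above 1500 square feet
MSSubClass : no obvious linear relationship with the sale price
OpenPorchSF : outliers above 400
TotalBsmtSF : one outlier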

Outlier Handling

Based on the findings above, record for each numeric feature how many outliers it currently contains in the dataset

feature_outlier_count = {'1stFlrSF':1,
                'BsmtFinSF1':1,
                'BsmtFinSF2':1,
                'EnclosedPorch':2,
                'GarageArea':4,
                'GrLivArea':4,
                'LotArea':4,
                'LotFrontage':2,
                'MasVnrArea':1,
                'OpenPorchSF':3,
                'TotalBsmtSF':4,
                'TotRmsAbvGrd':1,
                'TotalFlrSFAbvGrd':2,
                'TotalPorchSF':1,
                'WoodDeckSF':3}

Define functions to print these outlier points

def print_outliers(feature_list):
    for k,v in feature_list.items():
        if v:
            display(df.loc[df[k].isin(sorted(df[k])[-v:]),[k,'SalePrice']])
def get_outliers(feature,index=-1):
    return df.loc[df[feature] == sorted(df[feature])[index],[feature,'SalePrice']].sort_values(by=feature,ascending=False)
print_outliers(feature_outlier_count)

image.png

image.png

image.png

The output shows that row 1298 appears several times, which means this record holds outliers in several features, so it should be removed

outlier_index = get_outliers('1stFlrSF').index.values[0]
outlier_index
1298
df.iloc[1298]
MSSubClass               60
MSZoning                 RL
LotFrontage             313
LotArea               63887
LotShape                IR3
                     ...   
SaleCondition       Partial
SalePrice            160000
TotalFlrSFAbvGrd       5642
TotalBath                 5
TotalPorchSF            506
Name: 1298, Length: 62, dtype: object

Remove the record at the outlier index found above

def remove_outlier_features_count_for_index(outlier_idx):
    # Decrement the outlier count of every feature whose largest value sits in this row
    for col in feature_outlier_count.keys():
        if feature_outlier_count[col] > 0 and outlier_idx in get_outliers(col).index.values:
            feature_outlier_count[col] = feature_outlier_count[col]-1 
    df.drop(outlier_idx,inplace=True)
    df.reset_index(drop=True,inplace=True)
remove_outlier_features_count_for_index(outlier_index)

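Replace the remaining outliers. TotRmsAbvGrd gets the mode among houses with the same sale price; the other features get the mean over the four records closest in sale price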

df.loc[df.index[get_outliers('TotRmsAbvGrd').index.values[0]],'TotRmsAbvGrd'] = df.loc[df['SalePrice'] == get_outliers('TotRmsAbvGrd').SalePrice.values[0],'TotRmsAbvGrd'].mode()[0]
feature_outlier_count['TotRmsAbvGrd'] = 0
def fix_outliers(outlier_features_list):
    for k,v in outlier_features_list.items():
        while v > 0:
            # Replace the outlier with the mean of this feature over four records whose SalePrice is closest to the outlier's
            replace_with = df.loc[(df['SalePrice']-get_outliers(k)['SalePrice'].values[0]).abs().argsort()[v:v+4],k].mean()
            if (df[k].dtypes == np.int64) | (df[k].dtypes == np.int32):
                df.loc[df.index[get_outliers(k).index.values[0]],k] = int(replace_with)
            else:
                df.loc[df.index[get_outliers(k).index.values[0]],k] = round(replace_with,1)        
            v = v-1
            feature_outlier_count[k] = v
fix_outliers(feature_outlier_count)
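
The replacement idea behind fix_outliers is easier to see on a small example. The sketch below is a simplified variant rather than the function above: it excludes the outlier row itself and averages the feature over the nearest rows by SalePrice. The helper name replace_with_price_neighbours and the toy values are assumptions.

```python
import pandas as pd

def replace_with_price_neighbours(frame, feature, row, n_neighbours=4):
    """Replace frame.loc[row, feature] by the mean of that feature over the
    n_neighbours rows whose SalePrice is closest to the row's SalePrice."""
    distance = (frame['SalePrice'] - frame.loc[row, 'SalePrice']).abs()
    neighbours = distance.drop(index=row).nsmallest(n_neighbours).index
    frame.loc[row, feature] = frame.loc[neighbours, feature].mean()
    return frame

toy = pd.DataFrame({
    'GrLivArea': [1500.0, 1600.0, 1700.0, 1800.0, 5600.0, 3000.0],
    'SalePrice': [150000, 155000, 165000, 170000, 160000, 400000],
})
# Row 4 has a very large area for a mid-range price
toy = replace_with_price_neighbours(toy, 'GrLivArea', row=4)
print(toy.loc[4, 'GrLivArea'])
```

The outlier is pulled towards the typical area of houses in the same price range, instead of being dropped or replaced by a global mean.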

Continue checking the dataset for abnormal values

df[['1stFlrSF','SalePrice']].sort_values(by='1stFlrSF',ascending=False)[:3]

image.png

df.loc[df.index[get_outliers('1stFlrSF',-2).index.values[0]],'1stFlrSF'] = df.loc[(df['SalePrice']-get_outliers('1stFlrSF',-2)['SalePrice'].values[0]).abs().argsort()[1:1+4],'1stFlrSF'].mean()
df[['BsmtFinSF1','SalePrice']].sort_values(by='BsmtFinSF1',ascending=False)[:3]

image.png

df[['LotArea','SalePrice']].sort_values(by='LotArea',ascending=False)[:7]

image.png

Fix the values that are clearly abnormal

feature_outlier_count['LotArea']=3
feature_outlier_count['BsmtFinSF1']=1
fix_outliers(feature_outlier_count)
del feature_outlier_count

Further Feature Analysis

Visualize the numeric features again to see whether the cleanup had an effect

fig,ax = plt.subplots(math.ceil(len(numerical_features)/3),3,figsize=(15,30),sharey=True)
i ,j = 0, 0
for col in sorted(numerical_features):
    sns.regplot(x=col,y='SalePrice',data=df,ax=ax[i][j])
    if j == 2:
        j=0
        i +=1
    else:
        j +=1
ax[6][1].set_visible(False)
ax[6][2].set_visible(False)

3-2.png

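Conclusion:

After the processing above, the following 4 features still need further handling

BsmtFinSF2 : almost no linear relationship with the sale price; can be dropped
BsmtUnfSF : almost no linear relationship with the sale price; can be dropped
EnclosedPorch : almost no linear relationship with the sale price; can be dropped
MSSubClass : looks more like a categorical feature and should be moved back to the categorical features

Drop the features unrelated to the sale price: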

df.drop(['BsmtFinSF2','BsmtUnfSF','EnclosedPorch'],axis=1,inplace=True)
for col in ['BsmtFinSF2','BsmtUnfSF','EnclosedPorch']:
    numerical_features.remove(col)

Convert MSSubClass into a categorical feature:

df.MSSubClass = df.MSSubClass.astype(str)
df.MSSubClass.replace({'20':'1story', '30':'1story', '40':'1story', '45':'1story', '50':'1story', 
                           '60':'2story', '70':'2story', '75':'2story', '80':'nstory',
                           '85':'nstory', '90':'nstory', '120':'1story', '150':'1story',
                           '160':'2story','180':'nstory','190':'nstory'}, inplace=True)
categorical_features.append('MSSubClass')
# removing it from numerical feature list
numerical_features.remove('MSSubClass')

Handling Categorical Features

Use box plots overlaid with strip plots to show the categorical features

fig,ax = plt.subplots(math.ceil(len(categorical_features)/3),3,figsize=(20,60),sharey=True)
i ,j = 0, 0
PROPS = {
    'boxprops':{'facecolor':'none', 'edgecolor':'black','linewidth':0.3},
} 
for col in sorted(categorical_features):
    sns.boxplot(x=col,y='SalePrice',data=df,ax=ax[i][j],showfliers=False,**PROPS)
    sns.stripplot(x=col,y='SalePrice',data=df,ax=ax[i][j],alpha=0.5)
    if df[col].nunique() > 8:
        ax[i][j].tick_params(axis='x',rotation=45)
    if j == 2:
        j=0
        i +=1
    else:
        j +=1

3-3.png

Conclusion

Categories to merge:

BedroomAbvGr : 0, 5, 6 and 8 are minority classes and need to be merged
BldgType : merge 2fmCon, Twnhs and Duplex
BsmtCond : merge NA, Fa and Po
BsmtExposure : merge Mn and Av
BsmtFinType1 : merge ALQ, Rec, BLQ and LwQ
BsmtFinType2 : merge BLQ, Rec and LwQ
BsmtFullBath : merge 2 and 3
BsmtQual : merge NA and Fa
Condition1 : merge RRNn with RRAn, PosN with PosA, RRNe with RRAe, and Feedr with Artery
Exterior2nd : merge MetalSd, Wd Shng, HdBoard, Plywood, Wd Sdng and Stucco; merge CBlock, Other, Stone, AsphShn, ImStucc, Brk Cmn and BrkFace
FireplaceQu : merge No Fireplace, Po and Fa
Foundation : merge Wood, Slab and Stone
FullBath : merge 0 and 1
GarageType : merge Detchd, CarPort, No Garage, Basment and 2Types
GarageQual : merge Ex with Gd; merge Po, Fa and No Garage
HeatingQC : merge Fa and Po
HouseStyle : merge 2Story with 2.5Fin, SFoyer with 1.5Fin, SLvl with 1Story, and 1.5Unf with 2.5Unf
LotShape : merge IR2 and IR3
MSZoning : merge RM and RH into one "other" class
MasVnrType : merge None, Not present and BrkCmn
Neighborhood : merge MeadowV, BrDale and IDOTRR; Sawyer, NAmes, NPkVill, Mitchel, SWISU and Blueste; Gilbert, Blmngtn, SawyerW and NWAmes; ClearCr, CollgCr and Crawfor; Veenker, Timber and Somerst; OldTown, Edwards and BrkSide; StoneBr, NridgHt and NoRidge
OverallCond : merge 1, 2 and 3; merge 6, 7 and 8
OverallQual : merge 1 and 2
SaleCondition : merge AdjLand, Alloca, Family and Abnorml
SaleType : merge COD, ConLD, ConLI, CwD, ConLw, Con and Oth

Features to drop:

ExterCond : too few informative values (treated like a feature with too much missing data); drop
Exterior1st : appears to have no linear relationship with the sale price
Fence : almost all values are the same; drop
LotConfig : almost all values are the same; drop
RoofStyle : only two classes with the same count, so it carries little useful information; drop

Highly correlated features:

Fireplaces, GarageCars, HeatingQC, KitchenQual

Based on the analysis above, merge the category labels

df.BldgType.replace({'2fmCon':'Twnhs','Duplex':'Twnhs'},inplace=True)
df.BsmtExposure.replace({'Mn':'Av'},inplace=True)
df.Condition1.replace({'RRNn' : 'RRAn', 'PosN' : 'PosA' , 'RRNe' : 'RRAe' , 'Feedr' : 'Artery'},inplace=True)
df.Exterior2nd.replace({'MetalSd':'Wd Sdng', 'Wd Shng':'Wd Sdng', 'HdBoard':'Wd Sdng','Plywood':'Wd Sdng',\
                        'Stucco':'Wd Sdng' , 'CBlock': 'BrkFace','Other': 'BrkFace' , 'Stone': 'BrkFace',\
                        'AsphShn': 'BrkFace', 'ImStucc': 'BrkFace', 'Brk Cmn': 'BrkFace'},inplace=True)
df.Foundation.replace({'Wood':'Stone','Slab':'Stone'},inplace=True)
df.GarageType.replace({'CarPort':'Detchd', 'No Garage':'Detchd', 'Basment':'Detchd' , '2Types':'Detchd'},inplace=True)
df.LotShape.replace({'IR3':'IR2'},inplace=True)
df.MSZoning.replace({'RH':'RM'},inplace=True)
df.MasVnrType.replace({'None':'BrkCmn', 'Not present':'BrkCmn'},inplace=True)
df.Neighborhood.replace({'BrDale':'MeadowV' , 'IDOTRR':'MeadowV' ,\
                         'NAmes':'Sawyer' , 'NPkVill':'Sawyer' , 'Mitchel':'Sawyer' , 'SWISU':'Sawyer', 'Blueste':'Sawyer' ,\
                         'Blmngtn':'Gilbert' , 'SawyerW':'Gilbert', 'NWAmes':'Gilbert',\
                         'ClearCr':'Crawfor' , 'CollgCr' :'Crawfor',\
                         'Timber':'Veenker', 'Somerst':'Veenker' ,\
                         'Edwards':'OldTown', 'BrkSide':'OldTown' ,\
                         'StoneBr' : 'NridgHt' , 'NoRidge': 'NridgHt'},inplace=True)
df.SaleCondition.replace({'AdjLand':'Abnorml', 'Alloca':'Abnorml', 'Family' :'Abnorml'},inplace=True)
df.SaleType.replace({'ConLD':'COD', 'ConLI':'COD', 'CwD':'COD', 'ConLw':'COD', 'Con':'COD', 'Oth':'COD'},inplace=True)

Drop the features that are not needed

drop_columns = ['ExterCond', 'Fence', 'LotConfig' ,'RoofStyle' ,'Exterior1st']
df.drop(columns=drop_columns,inplace=True)
for cat in drop_columns[:]:
    categorical_features.remove(cat)

Handling Time-Series Features

timeseries_features
['YearBuilt', 'YearRemodAdd', 'YrSold', 'MoSold', 'GarageYrBlt']
df[timeseries_features].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   YearBuilt     1459 non-null   int64  
 1   YearRemodAdd  1459 non-null   int64  
 2   YrSold        1459 non-null   int64  
 3   MoSold        1459 non-null   int64  
 4   GarageYrBlt   1459 non-null   float64
dtypes: float64(1), int64(4)
memory usage: 57.1 KB

Convert the time-related columns to int and build a sale date column

df.YrSold = df.YrSold.astype(int)
df.GarageYrBlt = df.GarageYrBlt.astype(int)
df['dateSold'] = df['MoSold'].astype(str)+'-1-'+df['YrSold'].astype(str)
df['dateSold'] =pd.to_datetime(df['dateSold'])
timeseries_features.append('dateSold')
df['dateSold'].head()
0   2008-02-01
1   2007-05-01
2   2008-09-01
3   2006-02-01
4   2008-12-01
Name: dateSold, dtype: datetime64[ns]
df.loc[df.GarageYrBlt < 1900,['GarageYrBlt','YearBuilt']]

image.png

Visualizing the Data

fig,ax = plt.subplots(math.ceil(len(timeseries_features)/2),2,figsize=(15,15),sharey=True)
i ,j = 0, 0
for col in sorted(timeseries_features):
    if col == 'GarageYrBlt':
        sns.lineplot(x=df.loc[df[col] >= 1880,col],y=df.loc[df[col] >= 1880,'SalePrice'],ax=ax[i][j])
    else:
        sns.lineplot(x=col,y='SalePrice',data=df,ax=ax[i][j])
    if df[col].nunique() > 8:
        ax[i][j].tick_params(axis='x',rotation=45)
    if col == "YrSold":
        ax[i][j].xaxis.set_ticks([2006,2007,2008,2009,2010])
    if j == 1:
        j=0
        i +=1
    else:
        j +=1

3-4.png

df_dummy = df.pop('SalePrice')
df.insert(df.shape[1],'SalePrice',df_dummy)
del df_dummy

Use the heatmap below to analyze how strongly the time-series features are linearly correlated with the other features

plt.figure(figsize=(15,12))
sns.heatmap(df.corr(numeric_only=True),annot=True);

3-5.png

MoSold and YrSold show almost no strong correlation with any other column, so they can be dropped

df.drop(['MoSold','YrSold'],axis=1,inplace=True)
for col in ['MoSold','YrSold']:
    timeseries_features.remove(col)

Training

Encode the categorical features in preparation for training

df[['HalfBath','Fireplaces','FullBath','BsmtFullBath','GarageCars','BedroomAbvGr','OverallCond','OverallQual']] = df[['HalfBath','Fireplaces','FullBath','BsmtFullBath','GarageCars','BedroomAbvGr','OverallCond','OverallQual']].astype(int)
categorical_columns =['ExterQual','BsmtQual','BsmtCond','HeatingQC','KitchenQual','FireplaceQu','GarageQual','HouseStyle','BsmtFinType2','BsmtFinType1','GarageFinish']

image.png

df['HouseStyle']=pd.Categorical(df['HouseStyle'],ordered=True,categories=[ 'SFoyer','1.5Unf','1Story','1.5Fin','SLvl','2.5Unf','2Story','2.5Fin'])

Explanation: the operation above converts the values of a categorical column into the pandas Categorical type

image.pngimage.png

image.png
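
The encoding cells are only available as screenshots, so the following is a rough sketch of what ordinal encoding with an ordered Categorical can look like, not the notebook's exact code. The HouseStyle order is the one given in the article; the quality order NA < Po < Fa < TA < Gd < Ex and the toy rows are assumptions.

```python
import pandas as pd

toy = pd.DataFrame({'HouseStyle': ['2Story', '1Story', 'SFoyer'],
                    'ExterQual':  ['Gd', 'TA', 'Ex']})

# Ordered categorical -> integer codes (position in the categories list)
house_style_order = ['SFoyer', '1.5Unf', '1Story', '1.5Fin', 'SLvl', '2.5Unf', '2Story', '2.5Fin']
toy['HouseStyle'] = pd.Categorical(toy['HouseStyle'], ordered=True,
                                   categories=house_style_order).codes

# Quality grades: this order is an assumption based on the usual meaning of the labels
quality_order = ['NA', 'Po', 'Fa', 'TA', 'Gd', 'Ex']
toy['ExterQual'] = pd.Categorical(toy['ExterQual'], ordered=True,
                                  categories=quality_order).codes
print(toy)
```

With this order, 2Story maps to 6 and 1Story to 2, which agrees with the HouseStyle values visible in the encoded table later in the article.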

One-hot encode the categorical features

image.png


MSSubClass MSZoning LotFrontage LotArea LotShape LandContour Neighborhood Condition1 BldgType HouseStyle ... BsmtExposure_NA BsmtExposure_No GarageType_BuiltIn GarageType_Detchd GarageType_NA SaleType_CWD SaleType_New SaleType_WD SaleCondition_Normal SaleCondition_Partial
0 2story RL 65 8450 Reg Lvl Crawfor Norm 1Fam 6 ... 0 1 0 0 0 0 0 1 1 0
1 1story RL 80 9600 Reg Lvl Veenker Artery 1Fam 2 ... 0 0 0 0 0 0 0 1 1 0
2 2story RL 68 11250 IR1 Lvl Crawfor Norm 1Fam 6 ... 0 0 0 0 0 0 0 1 1 0
3 2story RL 60 9550 IR1 Lvl Crawfor Norm 1Fam 6 ... 0 1 0 1 0 0 0 1 0 0
4 2story RL 84 14260 IR1 Lvl NridgHt Norm 1Fam 6 ... 0 0 0 0 0 0 0 1 1 0

1459 rows × 82 columns
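
The one-hot encoding cell is not shown in the text. A minimal sketch with pd.get_dummies follows; the toy rows and the choice of drop_first=True are assumptions, although the column names in the table above (for example GarageType_BuiltIn, GarageType_Detchd, GarageType_NA without GarageType_Attchd) are consistent with dropping the first level.

```python
import pandas as pd

toy = pd.DataFrame({
    'GarageType': ['Attchd', 'Detchd', 'BuiltIn', 'NA'],
    'LotArea':    [8450, 9600, 11250, 9550],
})

# drop_first=True drops one level per feature to avoid perfectly collinear
# dummy columns in a linear model; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(toy['GarageType'], prefix='GarageType', drop_first=True, dtype=int)
encoded = pd.concat([toy.drop(columns='GarageType'), dummies], axis=1)
print(encoded.columns.tolist())
```

The dropped level (Attchd here) becomes the baseline: a row with all three dummy columns at 0 is an attached garage.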

house_price.reset_index(drop=True,inplace=True)

Handling Date Columns

The general approach is to convert them into integer values

image.png
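
How the notebook turns the date into an integer is not visible in the text. One possible encoding, shown here purely as an assumption, is the number of months elapsed since the earliest sale; the column name monthsSinceFirstSale is hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({'dateSold': pd.to_datetime(['2008-02-01', '2007-05-01', '2006-02-01'])})

# Assumption: encode each sale date as the number of months since the earliest sale
first = toy['dateSold'].min()
toy['monthsSinceFirstSale'] = ((toy['dateSold'].dt.year - first.year) * 12
                               + (toy['dateSold'].dt.month - first.month))
print(toy)
```

Other integer encodings, such as the ordinal day number or the age of the house at sale time, would serve the same purpose of giving the linear model a numeric input.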

Splitting the Data

image.png

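Use train_test_split from sklearn to split the dataset into a training set and a test set

By default this function shuffles the rows and splits them at random, so with enough samples the training and test sets follow approximately the same distribution for each feature as the original dataset. It does not strictly guarantee identical distributions.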

image.png
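
The split cell is only available as a screenshot. The sketch below shows a typical call on synthetic data; train_size=0.7 and random_state=42 are assumptions. The OLS summary later in the article reports 1021 observations out of 1459 rows, which is consistent with a split of roughly 70/30.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=['f1', 'f2', 'f3'])
y = pd.Series(rng.normal(size=100), name='SalePrice')

# train_size=0.7 is an assumption; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)
print(X_train.shape, X_test.shape)
```

The row index is preserved in both parts, which explains the shuffled index values (1453, 1099, ...) seen in the scaled training table.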

Normalization (Feature Scaling)

image.png


LotFrontage LotArea HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd MasVnrArea ExterQual BsmtQual ... BsmtExposure_NA BsmtExposure_No GarageType_BuiltIn GarageType_Detchd GarageType_NA SaleType_CWD SaleType_New SaleType_WD SaleCondition_Normal SaleCondition_Partial
1453 -0.39373 -0.52222 -0.74866 0.661837 -0.48597 1.084801 0.986677 -0.57304 1.064789 0.61969 ... 0 1 0 0 0 0 0 1 1 0
1099 0.551285 0.340015 -0.74866 0.661837 -0.48597 0.216686 -0.33434 0.640253 -0.69435 0.61969 ... 0 1 0 0 0 0 0 0 0 0
416 0.173278 -0.4545 1.401497 -0.06242 1.344947 0.216686 -0.33434 0.622583 -0.69435 -0.66891 ... 0 1 0 0 0 0 0 1 1 0
1168 2.346818 0.703806 1.401497 -0.06242 1.344947 -1.21904 0.057073 -0.57304 -0.69435 -0.66891 ... 0 1 0 1 0 0 0 1 1 0
670 -0.29923 -0.29918 1.401497 -0.06242 -0.48597 1.11819 0.986677 -0.57304 1.064789 0.61969 ... 0 1 0 0 0 0 0 1 1 0

5 rows × 81 columns

image.png
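
The scaling cell is not shown. Since StandardScaler is imported at the top of the notebook and the scaled values are centered around zero, a plausible sketch looks as follows; the toy data and the list scale_cols are assumptions. The key point is that the scaler is fitted on the training set only and then reused on the test set.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

X_train = pd.DataFrame({'LotArea': [8000.0, 9000.0, 10000.0, 11000.0], 'flag': [0, 1, 0, 1]})
X_test = pd.DataFrame({'LotArea': [9500.0], 'flag': [1]})
scale_cols = ['LotArea']   # 0/1 dummy columns are left untouched

scaler = StandardScaler()
X_train[scale_cols] = scaler.fit_transform(X_train[scale_cols])  # learn mean/std on train only
X_test[scale_cols] = scaler.transform(X_test[scale_cols])        # reuse them on test
print(X_train, X_test, sep='\n')
```

Fitting the scaler on the full dataset would leak information from the test set into training, which is why fit_transform and transform are kept separate.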

Building the Regression Model

Fit the data with linear regression

image.png

Measuring Model Performance (Evaluation)

image.png
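
The fitting and evaluation cells are screenshots in the original. A minimal sketch of the same workflow on synthetic data follows; the data-generating formula, the 70/30 split, and the variable names are assumptions. It fits LinearRegression and compares R2 on the training and test sets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic target: linear in the first two features plus a little noise
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=42)

model = LinearRegression().fit(X_train, y_train)
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(r2_train, r2_test, rmse_test)
```

A training R2 that is clearly higher than the test R2 is the usual sign of overfitting; on this synthetic data the two are nearly identical because the true relationship is linear.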


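Conclusion:

  • The R2 on the training set is higher than on the test set, which suggests that the model may be overfitting. Below we try to improve this by reworking the dataset

Feature Selection

  • The main idea is to find features that are correlated with one another, keep the most informative one from each group, and drop the rest, similar to clustering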

image.png

This is computed and measured mainly through the correlation matrix

image.png
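
The correlation-matrix cell is not shown. The sketch below illustrates one common way to list strongly correlated feature pairs; the synthetic columns and the 0.8 threshold are assumptions, not values taken from the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'GarageArea': a,
    'GarageCars': a + rng.normal(scale=0.1, size=200),  # nearly a copy of GarageArea
    'LotArea': rng.normal(size=200),
})

corr = toy.corr().abs()
# Keep only the upper triangle so that each pair is listed once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna()
high_pairs = pairs[pairs > 0.8]     # 0.8 is an arbitrary illustrative threshold
print(high_pairs)
```

From each highly correlated pair, one feature can be kept (for example the one more strongly correlated with SalePrice) and the other dropped.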

Feature Selection with RFE

image.png

image.png

image.png

image.png
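
The RFE cells are screenshots in the original. A minimal sketch of recursive feature elimination with scikit-learn follows. The synthetic data and n_features_to_select=2 are assumptions; the name rfe_selected_columns mirrors the variable referenced later by perform_feature_selection, but how the notebook actually built it is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 5)), columns=['f0', 'f1', 'f2', 'f3', 'f4'])
# Only f0 and f3 carry signal; the other columns are noise
y = 5.0 * X['f0'] - 4.0 * X['f3'] + rng.normal(scale=0.1, size=300)

# RFE repeatedly fits the estimator and removes the weakest feature
rfe = RFE(estimator=LinearRegression(), n_features_to_select=2)
rfe.fit(X, y)
rfe_selected_columns = X.columns[rfe.support_].tolist()
print(rfe_selected_columns)
```

RFE ranks features by the magnitude of the fitted coefficients, so it is usually applied after scaling, as in this article.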

Define a function that builds an ordinary least squares (OLS) model

image.png

image.png

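Next, define a function that computes the variance inflation factor (VIF)

A VIF below 5 is considered acceptable. Higher values indicate serious multicollinearity in the current model: many of its features are closely correlated with each other, and some of them should be removed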

image.png
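
The VIF helper is only available as a screenshot. The notebook imports variance_inflation_factor from statsmodels, so the original most likely relies on it. The sketch below instead derives the same quantity directly from its definition, VIF_j = 1 / (1 - R_j^2), using NumPy. The function name VIF and the column names Features and VIF follow how the result is used later in perform_feature_selection; everything else is an assumption.

```python
import numpy as np
import pandas as pd

def VIF(X: pd.DataFrame) -> pd.DataFrame:
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept."""
    rows = []
    for col in X.columns:
        target = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        design = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - residual.var() / target.var()
        rows.append({'Features': col, 'VIF': 1.0 / (1.0 - r2)})
    return pd.DataFrame(rows).sort_values('VIF', ascending=False).reset_index(drop=True)

rng = np.random.default_rng(0)
a = rng.normal(size=500)
# 'b' is nearly a copy of 'a'; 'c' is independent of both
toy = pd.DataFrame({'a': a, 'b': a + rng.normal(scale=0.1, size=500), 'c': rng.normal(size=500)})
vif = VIF(toy)
print(vif)
```

The two nearly identical columns receive a very large VIF, while the independent column stays close to 1, well under the threshold of 5.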

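Building on the OLS model and the variance inflation factor above, we construct an automatic feature-pruning procedure with the following logic:

  1. If adding a feature raises both the model's p-values and the VIF, the feature does not help the model and should be dropped
  2. If adding a feature raises the p-values but lowers the VIF, drop such features one at a time and re-check after each step
  3. If adding a feature lowers the p-values but raises the VIF, remove the features whose VIF is above 5
  4. If adding a feature lowers both the p-values and the VIF, the feature is beneficial and is kept
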
def perform_feature_selection(train_data,rfe=False,corr=False):
    X_train_rfe = pd.DataFrame()
    vif = pd.DataFrame()
    count = 1
    stop = False
    # R2 of the previous regression fit
    r2score = 0.0
    if rfe:
        cols = rfe_selected_columns
    elif corr:
        cols = corr_selected_columns.Corr_feature
    else: 
        cols = corr_rfe_features.RFE.values
    for col in cols:
        if col in train_data.columns.values:
            X_train_rfe[col] = train_data[col]
            while True:
                lm = build_model(X_train_rfe)
                # If R2 does not increase at all, drop the feature that was just added
                if round(r2score,3) == round(lm.rsquared,3):
                    print("\n\n Dropping "+X_train_rfe.columns.values[-1]+" and rebuilding the model as it did not add any info to model \n\n")
                    X_train_rfe.drop(X_train_rfe.columns.values[-1],axis=1, inplace=True)
                    # Rebuild the model with the remaining features
                    lm = build_model(X_train_rfe)
                r2score = lm.rsquared
                if count != 1:
                    vif = VIF(X_train_rfe)
                    # Stop once R2 reaches 0.90 and treat the model as good enough; the threshold is configurable
                    if lm.rsquared >= 0.90:
                        stop = True
                        break
                    # Check whether any p-value exceeds 0.05
                    if (lm.pvalues > 0.05).sum() > 0:
                        feature = lm.pvalues[lm.pvalues > 0.05].index
                        if feature[0] != 'const':
                            # High p-value and VIF above 5: drop the feature
                            if feature[0] in vif.loc[vif.VIF > 5,'Features'].values:
                                X_train_rfe.drop(feature[0],axis=1,inplace=True)                 
                            else:
                                # High p-value only: drop the feature as well
                                X_train_rfe.drop(feature[0],axis=1,inplace=True)                
                        elif (feature[0] == 'const') & (len(feature) > 1):
                            X_train_rfe.drop(feature[1],axis=1,inplace=True)                    
                    if ((vif.VIF > 5).sum() > 0) & (col in X_train_rfe.columns.values):
                        X_train_rfe.drop(col,axis=1,inplace=True)   # order 3
                    else:
                        break                                                                    
                else:
                    break
            if stop:
                break
            count = count + 1
    return X_train_rfe
X_train_rfe = perform_feature_selection(X_train,corr=True)

   OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.717
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     284.3
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          8.49e-270
Time:                        19:38:14   Log-Likelihood:                -12352.
No. Observations:                1021   AIC:                         2.472e+04
Df Residuals:                    1011   BIC:                         2.477e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const               1.85e+05   4368.649     42.355      0.000    1.76e+05    1.94e+05
1stFlrSF           2.644e+04   1669.004     15.841      0.000    2.32e+04    2.97e+04
BldgType_Twnhs    -1.101e+04   5464.297     -2.014      0.044   -2.17e+04    -282.459
BsmtExposure_NA    2.679e+04   1.11e+04      2.416      0.016    5033.785    4.85e+04
BsmtFinSF1         1.059e+04   1581.639      6.694      0.000    7484.070    1.37e+04
BsmtQual           2.001e+04   2362.866      8.468      0.000    1.54e+04    2.46e+04
ExterQual          2.335e+04   1962.316     11.899      0.000    1.95e+04    2.72e+04
Fireplaces         1.436e+04   1532.145      9.372      0.000    1.14e+04    1.74e+04
Foundation_CBlock  -1.08e+04   4703.225     -2.296      0.022      -2e+04   -1571.394
Foundation_PConc   4150.7686   5422.286      0.766      0.444   -6489.454    1.48e+04
==============================================================================
Omnibus:                      459.342   Durbin-Watson:                   1.924
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4837.327
Skew:                           1.781   Prob(JB):                         0.00
Kurtosis:                      13.051   Cond. No.                         12.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping Foundation_PConc and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.717
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     319.9
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          6.16e-271
Time:                        19:38:14   Log-Likelihood:                -12352.
No. Observations:                1021   AIC:                         2.472e+04
Df Residuals:                    1012   BIC:                         2.477e+04
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const               1.88e+05   2053.300     91.553      0.000    1.84e+05    1.92e+05
1stFlrSF           2.643e+04   1668.635     15.840      0.000    2.32e+04    2.97e+04
BldgType_Twnhs    -1.102e+04   5463.152     -2.017      0.044   -2.17e+04    -297.840
BsmtExposure_NA    2.674e+04   1.11e+04      2.412      0.016    4987.135    4.85e+04
BsmtFinSF1         1.063e+04   1580.214      6.729      0.000    7532.044    1.37e+04
BsmtQual           2.065e+04   2207.730      9.354      0.000    1.63e+04     2.5e+04
ExterQual          2.363e+04   1928.489     12.251      0.000    1.98e+04    2.74e+04
Fireplaces          1.43e+04   1529.991      9.347      0.000    1.13e+04    1.73e+04
Foundation_CBlock -1.333e+04   3343.343     -3.988      0.000   -1.99e+04   -6771.611
==============================================================================
Omnibus:                      454.593   Durbin-Watson:                   1.926
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4727.204
Skew:                           1.762   Prob(JB):                         0.00
Kurtosis:                      12.935   Cond. No.                         12.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.717
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     284.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          1.10e-269
Time:                        19:38:14   Log-Likelihood:                -12352.
No. Observations:                1021   AIC:                         2.472e+04
Df Residuals:                    1011   BIC:                         2.477e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.879e+05   2080.552     90.316      0.000    1.84e+05    1.92e+05
1stFlrSF           2.641e+04   1671.162     15.805      0.000    2.31e+04    2.97e+04
BldgType_Twnhs    -1.107e+04   5470.533     -2.024      0.043   -2.18e+04    -338.744
BsmtExposure_NA    2.437e+04   1.48e+04      1.644      0.100   -4717.249    5.35e+04
BsmtFinSF1         1.062e+04   1581.462      6.717      0.000    7519.903    1.37e+04
BsmtQual           2.072e+04   2225.665      9.308      0.000    1.63e+04    2.51e+04
ExterQual          2.363e+04   1929.399     12.246      0.000    1.98e+04    2.74e+04
Fireplaces         1.431e+04   1531.068      9.346      0.000    1.13e+04    1.73e+04
Foundation_CBlock -1.319e+04   3396.063     -3.884      0.000   -1.99e+04   -6526.682
Foundation_Stone   3473.8182   1.44e+04      0.241      0.810   -2.48e+04    3.18e+04
==============================================================================
Omnibus:                      454.564   Durbin-Watson:                   1.925
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4730.579
Skew:                           1.761   Prob(JB):                         0.00
Kurtosis:                      12.939   Cond. No.                         21.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping Foundation_Stone and rebuilding the model as it did not add any info to model

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.717
Model:                            OLS   Adj. R-squared:                  0.714
Method:                 Least Squares   F-statistic:                     319.9
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          6.16e-271
Time:                        19:38:14   Log-Likelihood:                -12352.
No. Observations:                1021   AIC:                         2.472e+04
Df Residuals:                    1012   BIC:                         2.477e+04
Df Model:                           8                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const               1.88e+05   2053.300     91.553      0.000    1.84e+05    1.92e+05
1stFlrSF           2.643e+04   1668.635     15.840      0.000    2.32e+04    2.97e+04
BldgType_Twnhs    -1.102e+04   5463.152     -2.017      0.044   -2.17e+04    -297.840
BsmtExposure_NA    2.674e+04   1.11e+04      2.412      0.016    4987.135    4.85e+04
BsmtFinSF1         1.063e+04   1580.214      6.729      0.000    7532.044    1.37e+04
BsmtQual           2.065e+04   2207.730      9.354      0.000    1.63e+04     2.5e+04
ExterQual          2.363e+04   1928.489     12.251      0.000    1.98e+04    2.74e+04
Fireplaces          1.43e+04   1529.991      9.347      0.000    1.13e+04    1.73e+04
Foundation_CBlock -1.333e+04   3343.343     -3.988      0.000   -1.99e+04   -6771.611
==============================================================================
Omnibus:                      454.593   Durbin-Watson:                   1.926
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4727.204
Skew:                           1.762   Prob(JB):                         0.00
Kurtosis:                      12.935   Cond. No.                         12.7
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.741
Model:                            OLS   Adj. R-squared:                  0.739
Method:                 Least Squares   F-statistic:                     321.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          3.27e-289
Time:                        19:38:14   Log-Likelihood:                -12307.
No. Observations:                1021   AIC:                         2.463e+04
Df Residuals:                    1011   BIC:                         2.468e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.871e+05   1966.956     95.103      0.000    1.83e+05    1.91e+05
1stFlrSF            2.27e+04   1642.064     13.827      0.000    1.95e+04    2.59e+04
BldgType_Twnhs    -2.369e+04   5387.593     -4.397      0.000   -3.43e+04   -1.31e+04
BsmtExposure_NA    1.419e+04   1.07e+04      1.328      0.185   -6782.532    3.52e+04
BsmtFinSF1         1.297e+04   1531.055      8.473      0.000    9967.629     1.6e+04
BsmtQual           1.552e+04   2177.549      7.126      0.000    1.12e+04    1.98e+04
ExterQual          2.072e+04   1869.252     11.087      0.000    1.71e+04    2.44e+04
Fireplaces         1.254e+04   1475.068      8.505      0.000    9650.354    1.54e+04
Foundation_CBlock -8227.4604   3241.896     -2.538      0.011   -1.46e+04   -1865.845
FullBath           1.641e+04   1689.171      9.714      0.000    1.31e+04    1.97e+04
==============================================================================
Omnibus:                      430.694   Durbin-Watson:                   1.946
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4257.081
Skew:                           1.661   Prob(JB):                         0.00
Kurtosis:                      12.436   Cond. No.                         13.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
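
The "Adj. R-squared" entry in each summary can be checked by hand from the other header fields. The sketch below applies the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), to the values printed in the table above (R-squared 0.741, 1021 observations, Df Model 9). The helper name `adjusted_r2` is made up for this example, and the input R² is the rounded figure from the table, so the result only agrees with the report to the printed precision.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values read from the summary above: R-squared 0.741, 1021 observations, Df Model 9.
value = adjusted_r2(0.741, 1021, 9)
print(round(value, 3))  # 0.739, matching the reported Adj. R-squared
```

Unlike plain R², this value only rises when a new feature improves the fit by more than the penalty for the extra parameter, which makes it the more useful figure to compare across the successive models in this section.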

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.758
Method:                 Least Squares   F-statistic:                     320.4
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          4.98e-305
Time:                        19:38:14   Log-Likelihood:                -12267.
No. Observations:                1021   AIC:                         2.456e+04
Df Residuals:                    1010   BIC:                         2.461e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.872e+05   1821.622    102.744      0.000    1.84e+05    1.91e+05
1stFlrSF           1.902e+04   1629.925     11.670      0.000    1.58e+04    2.22e+04
BldgType_Twnhs    -2.128e+04   5162.883     -4.122      0.000   -3.14e+04   -1.11e+04
BsmtFinSF1         1.147e+04   1486.941      7.711      0.000    8548.411    1.44e+04
BsmtQual           1.227e+04   1758.908      6.976      0.000    8818.862    1.57e+04
ExterQual          1.757e+04   1824.971      9.629      0.000     1.4e+04    2.12e+04
Fireplaces         1.205e+04   1433.674      8.408      0.000    9240.387    1.49e+04
Foundation_CBlock -8052.8506   3040.517     -2.649      0.008    -1.4e+04   -2086.397
FullBath           1.457e+04   1641.700      8.874      0.000    1.13e+04    1.78e+04
GarageArea         1.513e+04   2809.559      5.384      0.000    9612.150    2.06e+04
GarageCars         -202.8822   2857.603     -0.071      0.943   -5810.402    5404.637
==============================================================================
Omnibus:                      491.859   Durbin-Watson:                   1.934
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6508.560
Skew:                           1.865   Prob(JB):                         0.00
Kurtosis:                      14.793   Cond. No.                         8.13
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping GarageCars and rebuilding the model as it did not add any info to model

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.758
Method:                 Least Squares   F-statistic:                     356.3
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          2.57e-306
Time:                        19:38:14   Log-Likelihood:                -12267.
No. Observations:                1021   AIC:                         2.455e+04
Df Residuals:                    1011   BIC:                         2.460e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.872e+05   1819.709    102.850      0.000    1.84e+05    1.91e+05
1stFlrSF           1.902e+04   1628.786     11.679      0.000    1.58e+04    2.22e+04
BldgType_Twnhs    -2.129e+04   5156.368     -4.130      0.000   -3.14e+04   -1.12e+04
BsmtFinSF1         1.147e+04   1481.601      7.745      0.000    8567.200    1.44e+04
BsmtQual           1.226e+04   1747.577      7.014      0.000    8827.495    1.57e+04
ExterQual          1.757e+04   1821.466      9.644      0.000     1.4e+04    2.11e+04
Fireplaces         1.204e+04   1419.676      8.481      0.000    9254.028    1.48e+04
Foundation_CBlock -8040.1455   3033.752     -2.650      0.008    -1.4e+04   -2086.974
FullBath           1.455e+04   1629.164      8.934      0.000    1.14e+04    1.78e+04
GarageArea         1.496e+04   1632.428      9.166      0.000    1.18e+04    1.82e+04
==============================================================================
Omnibus:                      491.958   Durbin-Watson:                   1.934
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6510.144
Skew:                           1.866   Prob(JB):                         0.00
Kurtosis:                      14.794   Cond. No.                         7.40
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.758
Method:                 Least Squares   F-statistic:                     320.4
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          4.91e-305
Time:                        19:38:14   Log-Likelihood:                -12267.
No. Observations:                1021   AIC:                         2.456e+04
Df Residuals:                    1010   BIC:                         2.461e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.871e+05   1821.423    102.747      0.000    1.84e+05    1.91e+05
1stFlrSF           1.904e+04   1631.010     11.671      0.000    1.58e+04    2.22e+04
BldgType_Twnhs    -2.113e+04   5236.321     -4.034      0.000   -3.14e+04   -1.09e+04
BsmtFinSF1         1.146e+04   1483.410      7.728      0.000    8552.888    1.44e+04
BsmtQual           1.218e+04   1790.831      6.803      0.000    8669.676    1.57e+04
ExterQual          1.752e+04   1836.192      9.543      0.000    1.39e+04    2.11e+04
Fireplaces           1.2e+04   1435.173      8.362      0.000    9184.890    1.48e+04
Foundation_CBlock -8044.7975   3035.301     -2.650      0.008    -1.4e+04   -2088.579
FullBath           1.451e+04   1645.821      8.817      0.000    1.13e+04    1.77e+04
GarageArea         1.489e+04   1675.374      8.889      0.000    1.16e+04    1.82e+04
GarageFinish        312.0124   1657.567      0.188      0.851   -2940.656    3564.681
==============================================================================
Omnibus:                      492.929   Durbin-Watson:                   1.934
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6534.498
Skew:                           1.870   Prob(JB):                         0.00
Kurtosis:                      14.816   Cond. No.                         8.03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping GarageFinish and rebuilding the model as it did not add any info to model

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.758
Method:                 Least Squares   F-statistic:                     356.3
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          2.57e-306
Time:                        19:38:14   Log-Likelihood:                -12267.
No. Observations:                1021   AIC:                         2.455e+04
Df Residuals:                    1011   BIC:                         2.460e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.872e+05   1819.709    102.850      0.000    1.84e+05    1.91e+05
1stFlrSF           1.902e+04   1628.786     11.679      0.000    1.58e+04    2.22e+04
BldgType_Twnhs    -2.129e+04   5156.368     -4.130      0.000   -3.14e+04   -1.12e+04
BsmtFinSF1         1.147e+04   1481.601      7.745      0.000    8567.200    1.44e+04
BsmtQual           1.226e+04   1747.577      7.014      0.000    8827.495    1.57e+04
ExterQual          1.757e+04   1821.466      9.644      0.000     1.4e+04    2.11e+04
Fireplaces         1.204e+04   1419.676      8.481      0.000    9254.028    1.48e+04
Foundation_CBlock -8040.1455   3033.752     -2.650      0.008    -1.4e+04   -2086.974
FullBath           1.455e+04   1629.164      8.934      0.000    1.14e+04    1.78e+04
GarageArea         1.496e+04   1632.428      9.166      0.000    1.18e+04    1.82e+04
==============================================================================
Omnibus:                      491.958   Durbin-Watson:                   1.934
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6510.144
Skew:                           1.866   Prob(JB):                         0.00
Kurtosis:                      14.794   Cond. No.                         7.40
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.758
Method:                 Least Squares   F-statistic:                     320.5
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          4.07e-305
Time:                        19:38:14   Log-Likelihood:                -12266.
No. Observations:                1021   AIC:                         2.455e+04
Df Residuals:                    1010   BIC:                         2.461e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.878e+05   2106.657     89.164      0.000    1.84e+05    1.92e+05
1stFlrSF           1.888e+04   1643.702     11.489      0.000    1.57e+04    2.21e+04
BldgType_Twnhs    -2.092e+04   5191.666     -4.029      0.000   -3.11e+04   -1.07e+04
BsmtFinSF1         1.142e+04   1484.888      7.688      0.000    8501.694    1.43e+04
BsmtQual           1.199e+04   1796.235      6.676      0.000    8466.998    1.55e+04
ExterQual          1.742e+04   1835.265      9.494      0.000    1.38e+04     2.1e+04
Fireplaces         1.198e+04   1423.240      8.417      0.000    9186.305    1.48e+04
Foundation_CBlock -8304.2324   3062.431     -2.712      0.007   -1.43e+04   -2294.777
FullBath           1.435e+04   1658.973      8.653      0.000    1.11e+04    1.76e+04
GarageArea         1.519e+04   1671.275      9.090      0.000    1.19e+04    1.85e+04
GarageType_Detchd -2092.7692   3262.275     -0.642      0.521   -8494.382    4308.843
==============================================================================
Omnibus:                      497.557   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6683.851
Skew:                           1.889   Prob(JB):                         0.00
Kurtosis:                      14.952   Cond. No.                         7.52
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping GarageType_Detchd and rebuilding the model as it did not add any info to model

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.758
Method:                 Least Squares   F-statistic:                     356.3
Date:                Mon, 27 Jun 2022   Prob (F-statistic):          2.57e-306
Time:                        19:38:14   Log-Likelihood:                -12267.
No. Observations:                1021   AIC:                         2.455e+04
Df Residuals:                    1011   BIC:                         2.460e+04
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.872e+05   1819.709    102.850      0.000    1.84e+05    1.91e+05
1stFlrSF           1.902e+04   1628.786     11.679      0.000    1.58e+04    2.22e+04
BldgType_Twnhs    -2.129e+04   5156.368     -4.130      0.000   -3.14e+04   -1.12e+04
BsmtFinSF1         1.147e+04   1481.601      7.745      0.000    8567.200    1.44e+04
BsmtQual           1.226e+04   1747.577      7.014      0.000    8827.495    1.57e+04
ExterQual          1.757e+04   1821.466      9.644      0.000     1.4e+04    2.11e+04
Fireplaces         1.204e+04   1419.676      8.481      0.000    9254.028    1.48e+04
Foundation_CBlock -8040.1455   3033.752     -2.650      0.008    -1.4e+04   -2086.974
FullBath           1.455e+04   1629.164      8.934      0.000    1.14e+04    1.78e+04
GarageArea         1.496e+04   1632.428      9.166      0.000    1.18e+04    1.82e+04
==============================================================================
Omnibus:                      491.958   Durbin-Watson:                   1.934
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6510.144
Skew:                           1.866   Prob(JB):                         0.00
Kurtosis:                      14.794   Cond. No.                         7.40
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.815
Model:                            OLS   Adj. R-squared:                  0.813
Method:                 Least Squares   F-statistic:                     370.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12134.
No. Observations:                1021   AIC:                         2.429e+04
Df Residuals:                    1008   BIC:                         2.436e+04
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.848e+05   1640.059    112.701      0.000    1.82e+05    1.88e+05
1stFlrSF           1.226e+04   1492.986      8.211      0.000    9328.575    1.52e+04
BldgType_Twnhs    -2.514e+04   4586.437     -5.481      0.000   -3.41e+04   -1.61e+04
BsmtFinSF1         1.304e+04   1309.221      9.958      0.000    1.05e+04    1.56e+04
BsmtQual           1.156e+04   1663.251      6.949      0.000    8293.985    1.48e+04
ExterQual          1.726e+04   1632.938     10.567      0.000    1.41e+04    2.05e+04
Fireplaces         5440.3950   1325.188      4.105      0.000    2839.951    8040.839
Foundation_CBlock -3801.6485   2688.584     -1.414      0.158   -9077.512    1474.214
FullBath           1259.5279   1673.974      0.752      0.452   -2025.344    4544.400
GarageArea         1.201e+04   1830.204      6.562      0.000    8418.298    1.56e+04
GarageType_NA      1.723e+04   6361.690      2.708      0.007    4743.491    2.97e+04
GarageYrBlt        2893.5852   1771.498      1.633      0.103    -582.660    6369.831
GrLivArea          2.936e+04   1780.188     16.491      0.000    2.59e+04    3.29e+04
==============================================================================
Omnibus:                      530.944   Durbin-Watson:                   1.907
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            10691.051
Skew:                           1.920   Prob(JB):                         0.00
Kurtosis:                      18.380   Cond. No.                         11.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.816
Model:                            OLS   Adj. R-squared:                  0.814
Method:                 Least Squares   F-statistic:                     372.8
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12131.
No. Observations:                1021   AIC:                         2.429e+04
Df Residuals:                    1008   BIC:                         2.435e+04
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.831e+05   1179.345    155.279      0.000    1.81e+05    1.85e+05
1stFlrSF        1.208e+04   1481.295      8.154      0.000    9171.233     1.5e+04
BldgType_Twnhs -2.404e+04   4594.339     -5.233      0.000   -3.31e+04    -1.5e+04
BsmtFinSF1      1.282e+04   1288.946      9.948      0.000    1.03e+04    1.54e+04
BsmtQual         1.17e+04   1652.529      7.083      0.000    8462.040    1.49e+04
ExterQual        1.66e+04   1635.857     10.148      0.000    1.34e+04    1.98e+04
Fireplaces      5276.4977   1317.607      4.005      0.000    2690.931    7862.064
FullBath        1333.7615   1659.828      0.804      0.422   -1923.353    4590.876
GarageArea      1.256e+04   1830.055      6.866      0.000    8973.594    1.62e+04
GarageType_NA   1.764e+04   6323.109      2.790      0.005    5231.647       3e+04
GarageYrBlt     1856.8163   1811.485      1.025      0.306   -1697.897    5411.530
GrLivArea       2.918e+04   1774.282     16.444      0.000    2.57e+04    3.27e+04
HeatingQC       3841.1963   1357.881      2.829      0.005    1176.599    6505.794
==============================================================================
Omnibus:                      543.016   Durbin-Watson:                   1.907
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11212.459
Skew:                           1.974   Prob(JB):                         0.00
Kurtosis:                      18.747   Cond. No.                         12.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     390.9
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12111.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1008   BIC:                         2.431e+04
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.828e+05   1155.016    158.291      0.000    1.81e+05    1.85e+05
1stFlrSF        1.167e+04   1452.788      8.030      0.000    8815.484    1.45e+04
BldgType_Twnhs -2.091e+04   4441.758     -4.707      0.000   -2.96e+04   -1.22e+04
BsmtFinSF1      1.215e+04   1257.408      9.662      0.000    9681.473    1.46e+04
BsmtQual        1.135e+04   1618.144      7.016      0.000    8177.842    1.45e+04
ExterQual       1.164e+04   1785.197      6.523      0.000    8141.863    1.51e+04
Fireplaces      5298.9064   1291.188      4.104      0.000    2765.183    7832.630
GarageArea      1.181e+04   1795.779      6.575      0.000    8283.341    1.53e+04
GarageType_NA   1.909e+04   6201.302      3.078      0.002    6919.318    3.13e+04
GarageYrBlt     1348.1313   1720.113      0.784      0.433   -2027.282    4723.544
GrLivArea       2.889e+04   1508.571     19.150      0.000    2.59e+04    3.18e+04
HeatingQC       2224.0926   1355.931      1.640      0.101    -436.679    4884.864
KitchenQual     1.088e+04   1706.221      6.374      0.000    7526.962    1.42e+04
==============================================================================
Omnibus:                      566.645   Durbin-Watson:                   1.909
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12320.064
Skew:                           2.079   Prob(JB):                         0.00
Kurtosis:                      19.502   Cond. No.                         12.4
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
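
The header figures in each summary are tied together by a few standard formulas. As a quick sanity check, the sketch below recomputes the adjusted R-squared, AIC and BIC of the 12-feature model above from its reported R-squared, observation count and log-likelihood. It uses only the rounded values printed in the table, so small rounding differences are expected; the variable names are illustrative and do not come from the original notebook.

```python
import math

# Values copied from the summary above (already rounded by statsmodels)
r_squared = 0.823
n_obs = 1021
df_model = 12            # number of features, excluding the constant
log_likelihood = -12111.0

n_params = df_model + 1  # features plus the intercept

# Adjusted R-squared penalizes R-squared for each additional feature
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - df_model - 1)

# Information criteria: lower is better when comparing models on the same data
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(f"Adj. R-squared: {adj_r_squared:.3f}")
print(f"AIC: {aic:.4g}  BIC: {bic:.4g}")
```

Because R-squared barely moves between the models in this run, adjusted R-squared, AIC and BIC are the more useful figures for judging whether a newly added feature earns its place.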

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     391.0
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12111.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1008   BIC:                         2.431e+04
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const            1.826e+05   1167.634    156.427      0.000     1.8e+05    1.85e+05
1stFlrSF         1.161e+04   1452.283      7.997      0.000    8764.102    1.45e+04
BldgType_Twnhs  -2.034e+04   4392.653     -4.631      0.000    -2.9e+04   -1.17e+04
BsmtFinSF1       1.196e+04   1259.989      9.494      0.000    9489.451    1.44e+04
BsmtQual         1.189e+04   1485.640      8.006      0.000    8978.934    1.48e+04
ExterQual        1.192e+04   1779.074      6.703      0.000    8433.137    1.54e+04
Fireplaces       5202.4712   1290.361      4.032      0.000    2670.369    7734.574
GarageArea       1.227e+04   1665.929      7.365      0.000    9000.717    1.55e+04
GarageType_NA    1.866e+04   6202.724      3.009      0.003    6490.813    3.08e+04
GrLivArea        2.869e+04   1477.181     19.421      0.000    2.58e+04    3.16e+04
HeatingQC        2511.1145   1321.773      1.900      0.058     -82.628    5104.857
KitchenQual      1.091e+04   1702.586      6.409      0.000    7570.148    1.43e+04
LandContour_Low  5776.3072   6735.792      0.858      0.391   -7441.473     1.9e+04
==============================================================================
Omnibus:                      564.056   Durbin-Watson:                   1.911
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12299.304
Skew:                           2.065   Prob(JB):                         0.00
Kurtosis:                      19.494   Cond. No.                         12.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping LandContour_Low and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     426.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12112.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1009   BIC:                         2.431e+04
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.828e+05   1154.193    158.379      0.000    1.81e+05    1.85e+05
1stFlrSF        1.163e+04   1451.911      8.013      0.000    8784.516    1.45e+04
BldgType_Twnhs -2.039e+04   4391.723     -4.643      0.000    -2.9e+04   -1.18e+04
BsmtFinSF1      1.207e+04   1253.306      9.632      0.000    9612.336    1.45e+04
BsmtQual        1.186e+04   1484.803      7.985      0.000    8943.121    1.48e+04
ExterQual        1.18e+04   1773.301      6.656      0.000    8324.156    1.53e+04
Fireplaces      5246.1628   1289.186      4.069      0.000    2716.370    7775.956
GarageArea      1.234e+04   1663.923      7.414      0.000    9070.826    1.56e+04
GarageType_NA    1.89e+04   6195.591      3.051      0.002    6744.886    3.11e+04
GrLivArea       2.865e+04   1476.152     19.406      0.000    2.57e+04    3.15e+04
HeatingQC       2464.6328   1320.488      1.866      0.062    -126.585    5055.850
KitchenQual     1.098e+04   1700.260      6.460      0.000    7647.272    1.43e+04
==============================================================================
Omnibus:                      562.418   Durbin-Watson:                   1.910
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12156.778
Skew:                           2.059   Prob(JB):                         0.00
Kurtosis:                      19.395   Cond. No.                         11.8
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
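
Each coefficient row can be checked by hand: the t value is the coefficient divided by its standard error, and with about 1,000 residual degrees of freedom the two-sided p-value is close to the one given by a standard normal distribution. The sketch below does this for the LandContour_Low row that was dropped above. The 0.05 cutoff is an assumption for illustration; the output does not show the threshold the notebook actually uses.

```python
import math

# LandContour_Low row from the summary printed before the drop message
coef = 5776.3072
std_err = 6735.792

# t statistic: how many standard errors the estimate is from zero
t_value = coef / std_err

# Two-sided p-value under a normal approximation to the t distribution
# (reasonable here because the residual degrees of freedom are about 1008)
p_value = math.erfc(abs(t_value) / math.sqrt(2))

ALPHA = 0.05  # assumed significance cutoff, not confirmed by the source
print(f"t = {t_value:.3f}, p = {p_value:.3f}")
print("drop feature" if p_value > ALPHA else "keep feature")
```

A p-value this far above any common cutoff means the data cannot distinguish the coefficient from zero, which is consistent with the feature being removed and the model being rebuilt.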

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     425.0
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12113.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1009   BIC:                         2.431e+04
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
===================================================================================
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const            1.851e+05   3536.273     52.341      0.000    1.78e+05    1.92e+05
1stFlrSF         1.156e+04   1453.851      7.954      0.000    8710.631    1.44e+04
BldgType_Twnhs  -2.104e+04   4391.370     -4.790      0.000   -2.97e+04   -1.24e+04
BsmtFinSF1       1.191e+04   1253.222      9.507      0.000    9455.558    1.44e+04
BsmtQual         1.216e+04   1478.582      8.221      0.000    9254.430    1.51e+04
ExterQual        1.255e+04   1758.022      7.140      0.000    9101.715     1.6e+04
Fireplaces       5174.6946   1293.904      3.999      0.000    2635.643    7713.746
GarageArea       1.229e+04   1666.245      7.374      0.000    9016.493    1.56e+04
GarageType_NA    1.878e+04   6220.943      3.020      0.003    6576.901     3.1e+04
GrLivArea        2.876e+04   1476.951     19.471      0.000    2.59e+04    3.17e+04
KitchenQual      1.163e+04   1663.418      6.992      0.000    8365.761    1.49e+04
LandContour_Lvl -2487.6851   3676.620     -0.677      0.499   -9702.383    4727.013
==============================================================================
Omnibus:                      552.749   Durbin-Watson:                   1.907
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11824.863
Skew:                           2.012   Prob(JB):                         0.00
Kurtosis:                      19.179   Cond. No.                         11.5
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping LandContour_Lvl and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.822
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     467.8
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:14   Log-Likelihood:                -12114.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1010   BIC:                         2.430e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.828e+05   1155.495    158.227      0.000    1.81e+05    1.85e+05
1stFlrSF        1.158e+04   1453.358      7.964      0.000    8723.290    1.44e+04
BldgType_Twnhs -2.093e+04   4387.555     -4.771      0.000   -2.95e+04   -1.23e+04
BsmtFinSF1      1.193e+04   1252.616      9.526      0.000    9474.314    1.44e+04
BsmtQual        1.215e+04   1478.174      8.221      0.000    9251.256    1.51e+04
ExterQual       1.241e+04   1745.318      7.111      0.000    8986.552    1.58e+04
Fireplaces      5232.3481   1290.749      4.054      0.000    2699.491    7765.205
GarageArea      1.229e+04   1665.787      7.378      0.000    9021.448    1.56e+04
GarageType_NA   1.909e+04   6202.348      3.079      0.002    6923.724    3.13e+04
GrLivArea       2.877e+04   1476.374     19.490      0.000    2.59e+04    3.17e+04
KitchenQual     1.167e+04   1661.791      7.024      0.000    8411.351    1.49e+04
==============================================================================
Omnibus:                      552.339   Durbin-Watson:                   1.908
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11742.839
Skew:                           2.012   Prob(JB):                         0.00
Kurtosis:                      19.119   Cond. No.                         11.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
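
The diagnostics at the bottom of each table describe the residuals. The Jarque-Bera statistic combines their skewness and kurtosis, and a value this large (with Prob(JB) of 0.00) indicates residuals that are far from normally distributed, with a long right tail. As a rough check, the sketch below rebuilds the statistic for the model above from the rounded skew and kurtosis values, so it agrees with the printed figure only approximately.

```python
# Values copied from the diagnostics block above (rounded in the printout)
n_obs = 1021
skew = 2.012
kurtosis = 19.119  # a normal distribution has kurtosis 3

# Jarque-Bera: n/6 * (S^2 + (K - 3)^2 / 4)
excess_kurtosis = kurtosis - 3
jarque_bera = n_obs / 6 * (skew**2 + excess_kurtosis**2 / 4)

print(f"Jarque-Bera from rounded inputs: {jarque_bera:.1f}")
print("Reported by statsmodels:         11742.839")
```

Almost all of the statistic comes from the kurtosis term, which suggests that a small number of houses with very large residuals dominate the departure from normality.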

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     425.5
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12113.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1009   BIC:                         2.431e+04
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1.831e+05   1176.600    155.609      0.000    1.81e+05    1.85e+05
1stFlrSF           1.173e+04   1458.983      8.038      0.000    8864.126    1.46e+04
BldgType_Twnhs    -1.755e+04   5264.740     -3.334      0.001   -2.79e+04   -7221.682
BsmtFinSF1         1.194e+04   1252.427      9.535      0.000    9484.156    1.44e+04
BsmtQual            1.22e+04   1478.435      8.250      0.000    9296.091    1.51e+04
ExterQual          1.218e+04   1755.883      6.940      0.000    8739.368    1.56e+04
Fireplaces         5258.0746   1290.717      4.074      0.000    2725.278    7790.871
GarageArea         1.243e+04   1669.787      7.444      0.000    9152.437    1.57e+04
GarageType_NA      1.954e+04   6213.037      3.145      0.002    7346.307    3.17e+04
GrLivArea          2.866e+04   1479.146     19.379      0.000    2.58e+04    3.16e+04
KitchenQual        1.157e+04   1663.967      6.952      0.000    8302.023    1.48e+04
MSSubClass_nstory -5284.1947   4551.241     -1.161      0.246   -1.42e+04    3646.787
==============================================================================
Omnibus:                      552.108   Durbin-Watson:                   1.908
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11753.833
Skew:                           2.011   Prob(JB):                         0.00
Kurtosis:                      19.128   Cond. No.                         11.4
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping MSSubClass_nstory and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.822
Model:                            OLS   Adj. R-squared:                  0.821
Method:                 Least Squares   F-statistic:                     467.8
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12114.
No. Observations:                1021   AIC:                         2.425e+04
Df Residuals:                    1010   BIC:                         2.430e+04
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.828e+05   1155.495    158.227      0.000    1.81e+05    1.85e+05
1stFlrSF        1.158e+04   1453.358      7.964      0.000    8723.290    1.44e+04
BldgType_Twnhs -2.093e+04   4387.555     -4.771      0.000   -2.95e+04   -1.23e+04
BsmtFinSF1      1.193e+04   1252.616      9.526      0.000    9474.314    1.44e+04
BsmtQual        1.215e+04   1478.174      8.221      0.000    9251.256    1.51e+04
ExterQual       1.241e+04   1745.318      7.111      0.000    8986.552    1.58e+04
Fireplaces      5232.3481   1290.749      4.054      0.000    2699.491    7765.205
GarageArea      1.229e+04   1665.787      7.378      0.000    9021.448    1.56e+04
GarageType_NA   1.909e+04   6202.348      3.079      0.002    6923.724    3.13e+04
GrLivArea       2.877e+04   1476.374     19.490      0.000    2.59e+04    3.17e+04
KitchenQual     1.167e+04   1661.791      7.024      0.000    8411.351    1.49e+04
==============================================================================
Omnibus:                      552.339   Durbin-Watson:                   1.908
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11742.839
Skew:                           2.012   Prob(JB):                         0.00
Kurtosis:                      19.119   Cond. No.                         11.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.831
Model:                            OLS   Adj. R-squared:                  0.829
Method:                 Least Squares   F-statistic:                     413.7
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12088.
No. Observations:                1021   AIC:                         2.420e+04
Df Residuals:                    1008   BIC:                         2.427e+04
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.842e+05   1214.413    151.664      0.000    1.82e+05    1.87e+05
1stFlrSF        1.049e+04   1436.688      7.299      0.000    7666.570    1.33e+04
BldgType_Twnhs -2.059e+04   4306.482     -4.780      0.000    -2.9e+04   -1.21e+04
BsmtFinSF1      1.074e+04   1233.568      8.708      0.000    8321.547    1.32e+04
BsmtQual        1.113e+04   1449.648      7.677      0.000    8283.843     1.4e+04
ExterQual       1.079e+04   1717.956      6.279      0.000    7416.062    1.42e+04
Fireplaces      4590.8153   1263.466      3.634      0.000    2111.491    7070.140
GarageArea      1.055e+04   1643.302      6.421      0.000    7327.118    1.38e+04
GarageType_NA   1.648e+04   6063.698      2.717      0.007    4577.616    2.84e+04
GrLivArea       2.791e+04   1450.986     19.235      0.000    2.51e+04    3.08e+04
KitchenQual     1.233e+04   1624.851      7.588      0.000    9140.322    1.55e+04
MSZoning_RM    -8266.9289   3145.256     -2.628      0.009   -1.44e+04   -2094.929
MasVnrArea      8321.3261   1222.519      6.807      0.000    5922.353    1.07e+04
==============================================================================
Omnibus:                      548.666   Durbin-Watson:                   1.913
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12102.892
Skew:                           1.981   Prob(JB):                         0.00
Kurtosis:                      19.395   Cond. No.                         11.8
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.831
Model:                            OLS   Adj. R-squared:                  0.829
Method:                 Least Squares   F-statistic:                     381.5
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12088.
No. Observations:                1021   AIC:                         2.420e+04
Df Residuals:                    1007   BIC:                         2.427e+04
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.842e+05   1215.819    151.500      0.000    1.82e+05    1.87e+05
1stFlrSF              1.045e+04   1441.819      7.250      0.000    7623.761    1.33e+04
BldgType_Twnhs       -2.056e+04   4309.498     -4.770      0.000    -2.9e+04   -1.21e+04
BsmtFinSF1            1.074e+04   1234.155      8.702      0.000    8318.119    1.32e+04
BsmtQual              1.115e+04   1451.919      7.678      0.000    8299.103     1.4e+04
ExterQual             1.077e+04   1720.226      6.259      0.000    7390.981    1.41e+04
Fireplaces            4603.0699   1264.755      3.639      0.000    2121.213    7084.927
GarageArea            1.054e+04   1644.471      6.410      0.000    7314.086    1.38e+04
GarageType_NA         1.652e+04   6068.637      2.723      0.007    4614.807    2.84e+04
GrLivArea             2.789e+04   1452.752     19.200      0.000     2.5e+04    3.07e+04
KitchenQual           1.232e+04   1626.018      7.576      0.000    9127.270    1.55e+04
MSZoning_RM          -7926.2417   3361.189     -2.358      0.019   -1.45e+04   -1330.506
MasVnrArea            8348.5114   1226.703      6.806      0.000    5941.324    1.08e+04
Neighborhood_MeadowV -1720.3806   5966.275     -0.288      0.773   -1.34e+04    9987.376
==============================================================================
Omnibus:                      549.424   Durbin-Watson:                   1.914
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12140.657
Skew:                           1.984   Prob(JB):                         0.00
Kurtosis:                      19.421   Cond. No.                         11.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping Neighborhood_MeadowV and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.831
Model:                            OLS   Adj. R-squared:                  0.829
Method:                 Least Squares   F-statistic:                     413.7
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12088.
No. Observations:                1021   AIC:                         2.420e+04
Df Residuals:                    1008   BIC:                         2.427e+04
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const           1.842e+05   1214.413    151.664      0.000    1.82e+05    1.87e+05
1stFlrSF        1.049e+04   1436.688      7.299      0.000    7666.570    1.33e+04
BldgType_Twnhs -2.059e+04   4306.482     -4.780      0.000    -2.9e+04   -1.21e+04
BsmtFinSF1      1.074e+04   1233.568      8.708      0.000    8321.547    1.32e+04
BsmtQual        1.113e+04   1449.648      7.677      0.000    8283.843     1.4e+04
ExterQual       1.079e+04   1717.956      6.279      0.000    7416.062    1.42e+04
Fireplaces      4590.8153   1263.466      3.634      0.000    2111.491    7070.140
GarageArea      1.055e+04   1643.302      6.421      0.000    7327.118    1.38e+04
GarageType_NA   1.648e+04   6063.698      2.717      0.007    4577.616    2.84e+04
GrLivArea       2.791e+04   1450.986     19.235      0.000    2.51e+04    3.08e+04
KitchenQual     1.233e+04   1624.851      7.588      0.000    9140.322    1.55e+04
MSZoning_RM    -8266.9289   3145.256     -2.628      0.009   -1.44e+04   -2094.929
MasVnrArea      8321.3261   1222.519      6.807      0.000    5922.353    1.07e+04
==============================================================================
Omnibus:                      548.666   Durbin-Watson:                   1.913
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12102.892
Skew:                           1.981   Prob(JB):                         0.00
Kurtosis:                      19.395   Cond. No.                         11.8
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.845
Model:                            OLS   Adj. R-squared:                  0.843
Method:                 Least Squares   F-statistic:                     422.7
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12044.
No. Observations:                1021   AIC:                         2.412e+04
Df Residuals:                    1007   BIC:                         2.418e+04
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.806e+05   1222.382    147.769      0.000    1.78e+05    1.83e+05
1stFlrSF              9881.8312   1378.378      7.169      0.000    7177.008    1.26e+04
BldgType_Twnhs       -2.155e+04   4128.566     -5.221      0.000   -2.97e+04   -1.35e+04
BsmtFinSF1            9574.9772   1188.600      8.056      0.000    7242.560    1.19e+04
BsmtQual              1.055e+04   1390.655      7.588      0.000    7823.465    1.33e+04
ExterQual             7906.6368   1674.117      4.723      0.000    4621.479    1.12e+04
Fireplaces            4744.9680   1211.007      3.918      0.000    2368.582    7121.354
GarageArea            9527.1043   1578.614      6.035      0.000    6429.354    1.26e+04
GarageType_NA         1.171e+04   5832.972      2.008      0.045     266.134    2.32e+04
GrLivArea             2.686e+04   1395.024     19.252      0.000    2.41e+04    2.96e+04
KitchenQual           1.168e+04   1558.728      7.495      0.000    8624.346    1.47e+04
MSZoning_RM          -8820.3963   3014.958     -2.926      0.004   -1.47e+04   -2904.077
MasVnrArea            5725.4837   1203.038      4.759      0.000    3364.735    8086.232
Neighborhood_NridgHt  4.014e+04   4221.716      9.509      0.000    3.19e+04    4.84e+04
==============================================================================
Omnibus:                      525.962   Durbin-Watson:                   1.891
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            10975.683
Skew:                           1.883   Prob(JB):                         0.00
Kurtosis:                      18.615   Cond. No.                         11.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.845
Model:                            OLS   Adj. R-squared:                  0.843
Method:                 Least Squares   F-statistic:                     392.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12044.
No. Observations:                1021   AIC:                         2.412e+04
Df Residuals:                    1006   BIC:                         2.419e+04
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.806e+05   1265.162    142.785      0.000    1.78e+05    1.83e+05
1stFlrSF              9881.2547   1379.115      7.165      0.000    7174.983    1.26e+04
BldgType_Twnhs       -2.156e+04   4136.110     -5.214      0.000   -2.97e+04   -1.34e+04
BsmtFinSF1            9573.2151   1189.767      8.046      0.000    7238.506    1.19e+04
BsmtQual              1.054e+04   1412.453      7.463      0.000    7769.120    1.33e+04
ExterQual             7906.8522   1674.953      4.721      0.000    4620.050    1.12e+04
Fireplaces            4740.5740   1215.126      3.901      0.000    2356.103    7125.045
GarageArea            9527.0936   1579.397      6.032      0.000    6427.804    1.26e+04
GarageType_NA         1.176e+04   5904.833      1.991      0.047     167.897    2.33e+04
GrLivArea             2.686e+04   1396.517     19.233      0.000    2.41e+04    2.96e+04
KitchenQual           1.168e+04   1559.841      7.489      0.000    8620.607    1.47e+04
MSZoning_RM          -8756.1191   3305.466     -2.649      0.008   -1.52e+04   -2269.721
MasVnrArea            5721.3783   1206.727      4.741      0.000    3353.388    8089.368
Neighborhood_NridgHt  4.015e+04   4227.505      9.498      0.000    3.19e+04    4.84e+04
Neighborhood_OldTown  -152.0459   3197.432     -0.048      0.962   -6426.446    6122.354
==============================================================================
Omnibus:                      526.126   Durbin-Watson:                   1.891
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            10982.360
Skew:                           1.884   Prob(JB):                         0.00
Kurtosis:                      18.619   Cond. No.                         12.2
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping Neighborhood_OldTown and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.845
Model:                            OLS   Adj. R-squared:                  0.843
Method:                 Least Squares   F-statistic:                     422.7
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12044.
No. Observations:                1021   AIC:                         2.412e+04
Df Residuals:                    1007   BIC:                         2.418e+04
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.806e+05   1222.382    147.769      0.000    1.78e+05    1.83e+05
1stFlrSF              9881.8312   1378.378      7.169      0.000    7177.008    1.26e+04
BldgType_Twnhs       -2.155e+04   4128.566     -5.221      0.000   -2.97e+04   -1.35e+04
BsmtFinSF1            9574.9772   1188.600      8.056      0.000    7242.560    1.19e+04
BsmtQual              1.055e+04   1390.655      7.588      0.000    7823.465    1.33e+04
ExterQual             7906.6368   1674.117      4.723      0.000    4621.479    1.12e+04
Fireplaces            4744.9680   1211.007      3.918      0.000    2368.582    7121.354
GarageArea            9527.1043   1578.614      6.035      0.000    6429.354    1.26e+04
GarageType_NA         1.171e+04   5832.972      2.008      0.045     266.134    2.32e+04
GrLivArea             2.686e+04   1395.024     19.252      0.000    2.41e+04    2.96e+04
KitchenQual           1.168e+04   1558.728      7.495      0.000    8624.346    1.47e+04
MSZoning_RM          -8820.3963   3014.958     -2.926      0.004   -1.47e+04   -2904.077
MasVnrArea            5725.4837   1203.038      4.759      0.000    3364.735    8086.232
Neighborhood_NridgHt  4.014e+04   4221.716      9.509      0.000    3.19e+04    4.84e+04
==============================================================================
Omnibus:                      525.962   Durbin-Watson:                   1.891
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            10975.683
Skew:                           1.883   Prob(JB):                         0.00
Kurtosis:                      18.615   Cond. No.                         11.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
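
The header figures of the summary above can be cross-checked by hand. The sketch below is illustrative only: it copies the rounded values from the printout (so the results match only to within rounding), the variable names are hypothetical, and it assumes the usual definitions in which AIC and BIC count the constant as one extra parameter.

```python
import math

# Values copied (already rounded) from the 13-regressor summary above.
n_obs = 1021          # No. Observations
df_model = 13         # Df Model: regressors, excluding the constant
r_squared = 0.845     # R-squared
log_lik = -12044.0    # Log-Likelihood

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - df_model - 1)

# AIC and BIC use the parameter count including the constant.
k = df_model + 1
aic = 2 * k - 2 * log_lik
bic = k * math.log(n_obs) - 2 * log_lik

# The t statistic of a coefficient is coef / std err (here: 1stFlrSF).
t_1stflr = 9881.8312 / 1378.378

print(f"Adj. R-squared: {adj_r_squared:.4f}")
print(f"AIC: {aic:.0f}  BIC: {bic:.0f}")
print(f"t(1stFlrSF): {t_1stflr:.3f}")
```

With these rounded inputs the results agree, to within rounding, with the printed Adj. R-squared (0.843), AIC (2.412e+04), BIC (2.418e+04) and the t value for 1stFlrSF (7.169).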


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.846
Model:                            OLS   Adj. R-squared:                  0.844
Method:                 Least Squares   F-statistic:                     395.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12040.
No. Observations:                1021   AIC:                         2.411e+04
Df Residuals:                    1006   BIC:                         2.418e+04
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.829e+05   1513.711    120.850      0.000     1.8e+05    1.86e+05
1stFlrSF              1.029e+04   1383.847      7.437      0.000    7576.255     1.3e+04
BldgType_Twnhs       -2.107e+04   4121.552     -5.112      0.000   -2.92e+04    -1.3e+04
BsmtFinSF1            9951.9509   1194.407      8.332      0.000    7608.136    1.23e+04
BsmtQual              9916.1021   1408.849      7.038      0.000    7151.484    1.27e+04
ExterQual             7337.8281   1684.176      4.357      0.000    4032.927    1.06e+04
Fireplaces            4668.1640   1208.040      3.864      0.000    2297.597    7038.732
GarageArea            9294.9449   1576.864      5.895      0.000    6200.625    1.24e+04
GarageType_NA         1.015e+04   5848.615      1.736      0.083   -1324.329    2.16e+04
GrLivArea             2.643e+04   1401.140     18.862      0.000    2.37e+04    2.92e+04
KitchenQual           1.141e+04   1558.182      7.320      0.000    8348.381    1.45e+04
MSZoning_RM          -1.116e+04   3142.068     -3.552      0.000   -1.73e+04   -4994.366
MasVnrArea            5811.4161   1200.190      4.842      0.000    3456.253    8166.579
Neighborhood_NridgHt  3.969e+04   4213.738      9.420      0.000    3.14e+04     4.8e+04
Neighborhood_Sawyer  -6846.2544   2670.071     -2.564      0.010   -1.21e+04   -1606.708
==============================================================================
Omnibus:                      533.433   Durbin-Watson:                   1.888
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            11307.461
Skew:                           1.916   Prob(JB):                         0.00
Kurtosis:                      18.847   Cond. No.                         12.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.851
Model:                            OLS   Adj. R-squared:                  0.849
Method:                 Least Squares   F-statistic:                     410.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -12024.
No. Observations:                1021   AIC:                         2.408e+04
Df Residuals:                    1006   BIC:                         2.415e+04
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.839e+05   1461.041    125.845      0.000    1.81e+05    1.87e+05
1stFlrSF              1.131e+04   1357.945      8.326      0.000    8641.543     1.4e+04
BldgType_Twnhs       -1.643e+04   4066.064     -4.041      0.000   -2.44e+04   -8450.831
BsmtFinSF1            9593.0812   1176.603      8.153      0.000    7284.205    1.19e+04
BsmtQual              1.149e+04   1403.250      8.186      0.000    8732.765    1.42e+04
ExterQual             7568.3900   1657.339      4.567      0.000    4316.153    1.08e+04
Fireplaces            4257.3628   1179.779      3.609      0.000    1942.252    6572.473
GarageArea            7987.8508   1330.353      6.004      0.000    5377.266    1.06e+04
GrLivArea             2.604e+04   1377.421     18.904      0.000    2.33e+04    2.87e+04
KitchenQual           9963.1232   1547.538      6.438      0.000    6926.351     1.3e+04
MSZoning_RM          -1.356e+04   3107.452     -4.363      0.000   -1.97e+04   -7460.780
MasVnrArea            6115.6579   1180.424      5.181      0.000    3799.282    8432.034
Neighborhood_NridgHt  4.175e+04   4138.830     10.086      0.000    3.36e+04    4.99e+04
Neighborhood_Sawyer  -9172.8702   2630.289     -3.487      0.001   -1.43e+04   -4011.388
OverallCond           6370.9537   1053.863      6.045      0.000    4302.933    8438.975
==============================================================================
Omnibus:                      554.023   Durbin-Watson:                   1.916
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            12432.846
Skew:                           2.003   Prob(JB):                         0.00
Kurtosis:                      19.620   Cond. No.                         8.64
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.859
Model:                            OLS   Adj. R-squared:                  0.857
Method:                 Least Squares   F-statistic:                     408.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11996.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1005   BIC:                         2.410e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.839e+05   1422.438    129.299      0.000    1.81e+05    1.87e+05
1stFlrSF              1.074e+04   1324.195      8.111      0.000    8141.768    1.33e+04
BldgType_Twnhs       -1.634e+04   3958.597     -4.127      0.000   -2.41e+04   -8568.302
BsmtFinSF1            1.061e+04   1153.541      9.201      0.000    8350.347    1.29e+04
BsmtQual              7590.3193   1461.385      5.194      0.000    4722.603    1.05e+04
ExterQual             4529.7459   1663.509      2.723      0.007    1265.398    7794.094
Fireplaces            2869.0131   1163.381      2.466      0.014     586.078    5151.948
GarageArea            6728.2548   1306.005      5.152      0.000    4165.445    9291.064
GrLivArea             2.386e+04   1372.160     17.386      0.000    2.12e+04    2.65e+04
KitchenQual           7764.5543   1534.821      5.059      0.000    4752.733    1.08e+04
MSZoning_RM          -1.327e+04   3025.556     -4.385      0.000   -1.92e+04   -7329.794
MasVnrArea            5247.4047   1155.023      4.543      0.000    2980.871    7513.939
Neighborhood_NridgHt  3.811e+04   4058.436      9.390      0.000    3.01e+04    4.61e+04
Neighborhood_Sawyer  -8254.2575   2563.679     -3.220      0.001   -1.33e+04   -3223.480
OverallCond           5545.1589   1031.882      5.374      0.000    3520.268    7570.050
OverallQual           1.372e+04   1827.178      7.508      0.000    1.01e+04    1.73e+04
==============================================================================
Omnibus:                      578.918   Durbin-Watson:                   1.925
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13924.298
Skew:                           2.109   Prob(JB):                         0.00
Kurtosis:                      20.593   Cond. No.                         9.39
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.860
Model:                            OLS   Adj. R-squared:                  0.858
Method:                 Least Squares   F-statistic:                     385.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11992.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1004   BIC:                         2.410e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   1.9e+05   2610.116     72.786      0.000    1.85e+05    1.95e+05
1stFlrSF              1.037e+04   1326.717      7.814      0.000    7763.565     1.3e+04
BldgType_Twnhs       -1.686e+04   3950.067     -4.268      0.000   -2.46e+04   -9106.475
BsmtFinSF1            1.075e+04   1150.781      9.341      0.000    8490.918     1.3e+04
BsmtQual              7394.1167   1458.301      5.070      0.000    4532.450    1.03e+04
ExterQual             4461.8919   1658.214      2.691      0.007    1207.930    7715.853
Fireplaces            3182.4202   1165.074      2.732      0.006     896.160    5468.680
GarageArea            6741.8245   1301.715      5.179      0.000    4187.431    9296.218
GrLivArea             2.402e+04   1368.845     17.544      0.000    2.13e+04    2.67e+04
KitchenQual           7325.0898   1537.998      4.763      0.000    4307.031    1.03e+04
MSZoning_RM          -1.323e+04   3015.633     -4.386      0.000   -1.91e+04   -7307.567
MasVnrArea            5233.8638   1151.231      4.546      0.000    2974.769    7492.959
Neighborhood_NridgHt   3.79e+04   4045.780      9.368      0.000       3e+04    4.58e+04
Neighborhood_Sawyer  -8268.5869   2555.244     -3.236      0.001   -1.33e+04   -3254.355
OverallCond           5927.3213   1037.725      5.712      0.000    3890.963    7963.679
OverallQual           1.368e+04   1821.206      7.514      0.000    1.01e+04    1.73e+04
SaleCondition_Normal -7316.4511   2645.254     -2.766      0.006   -1.25e+04   -2125.590
==============================================================================
Omnibus:                      562.479   Durbin-Watson:                   1.925
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13592.318
Skew:                           2.022   Prob(JB):                         0.00
Kurtosis:                      20.411   Cond. No.                         9.48
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.862
Model:                            OLS   Adj. R-squared:                  0.860
Method:                 Least Squares   F-statistic:                     368.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11985.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.409e+04
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                  1.96e+05   3040.078     64.474      0.000     1.9e+05    2.02e+05
1stFlrSF              1.003e+04   1320.868      7.596      0.000    7441.776    1.26e+04
BldgType_Twnhs       -1.731e+04   3925.793     -4.410      0.000    -2.5e+04   -9610.000
BsmtFinSF1            1.112e+04   1147.441      9.695      0.000    8872.698    1.34e+04
BsmtQual              6980.6942   1452.750      4.805      0.000    4129.917    9831.471
ExterQual             3917.5315   1653.484      2.369      0.018     672.848    7162.216
Fireplaces            3116.8400   1157.502      2.693      0.007     845.437    5388.243
GarageArea            6393.2036   1296.368      4.932      0.000    3849.300    8937.108
GrLivArea             2.445e+04   1364.598     17.917      0.000    2.18e+04    2.71e+04
KitchenQual           7401.9047   1527.965      4.844      0.000    4403.530    1.04e+04
MSZoning_RM          -1.302e+04   2996.196     -4.345      0.000   -1.89e+04   -7138.534
MasVnrArea            5157.6435   1143.797      4.509      0.000    2913.133    7402.154
Neighborhood_NridgHt  3.776e+04   4019.204      9.395      0.000    2.99e+04    4.56e+04
Neighborhood_Sawyer  -7983.7098   2539.462     -3.144      0.002    -1.3e+04   -3000.441
OverallCond           5997.5051   1031.031      5.817      0.000    3974.280    8020.730
OverallQual           1.378e+04   1809.346      7.616      0.000    1.02e+04    1.73e+04
SaleCondition_Normal   585.7543   3352.560      0.175      0.861   -5993.081    7164.589
SaleType_WD          -1.455e+04   3834.265     -3.796      0.000   -2.21e+04   -7029.197
==============================================================================
Omnibus:                      585.151   Durbin-Watson:                   1.932
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            16057.205
Skew:                           2.097   Prob(JB):                         0.00
Kurtosis:                      21.970   Cond. No.                         10.9
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.859
Model:                            OLS   Adj. R-squared:                  0.857
Method:                 Least Squares   F-statistic:                     408.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11996.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1005   BIC:                         2.410e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.839e+05   1422.438    129.299      0.000    1.81e+05    1.87e+05
1stFlrSF              1.074e+04   1324.195      8.111      0.000    8141.768    1.33e+04
BldgType_Twnhs       -1.634e+04   3958.597     -4.127      0.000   -2.41e+04   -8568.302
BsmtFinSF1            1.061e+04   1153.541      9.201      0.000    8350.347    1.29e+04
BsmtQual              7590.3193   1461.385      5.194      0.000    4722.603    1.05e+04
ExterQual             4529.7459   1663.509      2.723      0.007    1265.398    7794.094
Fireplaces            2869.0131   1163.381      2.466      0.014     586.078    5151.948
GarageArea            6728.2548   1306.005      5.152      0.000    4165.445    9291.064
GrLivArea             2.386e+04   1372.160     17.386      0.000    2.12e+04    2.65e+04
KitchenQual           7764.5543   1534.821      5.059      0.000    4752.733    1.08e+04
MSZoning_RM          -1.327e+04   3025.556     -4.385      0.000   -1.92e+04   -7329.794
MasVnrArea            5247.4047   1155.023      4.543      0.000    2980.871    7513.939
Neighborhood_NridgHt  3.811e+04   4058.436      9.390      0.000    3.01e+04    4.61e+04
Neighborhood_Sawyer  -8254.2575   2563.679     -3.220      0.001   -1.33e+04   -3223.480
OverallCond           5545.1589   1031.882      5.374      0.000    3520.268    7570.050
OverallQual           1.372e+04   1827.178      7.508      0.000    1.01e+04    1.73e+04
==============================================================================
Omnibus:                      578.918   Durbin-Watson:                   1.925
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13924.298
Skew:                           2.109   Prob(JB):                         0.00
Kurtosis:                      20.593   Cond. No.                         9.39
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.859
Model:                            OLS   Adj. R-squared:                  0.857
Method:                 Least Squares   F-statistic:                     383.2
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11995.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1004   BIC:                         2.411e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.839e+05   1421.752    129.337      0.000    1.81e+05    1.87e+05
1stFlrSF              1.079e+04   1323.764      8.149      0.000    8189.912    1.34e+04
BldgType_Twnhs       -1.715e+04   3993.759     -4.295      0.000    -2.5e+04   -9315.794
BsmtFinSF1            1.079e+04   1159.120      9.312      0.000    8519.460    1.31e+04
BsmtQual              7692.7568   1462.100      5.261      0.000    4823.634    1.06e+04
ExterQual             4482.7411   1662.788      2.696      0.007    1219.802    7745.680
Fireplaces            2931.0803   1163.412      2.519      0.012     648.083    5214.077
GarageArea            6744.6653   1305.252      5.167      0.000    4183.330    9306.000
GrLivArea              2.15e+04   2091.712     10.277      0.000    1.74e+04    2.56e+04
KitchenQual           7692.8974   1534.632      5.013      0.000    4681.444    1.07e+04
MSZoning_RM          -1.278e+04   3041.027     -4.203      0.000   -1.88e+04   -6815.270
MasVnrArea            5222.6380   1154.436      4.524      0.000    2957.255    7488.021
Neighborhood_NridgHt   3.85e+04   4064.228      9.472      0.000    3.05e+04    4.65e+04
Neighborhood_Sawyer  -8317.5382   2562.460     -3.246      0.001   -1.33e+04   -3289.146
OverallCond           5562.6231   1031.317      5.394      0.000    3538.839    7586.407
OverallQual           1.377e+04   1826.438      7.542      0.000    1.02e+04    1.74e+04
TotRmsAbvGrd          2691.9311   1802.156      1.494      0.136    -844.494    6228.356
==============================================================================
Omnibus:                      574.225   Durbin-Watson:                   1.926
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13648.741
Skew:                           2.089   Prob(JB):                         0.00
Kurtosis:                      20.418   Cond. No.                         9.82
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping TotRmsAbvGrd and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.859
Model:                            OLS   Adj. R-squared:                  0.857
Method:                 Least Squares   F-statistic:                     408.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11996.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1005   BIC:                         2.410e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.839e+05   1422.438    129.299      0.000    1.81e+05    1.87e+05
1stFlrSF              1.074e+04   1324.195      8.111      0.000    8141.768    1.33e+04
BldgType_Twnhs       -1.634e+04   3958.597     -4.127      0.000   -2.41e+04   -8568.302
BsmtFinSF1            1.061e+04   1153.541      9.201      0.000    8350.347    1.29e+04
BsmtQual              7590.3193   1461.385      5.194      0.000    4722.603    1.05e+04
ExterQual             4529.7459   1663.509      2.723      0.007    1265.398    7794.094
Fireplaces            2869.0131   1163.381      2.466      0.014     586.078    5151.948
GarageArea            6728.2548   1306.005      5.152      0.000    4165.445    9291.064
GrLivArea             2.386e+04   1372.160     17.386      0.000    2.12e+04    2.65e+04
KitchenQual           7764.5543   1534.821      5.059      0.000    4752.733    1.08e+04
MSZoning_RM          -1.327e+04   3025.556     -4.385      0.000   -1.92e+04   -7329.794
MasVnrArea            5247.4047   1155.023      4.543      0.000    2980.871    7513.939
Neighborhood_NridgHt  3.811e+04   4058.436      9.390      0.000    3.01e+04    4.61e+04
Neighborhood_Sawyer  -8254.2575   2563.679     -3.220      0.001   -1.33e+04   -3223.480
OverallCond           5545.1589   1031.882      5.374      0.000    3520.268    7570.050
OverallQual           1.372e+04   1827.178      7.508      0.000    1.01e+04    1.73e+04
==============================================================================
Omnibus:                      578.918   Durbin-Watson:                   1.925
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13924.298
Skew:                           2.109   Prob(JB):                         0.00
Kurtosis:                      20.593   Cond. No.                         9.39
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
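
The "Dropping ... and rebuilding the model" messages in this log appear when a newly added feature has a large p-value (0.962 for Neighborhood_OldTown, 0.136 for TotRmsAbvGrd). The selection code itself is not part of this output, so the following is only a sketch of one such rule, not the notebook's implementation: the 0.05 threshold, the helper names `ols_pvalues` and `try_add_feature`, and the synthetic data are all assumptions. To stay dependency-light it uses NumPy with a normal approximation to the t distribution instead of the `pvalues` reported by statsmodels.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Fit OLS with an intercept; return approximate two-sided p-values for the columns of X."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # prepend the constant term
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                        # residual degrees of freedom
    sigma2 = resid @ resid / dof                # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)       # covariance of the coefficients
    t = beta / np.sqrt(np.diag(cov))
    # Normal approximation to the t distribution (assumption; fine for large dof).
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])
    return p[1:]                                # skip the constant

def try_add_feature(X, y, kept, candidate, alpha=0.05):
    """Add `candidate` to `kept` only if its coefficient is significant at level alpha."""
    trial = kept + [candidate]
    p_new = ols_pvalues(X[:, trial], y)[-1]
    if p_new > alpha:
        print(f"Dropping column {candidate} (p ~ {p_new:.3f})")
        return kept
    return trial

# Deterministic synthetic data: column 0 drives y, column 1 is irrelevant.
x0 = np.tile([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 100)
x1 = np.tile([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0], 100)
noise = np.tile([0.1, 0.1, 0.1, 0.1, -0.1, -0.1, -0.1, -0.1], 100)
X = np.column_stack([x0, x1])
y = 2.0 + 3.0 * x0 + noise

kept = []
for col in range(X.shape[1]):
    kept = try_add_feature(X, y, kept, col)
print(kept)
```

In this toy data the irrelevant column is orthogonal to both the signal and the noise, so its estimated coefficient is essentially zero and it is rejected, while the driving column is kept.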


                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.859
Model:                            OLS   Adj. R-squared:                  0.857
Method:                 Least Squares   F-statistic:                     383.7
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11994.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1004   BIC:                         2.411e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.836e+05   1432.315    128.178      0.000    1.81e+05    1.86e+05
1stFlrSF              1.172e+04   1429.863      8.199      0.000    8917.509    1.45e+04
BldgType_Twnhs       -1.777e+04   4032.706     -4.407      0.000   -2.57e+04   -9856.894
BsmtFinSF1            9609.7527   1278.842      7.514      0.000    7100.242    1.21e+04
BsmtQual              7111.6102   1483.497      4.794      0.000    4200.500       1e+04
ExterQual             4421.0278   1662.713      2.659      0.008    1158.237    7683.819
Fireplaces            2718.6584   1165.031      2.334      0.020     432.484    5004.833
GarageArea            6501.1295   1310.549      4.961      0.000    3929.400    9072.859
GrLivArea             2.224e+04   1635.183     13.602      0.000     1.9e+04    2.55e+04
KitchenQual           7542.2667   1537.997      4.904      0.000    4524.211    1.06e+04
MSZoning_RM          -1.163e+04   3154.868     -3.686      0.000   -1.78e+04   -5437.028
MasVnrArea            5167.7796   1154.556      4.476      0.000    2902.161    7433.399
Neighborhood_NridgHt  3.913e+04   4092.975      9.560      0.000    3.11e+04    4.72e+04
Neighborhood_Sawyer  -7945.3967   2566.459     -3.096      0.002    -1.3e+04   -2909.158
OverallCond           5660.8189   1032.694      5.482      0.000    3634.332    7687.306
OverallQual           1.358e+04   1826.780      7.432      0.000    9992.753    1.72e+04
TotalBath             2979.5942   1646.096      1.810      0.071    -250.588    6209.776
==============================================================================
Omnibus:                      575.417   Durbin-Watson:                   1.936
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13540.596
Skew:                           2.098   Prob(JB):                         0.00
Kurtosis:                      20.340   Cond. No.                         10.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping TotalBath and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.859
Model:                            OLS   Adj. R-squared:                  0.857
Method:                 Least Squares   F-statistic:                     408.1
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11996.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1005   BIC:                         2.410e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.839e+05   1422.438    129.299      0.000    1.81e+05    1.87e+05
1stFlrSF              1.074e+04   1324.195      8.111      0.000    8141.768    1.33e+04
BldgType_Twnhs       -1.634e+04   3958.597     -4.127      0.000   -2.41e+04   -8568.302
BsmtFinSF1            1.061e+04   1153.541      9.201      0.000    8350.347    1.29e+04
BsmtQual              7590.3193   1461.385      5.194      0.000    4722.603    1.05e+04
ExterQual             4529.7459   1663.509      2.723      0.007    1265.398    7794.094
Fireplaces            2869.0131   1163.381      2.466      0.014     586.078    5151.948
GarageArea            6728.2548   1306.005      5.152      0.000    4165.445    9291.064
GrLivArea             2.386e+04   1372.160     17.386      0.000    2.12e+04    2.65e+04
KitchenQual           7764.5543   1534.821      5.059      0.000    4752.733    1.08e+04
MSZoning_RM          -1.327e+04   3025.556     -4.385      0.000   -1.92e+04   -7329.794
MasVnrArea            5247.4047   1155.023      4.543      0.000    2980.871    7513.939
Neighborhood_NridgHt  3.811e+04   4058.436      9.390      0.000    3.01e+04    4.61e+04
Neighborhood_Sawyer  -8254.2575   2563.679     -3.220      0.001   -1.33e+04   -3223.480
OverallCond           5545.1589   1031.882      5.374      0.000    3520.268    7570.050
OverallQual           1.372e+04   1827.178      7.508      0.000    1.01e+04    1.73e+04
==============================================================================
Omnibus:                      578.918   Durbin-Watson:                   1.925
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13924.298
Skew:                           2.109   Prob(JB):                         0.00
Kurtosis:                      20.593   Cond. No.                         9.39
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.860
Model:                            OLS   Adj. R-squared:                  0.858
Method:                 Least Squares   F-statistic:                     385.2
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11992.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1004   BIC:                         2.410e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.841e+05   1420.133    129.634      0.000    1.81e+05    1.87e+05
1stFlrSF              7119.1079   1928.432      3.692      0.000    3334.890    1.09e+04
BldgType_Twnhs       -1.582e+04   3952.662     -4.002      0.000   -2.36e+04   -8061.543
BsmtFinSF1            1.009e+04   1167.899      8.643      0.000    7802.003    1.24e+04
BsmtQual              5886.5168   1600.310      3.678      0.000    2746.182    9026.852
ExterQual             4572.1359   1658.943      2.756      0.006    1316.743    7827.529
Fireplaces            3062.7983   1162.566      2.635      0.009     781.460    5344.137
GarageArea            6739.2785   1302.364      5.175      0.000    4183.612    9294.945
GrLivArea             2.447e+04   1388.612     17.619      0.000    2.17e+04    2.72e+04
KitchenQual           7754.6047   1530.538      5.067      0.000    4751.184    1.08e+04
MSZoning_RM          -1.374e+04   3022.703     -4.546      0.000   -1.97e+04   -7809.232
MasVnrArea            5176.2763   1152.127      4.493      0.000    2915.423    7437.130
Neighborhood_NridgHt  3.795e+04   4047.587      9.375      0.000       3e+04    4.59e+04
Neighborhood_Sawyer  -8722.9667   2562.981     -3.403      0.001   -1.38e+04   -3693.554
OverallCond           5515.7566   1029.063      5.360      0.000    3496.396    7535.117
OverallQual           1.345e+04   1825.122      7.368      0.000    9865.607     1.7e+04
TotalBsmtSF           5092.1573   1976.298      2.577      0.010    1214.009    8970.305
==============================================================================
Omnibus:                      576.902   Durbin-Watson:                   1.931
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13710.878
Skew:                           2.103   Prob(JB):                         0.00
Kurtosis:                      20.453   Cond. No.                         9.90
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.876
Model:                            OLS   Adj. R-squared:                  0.874
Method:                 Least Squares   F-statistic:                     418.5
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11928.
No. Observations:                1021   AIC:                         2.389e+04
Df Residuals:                    1003   BIC:                         2.398e+04
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.838e+05   1334.695    137.678      0.000    1.81e+05    1.86e+05
1stFlrSF              6101.1879   1814.106      3.363      0.001    2541.310    9661.066
BldgType_Twnhs       -1.685e+04   3715.025     -4.534      0.000   -2.41e+04   -9555.004
BsmtFinSF1            9314.6296   1099.430      8.472      0.000    7157.182    1.15e+04
BsmtQual              5818.2517   1503.681      3.869      0.000    2867.530    8768.973
ExterQual             4700.2431   1558.801      3.015      0.003    1641.358    7759.128
Fireplaces            2512.9366   1093.392      2.298      0.022     367.339    4658.534
GarageArea            6649.8565   1223.740      5.434      0.000    4248.472    9051.241
GrLivArea            -3.791e+04   5540.112     -6.842      0.000   -4.88e+04    -2.7e+04
KitchenQual           7473.6554   1438.316      5.196      0.000    4651.202    1.03e+04
MSZoning_RM          -1.223e+04   2843.144     -4.303      0.000   -1.78e+04   -6654.673
MasVnrArea            4260.8861   1085.432      3.926      0.000    2130.908    6390.865
Neighborhood_NridgHt  3.592e+04   3807.159      9.436      0.000    2.85e+04    4.34e+04
Neighborhood_Sawyer  -7312.3464   2411.283     -3.033      0.002    -1.2e+04   -2580.609
OverallCond           5420.8369    966.954      5.606      0.000    3523.352    7318.322
OverallQual           1.274e+04   1715.990      7.425      0.000    9373.074    1.61e+04
TotalBsmtSF           5259.0805   1857.008      2.832      0.005    1615.014    8903.147
TotalFlrSFAbvGrd      6.544e+04   5649.153     11.584      0.000    5.44e+04    7.65e+04
==============================================================================
Omnibus:                      319.291   Durbin-Watson:                   1.923
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2726.205
Skew:                           1.184   Prob(JB):                         0.00
Kurtosis:                      10.647   Cond. No.                         21.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.860
Model:                            OLS   Adj. R-squared:                  0.858
Method:                 Least Squares   F-statistic:                     385.2
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11992.
No. Observations:                1021   AIC:                         2.402e+04
Df Residuals:                    1004   BIC:                         2.410e+04
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.841e+05   1420.133    129.634      0.000    1.81e+05    1.87e+05
1stFlrSF              7119.1079   1928.432      3.692      0.000    3334.890    1.09e+04
BldgType_Twnhs       -1.582e+04   3952.662     -4.002      0.000   -2.36e+04   -8061.543
BsmtFinSF1            1.009e+04   1167.899      8.643      0.000    7802.003    1.24e+04
BsmtQual              5886.5168   1600.310      3.678      0.000    2746.182    9026.852
ExterQual             4572.1359   1658.943      2.756      0.006    1316.743    7827.529
Fireplaces            3062.7983   1162.566      2.635      0.009     781.460    5344.137
GarageArea            6739.2785   1302.364      5.175      0.000    4183.612    9294.945
GrLivArea             2.447e+04   1388.612     17.619      0.000    2.17e+04    2.72e+04
KitchenQual           7754.6047   1530.538      5.067      0.000    4751.184    1.08e+04
MSZoning_RM          -1.374e+04   3022.703     -4.546      0.000   -1.97e+04   -7809.232
MasVnrArea            5176.2763   1152.127      4.493      0.000    2915.423    7437.130
Neighborhood_NridgHt  3.795e+04   4047.587      9.375      0.000       3e+04    4.59e+04
Neighborhood_Sawyer  -8722.9667   2562.981     -3.403      0.001   -1.38e+04   -3693.554
OverallCond           5515.7566   1029.063      5.360      0.000    3496.396    7535.117
OverallQual           1.345e+04   1825.122      7.368      0.000    9865.607     1.7e+04
TotalBsmtSF           5092.1573   1976.298      2.577      0.010    1214.009    8970.305
==============================================================================
Omnibus:                      576.902   Durbin-Watson:                   1.931
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            13710.878
Skew:                           2.103   Prob(JB):                         0.00
Kurtosis:                      20.453   Cond. No.                         9.90
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:15   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.836e+05   1424.129    128.951      0.000    1.81e+05    1.86e+05
1stFlrSF              6811.7268   1924.694      3.539      0.000    3034.839    1.06e+04
BldgType_Twnhs       -1.744e+04   3980.010     -4.383      0.000   -2.53e+04   -9633.225
BsmtFinSF1            1.003e+04   1164.004      8.618      0.000    7747.122    1.23e+04
BsmtQual              3883.5141   1743.243      2.228      0.026     462.692    7304.336
ExterQual             3842.6332   1672.891      2.297      0.022     559.865    7125.401
Fireplaces            3189.1619   1159.334      2.751      0.006     914.164    5464.160
GarageArea            5976.8080   1325.183      4.510      0.000    3376.360    8577.256
GrLivArea             2.569e+04   1448.901     17.729      0.000    2.28e+04    2.85e+04
KitchenQual           7079.8608   1543.499      4.587      0.000    4051.003    1.01e+04
MSZoning_RM          -1.034e+04   3241.011     -3.190      0.001   -1.67e+04   -3977.579
MasVnrArea            4865.2770   1153.275      4.219      0.000    2602.169    7128.385
Neighborhood_NridgHt  3.895e+04   4048.651      9.619      0.000     3.1e+04    4.69e+04
Neighborhood_Sawyer  -8873.0140   2554.522     -3.473      0.001   -1.39e+04   -3860.193
OverallCond           6499.6521   1082.215      6.006      0.000    4375.986    8623.318
OverallQual           1.265e+04   1840.137      6.875      0.000    9039.713    1.63e+04
TotalBsmtSF           5844.3070   1987.029      2.941      0.003    1945.097    9743.517
YearBuilt             4872.1360   1712.853      2.844      0.005    1510.950    8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     344.9
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1002   BIC:                         2.411e+04
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.836e+05   1425.428    128.832      0.000    1.81e+05    1.86e+05
1stFlrSF              6803.6492   1931.440      3.523      0.000    3013.518    1.06e+04
BldgType_Twnhs       -1.745e+04   3983.702     -4.380      0.000   -2.53e+04   -9632.284
BsmtFinSF1            1.003e+04   1166.340      8.604      0.000    7745.994    1.23e+04
BsmtQual              3869.4578   1763.389      2.194      0.028     409.100    7329.816
ExterQual             3835.4991   1678.919      2.285      0.023     540.899    7130.099
Fireplaces            3195.4764   1165.778      2.741      0.006     907.830    5483.123
GarageArea            5978.2593   1326.114      4.508      0.000    3375.981    8580.538
GrLivArea             2.568e+04   1455.332     17.646      0.000    2.28e+04    2.85e+04
KitchenQual           7056.8286   1601.970      4.405      0.000    3913.228    1.02e+04
MSZoning_RM          -1.034e+04   3243.073     -3.188      0.001   -1.67e+04   -3976.445
MasVnrArea            4869.2821   1156.225      4.211      0.000    2600.382    7138.182
Neighborhood_NridgHt  3.895e+04   4053.827      9.609      0.000     3.1e+04    4.69e+04
Neighborhood_Sawyer  -8864.6489   2560.474     -3.462      0.001   -1.39e+04   -3840.144
OverallCond           6476.9715   1161.208      5.578      0.000    4198.294    8755.649
OverallQual           1.265e+04   1841.068      6.871      0.000    9037.465    1.63e+04
TotalBsmtSF           5853.9114   1995.941      2.933      0.003    1937.207    9770.615
YearBuilt             4835.9464   1839.837      2.628      0.009    1225.570    8446.323
YearRemodAdd            82.3336   1523.132      0.054      0.957   -2906.560    3071.227
==============================================================================
Omnibus:                      600.579   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14873.896
Skew:                           2.215   Prob(JB):                         0.00
Kurtosis:                      21.166   Cond. No.                         11.0
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
 Dropping YearRemodAdd and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.836e+05   1424.129    128.951      0.000    1.81e+05    1.86e+05
1stFlrSF              6811.7268   1924.694      3.539      0.000    3034.839    1.06e+04
BldgType_Twnhs       -1.744e+04   3980.010     -4.383      0.000   -2.53e+04   -9633.225
BsmtFinSF1            1.003e+04   1164.004      8.618      0.000    7747.122    1.23e+04
BsmtQual              3883.5141   1743.243      2.228      0.026     462.692    7304.336
ExterQual             3842.6332   1672.891      2.297      0.022     559.865    7125.401
Fireplaces            3189.1619   1159.334      2.751      0.006     914.164    5464.160
GarageArea            5976.8080   1325.183      4.510      0.000    3376.360    8577.256
GrLivArea             2.569e+04   1448.901     17.729      0.000    2.28e+04    2.85e+04
KitchenQual           7079.8608   1543.499      4.587      0.000    4051.003    1.01e+04
MSZoning_RM          -1.034e+04   3241.011     -3.190      0.001   -1.67e+04   -3977.579
MasVnrArea            4865.2770   1153.275      4.219      0.000    2602.169    7128.385
Neighborhood_NridgHt  3.895e+04   4048.651      9.619      0.000     3.1e+04    4.69e+04
Neighborhood_Sawyer  -8873.0140   2554.522     -3.473      0.001   -1.39e+04   -3860.193
OverallCond           6499.6521   1082.215      6.006      0.000    4375.986    8623.318
OverallQual           1.265e+04   1840.137      6.875      0.000    9039.713    1.63e+04
TotalBsmtSF           5844.3070   1987.029      2.941      0.003    1945.097    9743.517
YearBuilt             4872.1360   1712.853      2.844      0.005    1510.950    8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     345.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11987.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1002   BIC:                         2.411e+04
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                  2.22e+05   2.81e+04      7.897      0.000    1.67e+05    2.77e+05
1stFlrSF              6867.7438   1924.304      3.569      0.000    3091.617    1.06e+04
BldgType_Twnhs       -1.759e+04   3979.838     -4.421      0.000   -2.54e+04   -9784.632
BsmtFinSF1            1.004e+04   1163.525      8.630      0.000    7757.741    1.23e+04
BsmtQual              3841.6039   1742.764      2.204      0.028     421.718    7261.490
ExterQual             3814.1383   1672.302      2.281      0.023     532.522    7095.754
Fireplaces            3207.9632   1158.917      2.768      0.006     933.780    5482.147
GarageArea            5972.0252   1324.617      4.508      0.000    3372.683    8571.367
GrLivArea             2.561e+04   1449.275     17.674      0.000    2.28e+04    2.85e+04
KitchenQual           7105.3660   1542.948      4.605      0.000    4077.585    1.01e+04
MSZoning_RM          -1.036e+04   3239.674     -3.199      0.001   -1.67e+04   -4006.334
MasVnrArea            4870.6232   1152.785      4.225      0.000    2608.473    7132.774
Neighborhood_NridgHt  3.915e+04   4049.555      9.667      0.000    3.12e+04    4.71e+04
Neighborhood_Sawyer  -8806.7138   2553.886     -3.448      0.001   -1.38e+04   -3795.136
OverallCond           6567.2728   1082.884      6.065      0.000    4442.293    8692.253
OverallQual           1.263e+04   1839.382      6.869      0.000    9025.504    1.62e+04
TotalBsmtSF           5780.2080   1986.729      2.909      0.004    1881.581    9678.835
YearBuilt             4904.6437   1712.282      2.864      0.004    1544.574    8264.713
dateSold             -3.174e-05   2.33e-05     -1.365      0.173   -7.74e-05    1.39e-05
==============================================================================
Omnibus:                      598.524   Durbin-Watson:                   1.939
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14660.003
Skew:                           2.208   Prob(JB):                         0.00
Kurtosis:                      21.031   Cond. No.                     3.54e+10
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.54e+10. This might indicate that there are
strong multicollinearity or other numerical problems.
 Dropping dateSold and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.836e+05   1424.129    128.951      0.000    1.81e+05    1.86e+05
1stFlrSF              6811.7268   1924.694      3.539      0.000    3034.839    1.06e+04
BldgType_Twnhs       -1.744e+04   3980.010     -4.383      0.000   -2.53e+04   -9633.225
BsmtFinSF1            1.003e+04   1164.004      8.618      0.000    7747.122    1.23e+04
BsmtQual              3883.5141   1743.243      2.228      0.026     462.692    7304.336
ExterQual             3842.6332   1672.891      2.297      0.022     559.865    7125.401
Fireplaces            3189.1619   1159.334      2.751      0.006     914.164    5464.160
GarageArea            5976.8080   1325.183      4.510      0.000    3376.360    8577.256
GrLivArea             2.569e+04   1448.901     17.729      0.000    2.28e+04    2.85e+04
KitchenQual           7079.8608   1543.499      4.587      0.000    4051.003    1.01e+04
MSZoning_RM          -1.034e+04   3241.011     -3.190      0.001   -1.67e+04   -3977.579
MasVnrArea            4865.2770   1153.275      4.219      0.000    2602.169    7128.385
Neighborhood_NridgHt  3.895e+04   4048.651      9.619      0.000     3.1e+04    4.69e+04
Neighborhood_Sawyer  -8873.0140   2554.522     -3.473      0.001   -1.39e+04   -3860.193
OverallCond           6499.6521   1082.215      6.006      0.000    4375.986    8623.318
OverallQual           1.265e+04   1840.137      6.875      0.000    9039.713    1.63e+04
TotalBsmtSF           5844.3070   1987.029      2.941      0.003    1945.097    9743.517
YearBuilt             4872.1360   1712.853      2.844      0.005    1510.950    8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                            OLS Regression Results
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     345.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11987.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1002   BIC:                         2.411e+04
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                  2.22e+05   2.81e+04      7.897      0.000    1.67e+05    2.77e+05
1stFlrSF              6867.7438   1924.304      3.569      0.000    3091.617    1.06e+04
BldgType_Twnhs       -1.759e+04   3979.838     -4.421      0.000   -2.54e+04   -9784.632
BsmtFinSF1            1.004e+04   1163.525      8.630      0.000    7757.741    1.23e+04
BsmtQual              3841.6039   1742.764      2.204      0.028     421.718    7261.490
ExterQual             3814.1383   1672.302      2.281      0.023     532.522    7095.754
Fireplaces            3207.9632   1158.917      2.768      0.006     933.780    5482.147
GarageArea            5972.0252   1324.617      4.508      0.000    3372.683    8571.367
GrLivArea             2.561e+04   1449.275     17.674      0.000    2.28e+04    2.85e+04
KitchenQual           7105.3660   1542.948      4.605      0.000    4077.585    1.01e+04
MSZoning_RM          -1.036e+04   3239.674     -3.199      0.001   -1.67e+04   -4006.334
MasVnrArea            4870.6232   1152.785      4.225      0.000    2608.473    7132.774
Neighborhood_NridgHt  3.915e+04   4049.555      9.667      0.000    3.12e+04    4.71e+04
Neighborhood_Sawyer  -8806.7138   2553.886     -3.448      0.001   -1.38e+04   -3795.136
OverallCond           6567.2728   1082.884      6.065      0.000    4442.293    8692.253
OverallQual           1.263e+04   1839.382      6.869      0.000    9025.504    1.62e+04
TotalBsmtSF           5780.2080   1986.729      2.909      0.004    1881.581    9678.835
YearBuilt             4904.6437   1712.282      2.864      0.004    1544.574    8264.713
dateSold             -3.174e-05   2.33e-05     -1.365      0.173   -7.74e-05    1.39e-05
==============================================================================
Omnibus:                      598.524   Durbin-Watson:                   1.939
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14660.003
Skew:                           2.208   Prob(JB):                         0.00
Kurtosis:                      21.031   Cond. No.                     3.54e+10
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.54e+10. This might indicate that there are
strong multicollinearity or other numerical problems.
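
The summary above reports a p-value of 0.173 for dateSold, above the usual 0.05 cutoff, which is why that feature is a candidate for removal. The elimination code itself is not shown in this part of the article, so the following is only a minimal sketch of one p-value-based backward elimination loop. It assumes synthetic data, hypothetical feature names (`area`, `quality`, `noise`), a 0.05 threshold, and a normal approximation to the t distribution (statsmodels uses the exact t distribution).

```python
import math
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS (X already contains the intercept column) and return (beta, t)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # nonrobust covariance of the coefficients
    return beta, beta / np.sqrt(np.diag(cov))

def two_sided_p(t):
    # Normal approximation to the two-sided p-value; an assumption that is
    # reasonable when the residual degrees of freedom are around 1000.
    return math.erfc(abs(t) / math.sqrt(2))

# Synthetic data: two informative features and one unrelated to the target.
rng = np.random.default_rng(1)
n = 1021
names = ["const", "area", "quality", "noise"]   # hypothetical names
area = rng.standard_normal(n)
quality = rng.standard_normal(n)
noise = rng.standard_normal(n)
y = 180.0 + 25.0 * area + 12.0 * quality + 5.0 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), area, quality, noise])

# Backward elimination: repeatedly drop the least significant feature
# (never the intercept) until every remaining p-value is at most 0.05.
cols = list(range(X.shape[1]))
while len(cols) > 1:
    beta, t = ols_t_stats(X[:, cols], y)
    p = [two_sided_p(v) for v in t]
    worst = max(range(1, len(cols)), key=lambda i: p[i])
    if p[worst] <= 0.05:
        break
    print(f"Dropping {names[cols[worst]]} (p = {p[worst]:.3f})")
    cols.pop(worst)

kept = [names[c] for c in cols]
print("Kept:", kept)
```

As a sanity check on the approximation, `two_sided_p(-1.365)` gives roughly 0.17, in line with the value statsmodels reports for dateSold.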
 Dropping dateSold and rebuilding the model as it did not add any info to model 
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              SalePrice   R-squared:                       0.861
Model:                            OLS   Adj. R-squared:                  0.859
Method:                 Least Squares   F-statistic:                     365.6
Date:                Mon, 27 Jun 2022   Prob (F-statistic):               0.00
Time:                        19:38:16   Log-Likelihood:                -11988.
No. Observations:                1021   AIC:                         2.401e+04
Df Residuals:                    1003   BIC:                         2.410e+04
Df Model:                          17                                         
Covariance Type:            nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                 1.836e+05   1424.129    128.951      0.000    1.81e+05    1.86e+05
1stFlrSF              6811.7268   1924.694      3.539      0.000    3034.839    1.06e+04
BldgType_Twnhs       -1.744e+04   3980.010     -4.383      0.000   -2.53e+04   -9633.225
BsmtFinSF1            1.003e+04   1164.004      8.618      0.000    7747.122    1.23e+04
BsmtQual              3883.5141   1743.243      2.228      0.026     462.692    7304.336
ExterQual             3842.6332   1672.891      2.297      0.022     559.865    7125.401
Fireplaces            3189.1619   1159.334      2.751      0.006     914.164    5464.160
GarageArea            5976.8080   1325.183      4.510      0.000    3376.360    8577.256
GrLivArea             2.569e+04   1448.901     17.729      0.000    2.28e+04    2.85e+04
KitchenQual           7079.8608   1543.499      4.587      0.000    4051.003    1.01e+04
MSZoning_RM          -1.034e+04   3241.011     -3.190      0.001   -1.67e+04   -3977.579
MasVnrArea            4865.2770   1153.275      4.219      0.000    2602.169    7128.385
Neighborhood_NridgHt  3.895e+04   4048.651      9.619      0.000     3.1e+04    4.69e+04
Neighborhood_Sawyer  -8873.0140   2554.522     -3.473      0.001   -1.39e+04   -3860.193
OverallCond           6499.6521   1082.215      6.006      0.000    4375.986    8623.318
OverallQual           1.265e+04   1840.137      6.875      0.000    9039.713    1.63e+04
TotalBsmtSF           5844.3070   1987.029      2.941      0.003    1945.097    9743.517
YearBuilt             4872.1360   1712.853      2.844      0.005    1510.950    8233.322
==============================================================================
Omnibus:                      600.381   Durbin-Watson:                   1.935
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            14855.225
Skew:                           2.214   Prob(JB):                         0.00
Kurtosis:                      21.154   Cond. No.                         10.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
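
Between the two summaries the condition number falls from 3.54e+10 to 10.6 while R-squared stays at 0.861. One plausible explanation, which is an assumption here and not something the article states, is that dateSold was left on a very large raw scale (for example a timestamp) while the other features were standardized. The sketch below uses synthetic data and a hypothetical `date_like` column to show how a single unscaled column inflates the condition number of the design matrix, and how dropping or standardizing it brings the value back down.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1021  # same number of observations as in the summaries above

# Intercept plus two standardized features (synthetic).
const = np.ones(n)
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Hypothetical stand-in for an unscaled date column: values on the order of 1e9.
date_like = 1.2e9 + rng.uniform(0.0, 1.5e8, n)

X_with = np.column_stack([const, x1, x2, date_like])
X_without = np.column_stack([const, x1, x2])

# Alternative to dropping the column: standardize it like the other features.
date_scaled = (date_like - date_like.mean()) / date_like.std()
X_scaled = np.column_stack([const, x1, x2, date_scaled])

# np.linalg.cond returns the ratio of the largest to the smallest singular value.
cond_with = np.linalg.cond(X_with)
cond_without = np.linalg.cond(X_without)
cond_scaled = np.linalg.cond(X_scaled)

print(f"with unscaled column: {cond_with:.3g}")
print(f"column dropped:       {cond_without:.3g}")
print(f"column standardized:  {cond_scaled:.3g}")
```

If the cause really is scale, the large condition number in the first summary reflects numerical scaling more than correlation among the predictors, so standardizing the column would be an alternative to dropping it.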

image.png

image.png

image.png

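As the results above show, after our series of feature-selection steps the R² on the test set is higher than on the training set, which indicates that the model generalizes to unseen data and that we have successfully addressed the overfitting problem.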
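
The comparison between training and test R² can be sketched as follows. This is an illustration on synthetic data, not the article's actual evaluation code: the 70/30 split, the five features, and the helper names `r2` and `add_const` are all assumptions. The `r2` helper computes the same quantity as `r2_score` from scikit-learn, written out by hand so the block only needs NumPy.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def add_const(A):
    """Prepend an intercept column of ones."""
    return np.column_stack([np.ones(len(A)), A])

# Synthetic regression problem with a known linear signal plus noise.
rng = np.random.default_rng(42)
n = 1000
X = rng.standard_normal((n, 5))
coef = np.array([3.0, -2.0, 1.5, 0.5, 1.0])
y = 10.0 + X @ coef + rng.standard_normal(n)

# Shuffle once, then hold out 30% of the rows for testing.
idx = rng.permutation(n)
train, test = idx[:700], idx[700:]

# Fit on the training rows only.
beta, *_ = np.linalg.lstsq(add_const(X[train]), y[train], rcond=None)

r2_train = r2(y[train], add_const(X[train]) @ beta)
r2_test = r2(y[test], add_const(X[test]) @ beta)
print(f"train R2 = {r2_train:.3f}, test R2 = {r2_test:.3f}")
```

A test score close to the training score is the pattern to look for. A training score far above the test score would point to overfitting. Which of the two comes out slightly higher can vary with the particular split.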

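Conclusion

  • This article showed how to use numerical analysis tools for feature analysis and feature processing, and demonstrated a linear regression model.
  • Building on this article, further feature engineering could improve the model's robustness. For reasons of space, we do not go into it here.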