Using Dropout to Suppress Overfitting
Dropout is a widely used regularization technique for neural networks, designed chiefly to prevent overfitting. Deep networks have many layers and a very large number of parameters, so they fit the training data too closely and then show a large generalization error at test time. Dropout borrows the Bagging idea from ensemble learning: by randomly setting the outputs of a subset of neurons to 0, it reduces the risk of overfitting.
Dropout was first proposed by Hinton et al. The basic idea is that during training, each neuron's output is set to 0 with some fixed probability. This randomness can be viewed as temporarily pruning part of the network, which makes the network more tolerant of perturbations and more robust, and keeps it from depending too heavily on any particular neuron.
Concretely, in each training step a subset of neurons is selected at random with probability p and zeroed out; the dropped neurons take no part in that step's forward pass or backpropagation. At test time, to keep the model stable and deterministic, no random dropping is done; instead, each neuron's output is scaled by the retention probability 1 - p, so that its expected value matches what the downstream layers saw during training.
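The scheme just described can be sketched in a few lines of NumPy. This is a minimal illustration only, with p as the drop probability; real frameworks typically use the equivalent "inverted dropout" instead, scaling the kept activations by 1/(1 - p) during training so that the test-time pass needs no change.
import numpy as np

def dropout_train(a, p):
    # Training: zero each activation independently with probability p
    mask = (np.random.rand(*a.shape) >= p).astype(a.dtype)
    return a * mask

def dropout_test(a, p):
    # Testing: keep every unit but scale by the retention probability
    # 1 - p, so the expected activation matches what training produced
    return a * (1.0 - p)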
Dropout is not restricted to fully connected networks; it is also used in convolutional and recurrent networks to curb overfitting. Because it is simple to implement, requiring only that each neuron be randomly kept or dropped with probability p during training, Dropout has been adopted very widely.
In short, Dropout can improve a model's accuracy and generalization to a useful degree and is an effective guard against overfitting. Note, however, that Dropout adds extra noise to the gradient of every mini-batch, so the learning rate may need to be tuned to preserve convergence speed and final quality.
1. Environment Setup
# Import libraries
import keras
from keras import layers
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
Using TensorFlow backend.
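This notebook runs the standalone keras package on a TensorFlow 1.x backend, as the banner above indicates. On TensorFlow 2.x the same code should work after swapping the imports for "from tensorflow import keras" and "from tensorflow.keras import layers"; that substitution is a suggestion, not part of the original notebook.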
2. Load the Dataset
# Load the dataset
data = pd.read_csv('./dataset/credit-a.csv', header=None)
data
653 rows × 16 columns
data.iloc[:, -1].unique()
The output (not shown above) contains the raw labels, presumably -1 and 1; -1 is remapped to 0 below so the targets suit a sigmoid output trained with binary cross-entropy.
3. Predicting on All the Data
3.1 Dataset
# Features: the first 15 columns; labels: the last column, with -1 remapped to 0
x = data.iloc[:, :-1].values
y = data.iloc[:, -1].replace(-1, 0).values.reshape(-1, 1)
3.2 Build the Neural Network
model = keras.Sequential()
model.add(layers.Dense(128, input_dim=15, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 128)               2048
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512
_________________________________________________________________
dense_3 (Dense)              (None, 128)               16512
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 129
=================================================================
Total params: 35,201
Trainable params: 35,201
Non-trainable params: 0
_________________________________________________________________
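As a check on the Param # column: the first Dense layer has 15 × 128 weights plus 128 biases, giving 2,048 parameters; each of the next two hidden layers has 128 × 128 + 128 = 16,512; and the output layer has 128 × 1 + 1 = 129, for a total of 2,048 + 16,512 + 16,512 + 129 = 35,201.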
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])
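The binary_crossentropy loss pairs with the sigmoid output: for a label y in {0, 1} and a predicted probability ŷ, it computes -[y·log ŷ + (1 - y)·log(1 - ŷ)], averaged over the batch.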
3.3 Train the Model
# Fit on the entire dataset for 1000 epochs
history = model.fit(x, y, epochs=1000)
Epoch 1/1000
653/653 [==============================] - 0s 434us/step - loss: 7.5273 - acc: 0.5988
Epoch 2/1000
653/653 [==============================] - 0s 92us/step - loss: 3.7401 - acc: 0.6187
Epoch 3/1000
653/653 [==============================] - 0s 75us/step - loss: 3.6464 - acc: 0.5712
Epoch 4/1000
653/653 [==============================] - 0s 56us/step - loss: 10.2291 - acc: 0.6631
Epoch 5/1000
653/653 [==============================] - 0s 63us/step - loss: 2.0400 - acc: 0.6233
Epoch 6/1000
653/653 [==============================] - 0s 120us/step - loss: 2.4279 - acc: 0.6217
Epoch 7/1000
653/653 [==============================] - 0s 105us/step - loss: 2.3289 - acc: 0.6325
Epoch 8/1000
653/653 [==============================] - 0s 159us/step - loss: 3.2521 - acc: 0.6294
Epoch 9/1000
653/653 [==============================] - 0s 89us/step - loss: 2.6005 - acc: 0.6294
Epoch 10/1000
653/653 [==============================] - 0s 118us/step - loss: 1.3997 - acc: 0.6738
……
Epoch 1000/1000
653/653 [==============================] - 0s 106us/step - loss: 0.2630 - acc: 0.9326
3.4 Analyze the Model
history.history.keys()
dict_keys(['loss', 'acc'])
# Training loss (red) and accuracy (blue) versus epoch
plt.plot(history.epoch, history.history.get('loss'), c='r')
plt.plot(history.epoch, history.history.get('acc'), c='b')
[Plot: training loss (red) falling and training accuracy (blue) rising over the 1000 epochs]
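Training accuracy reaches about 0.93 by the final epoch, but because the model was fit and evaluated on the same 653 samples, this curve says nothing about generalization; section 4 repeats the experiment with a held-out test set.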
4. Predicting on Unseen Data
4.1 Split the Dataset
# Hold out the last 25% of rows as a test set
x_train = x[:int(len(x)*0.75)]
x_test = x[int(len(x)*0.75):]
y_train = y[:int(len(x)*0.75)]
y_test = y[int(len(x)*0.75):]
x_train.shape, x_test.shape, y_train.shape, y_test.shape
((489, 15), (164, 15), (489, 1), (164, 1))
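Note that this is an ordered split: the last quarter of the rows becomes the test set. If the rows are grouped in any way, for example by class, an ordered split can bias the evaluation, so a shuffled split is usually safer. A minimal sketch using scikit-learn; the train_test_split call and the random_state value are additions for illustration, not part of the original notebook:
from sklearn.model_selection import train_test_split

# Shuffle the rows before splitting; fixing random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42)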
4.2 Build the Neural Network
model = keras.Sequential()
model.add(layers.Dense(128, input_dim=15, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
# adam: adapts each parameter's learning rate using first- and second-moment estimates of the gradients
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['acc'])
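The network above is again a plain fully connected model; to apply the technique named in the title, Dropout layers can be inserted after the hidden Dense layers. A minimal sketch of what that looks like in Keras follows; the rate of 0.5 is an assumed value for illustration, not taken from the original:
model = keras.Sequential()
model.add(layers.Dense(128, input_dim=15, activation='relu'))
model.add(layers.Dropout(0.5))  # randomly zeroes 50% of activations, training only
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
Keras's Dropout layer is active only during fit; at evaluation and prediction time it acts as an identity, because Keras rescales the surviving activations by 1/(1 - rate) during training (inverted dropout).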