第三周编程作业-Planar data classification with one hidden layer(二)

简介: 第三周编程作业-Planar data classification with one hidden layer(二)

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()


Question: Build your neural network model in nn_model().

Instructions: The neural network model has to use the previous functions in the right order.


# GRADED FUNCTION: nn_model
def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (2, number of examples)
    Y -- labels of shape (1, number of examples)
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 1000 iterations
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
    ### START CODE HERE ### (≈ 5 lines of code)
    parameters = initialize_parameters(n_x,n_h,n_y)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    ### END CODE HERE ###
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        ### START CODE HERE ### (≈ 4 lines of code)
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache =forward_propagation(X, parameters)
        # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
        cost = compute_cost(A2, Y, parameters)
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters, cache, X, Y)
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters, grads)
        ### END CODE HERE ###
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    return parameters


X_assess, Y_assess = nn_model_test_case()
parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=False)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))


/opt/conda/lib/python3.5/site-packages/ipykernel/__main__.py:20: RuntimeWarning: divide by zero encountered in log
/home/jovyan/work/Week 3/Planar data classification with one hidden layer/planar_utils.py:34: RuntimeWarning: overflow encountered in exp
  s = 1/(1+np.exp(-x))
W1 = [[-4.18494056  5.33220609]
 [-7.52989382  1.24306181]
 [-4.1929459   5.32632331]
 [ 7.52983719 -1.24309422]]
b1 = [[ 2.32926819]
 [ 3.79458998]
 [ 2.33002577]
 [-3.79468846]]
W2 = [[-6033.83672146 -6008.12980822 -6033.10095287  6008.06637269]]
b2 = [[-52.66607724]]


Expected Output:

W1 [[-4.18494056 5.33220609]

[-7.52989382 1.24306181]

[-4.1929459 5.32632331]

[ 7.52983719 -1.24309422]]

b1 [[ 2.32926819]

[ 3.79458998]

[ 2.33002577]

[-3.79468846]]

W2 [[-6033.83672146 -6008.12980822 -6033.10095287 6008.06637269]]
b2

[[-52.66607724]]


4.5 Predictions


Question: Use your model to predict by building predict().

Use forward propagation to predict results.


Reminder: predictions = $y_{prediction} = \mathbb 1 \text{{activation > 0.5}} = \begin{cases}

1 & \text{if}\ activation > 0.5 \

0 & \text{otherwise}

\end{cases}$

As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: X_new = (X > threshold)


# GRADED FUNCTION: predict
def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    Returns
    predictions -- vector of predictions of our model (red: 0 / blue: 1)
    """
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2>0.5)
    ### END CODE HERE ###
    return predictions


parameters, X_assess = predict_test_case()
predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))


predictions mean = 0.666666666667


Expected Output:

predictions mean 0.666666666667

It is time to run the model and see how it performs on a planar dataset. Run the following code to test your model with a single hidden layer of $n_h$ hidden units.


# Build a model with a n_h-dimensional hidden layer
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))


Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219454
Cost after iteration 9000: 0.218607
<matplotlib.text.Text at 0x7f1b55d40b38>


3.png

output_50_2.png

Expected Output:

Cost after iteration 9000 0.218607


# Print accuracy
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')


Accuracy: 90%


Expected Output:

Accuracy 90%

Accuracy is really high compared to Logistic Regression. The model has learnt the leaf patterns of the flower! Neural networks are able to learn even highly non-linear decision boundaries, unlike logistic regression.

Now, let's try out several hidden layer sizes.


4.6 - Tuning hidden layer size (optional/ungraded exercise)


Run the following code. It may take 1-2 minutes. You will observe different behaviors of the model for various hidden layer sizes.


# This may take about 2 minutes to run
plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50]
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))


Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 20 hidden units: 90.0 %
Accuracy for 50 hidden units: 90.25 %


4.png


output_56_1.png

Interpretation:

  • The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
  • The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to  fits the data well without also incurring noticable overfitting.
  • You will also learn later about regularization, which lets you use very large models (such as n_h = 50) without much overfitting.


Optional questions:

Note: Remember to submit the assignment but clicking the blue "Submit Assignment" button at the upper-right.

Some optional/ungraded questions that you can explore if you wish:

  • What happens when you change the tanh activation for a sigmoid activation or a ReLU activation?
  • Play with the learning_rate. What happens?
  • What if we change the dataset? (See part 5 below!)

You've learnt to:

  • Build a complete neural network with a hidden layer
  • Make a good use of a non-linear unit
  • Implemented forward propagation and backpropagation, and trained a neural network
  • See the impact of varying the hidden layer size, including overfitting.

Nice work!


5) Performance on other datasets


If you want, you can rerun the whole notebook (minus the dataset part) for each of the following datasets.


# Datasets
noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets()
datasets = {"noisy_circles": noisy_circles,
            "noisy_moons": noisy_moons,
            "blobs": blobs,
            "gaussian_quantiles": gaussian_quantiles}
### START CODE HERE ### (choose your dataset)
dataset = "noisy_moons"
### END CODE HERE ###
X, Y = datasets[dataset]
X, Y = X.T, Y.reshape(1, Y.shape[0])
# make blobs binary
if dataset == "blobs":
    Y = Y%2
# Visualize the data
plt.scatter(X[0, :], X[1, :], c=Y, s=40, cmap=plt.cm.Spectral);


5.png

output_63_0.png

Congrats on finishing this Programming Assignment!

Reference:

相关文章
|
数据采集 自然语言处理 数据可视化
Hidden Markov Model,简称 HMM
隐马尔可夫模型(Hidden Markov Model,简称 HMM)是一种统计模型,用于描述由隐藏的马尔可夫链随机生成观测序列的过程。它是一种生成模型,可以通过学习模型参数来预测观测序列的未来状态。HMM 主要包括以下几个步骤:
143 5
|
6月前
|
机器学习/深度学习 算法
【文献学习】Meta-Learning to Communicate: Fast End-to-End Training for Fading Channels
把学习如何在衰落的噪声信道上进行通信的过程公式化为对自动编码器的无监督训练。该自动编码器由编码器,信道和解码器的级联组成。
52 2
|
9月前
|
机器学习/深度学习 算法 定位技术
神经网络epoch、batch、batch size、step与iteration的具体含义介绍
神经网络epoch、batch、batch size、step与iteration的具体含义介绍
466 1
|
机器学习/深度学习 数据挖掘
【论文解读】Co-attention network with label embedding for text classification
华南理工出了一篇有意思的文章,将标签和文本进行深度融合,最终形成带标签信息的文本表示和带文本信息的标签表示。
289 1
2022亚太建模B题Optimal Design of High-speed Train思路分析
2022亚太建模B题思路分析高速列车的优化设计 Optimal Design of High-speed Train
2022亚太建模B题Optimal Design of High-speed Train思路分析
|
PyTorch 算法框架/工具
Pytorch疑难小实验:Torch.max() Torch.min()在不同维度上的解释
Pytorch疑难小实验:Torch.max() Torch.min()在不同维度上的解释
181 0
|
机器学习/深度学习 算法 PyTorch
【菜菜的CV进阶之路-Pytorch基础-model.eval】同一个模型测试:shuffle=False和shuffle=True 结果差异很大
【菜菜的CV进阶之路-Pytorch基础-model.eval】同一个模型测试:shuffle=False和shuffle=True 结果差异很大
302 0
【菜菜的CV进阶之路-Pytorch基础-model.eval】同一个模型测试:shuffle=False和shuffle=True 结果差异很大
|
Shell 计算机视觉
2022亚太建模A题Feature Extraction of Sequence Images and Modeling Analysis of Mold Flux Melting and Crystallization思路分析
2022 亚太建模A题序列图像的特征提取与建模分析 模具流量的熔融和结晶Feature Extraction of Sequence Images and Modeling Analysis of Mold Flux Melting and Crystallization
2022亚太建模A题Feature Extraction of Sequence Images and Modeling Analysis of Mold Flux Melting and Crystallization思路分析
|
机器学习/深度学习 算法 数据挖掘
【论文泛读】 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
【论文泛读】 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
【论文泛读】 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift