4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()
Question: Build your neural network model in nn_model()
.
Instructions: The neural network model has to use the previous functions in the right order.
# GRADED FUNCTION: nn_model def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False): """ Arguments: X -- dataset of shape (2, number of examples) Y -- labels of shape (1, number of examples) n_h -- size of the hidden layer num_iterations -- Number of iterations in gradient descent loop print_cost -- if True, print the cost every 1000 iterations Returns: parameters -- parameters learnt by the model. They can then be used to predict. """ np.random.seed(3) n_x = layer_sizes(X, Y)[0] n_y = layer_sizes(X, Y)[2] # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters". ### START CODE HERE ### (≈ 5 lines of code) parameters = initialize_parameters(n_x,n_h,n_y) W1 = parameters["W1"] b1 = parameters["b1"] W2 = parameters["W2"] b2 = parameters["b2"] ### END CODE HERE ### # Loop (gradient descent) for i in range(0, num_iterations): ### START CODE HERE ### (≈ 4 lines of code) # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache". A2, cache =forward_propagation(X, parameters) # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost". cost = compute_cost(A2, Y, parameters) # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads". grads = backward_propagation(parameters, cache, X, Y) # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters". parameters = update_parameters(parameters, grads) ### END CODE HERE ### # Print the cost every 1000 iterations if print_cost and i % 1000 == 0: print ("Cost after iteration %i: %f" %(i, cost)) return parameters
X_assess, Y_assess = nn_model_test_case() parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=False) print("W1 = " + str(parameters["W1"])) print("b1 = " + str(parameters["b1"])) print("W2 = " + str(parameters["W2"])) print("b2 = " + str(parameters["b2"]))
/opt/conda/lib/python3.5/site-packages/ipykernel/__main__.py:20: RuntimeWarning: divide by zero encountered in log /home/jovyan/work/Week 3/Planar data classification with one hidden layer/planar_utils.py:34: RuntimeWarning: overflow encountered in exp s = 1/(1+np.exp(-x)) W1 = [[-4.18494056 5.33220609] [-7.52989382 1.24306181] [-4.1929459 5.32632331] [ 7.52983719 -1.24309422]] b1 = [[ 2.32926819] [ 3.79458998] [ 2.33002577] [-3.79468846]] W2 = [[-6033.83672146 -6008.12980822 -6033.10095287 6008.06637269]] b2 = [[-52.66607724]]
Expected Output:
W1 | [[-4.18494056 5.33220609] [-7.52989382 1.24306181] [-4.1929459 5.32632331] [ 7.52983719 -1.24309422]] |
|
b1 | [[ 2.32926819] [ 3.79458998] [ 2.33002577] [-3.79468846]] |
|
W2 | [[-6033.83672146 -6008.12980822 -6033.10095287 6008.06637269]] | |
b2 | [[-52.66607724]] |
4.5 Predictions
Question: Use your model to predict by building predict().
Use forward propagation to predict results.
Reminder: predictions = $y_{prediction} = \mathbb 1 \text{{activation > 0.5}} = \begin{cases}
1 & \text{if}\ activation > 0.5 \
0 & \text{otherwise}
\end{cases}$
As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: X_new = (X > threshold)
# GRADED FUNCTION: predict def predict(parameters, X): """ Using the learned parameters, predicts a class for each example in X Arguments: parameters -- python dictionary containing your parameters X -- input data of size (n_x, m) Returns predictions -- vector of predictions of our model (red: 0 / blue: 1) """ # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold. ### START CODE HERE ### (≈ 2 lines of code) A2, cache = forward_propagation(X, parameters) predictions = (A2>0.5) ### END CODE HERE ### return predictions
parameters, X_assess = predict_test_case() predictions = predict(parameters, X_assess) print("predictions mean = " + str(np.mean(predictions)))
predictions mean = 0.666666666667
Expected Output:
predictions mean | 0.666666666667 |
It is time to run the model and see how it performs on a planar dataset. Run the following code to test your model with a single hidden layer of $n_h$ hidden units.
# Build a model with a n_h-dimensional hidden layer parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True) # Plot the decision boundary plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y) plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048 Cost after iteration 1000: 0.288083 Cost after iteration 2000: 0.254385 Cost after iteration 3000: 0.233864 Cost after iteration 4000: 0.226792 Cost after iteration 5000: 0.222644 Cost after iteration 6000: 0.219731 Cost after iteration 7000: 0.217504 Cost after iteration 8000: 0.219454 Cost after iteration 9000: 0.218607 <matplotlib.text.Text at 0x7f1b55d40b38>
output_50_2.png
Expected Output:
Cost after iteration 9000 | 0.218607 |
# Print accuracy predictions = predict(parameters, X) print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Accuracy: 90%
Expected Output:
Accuracy | 90% |
Accuracy is really high compared to Logistic Regression. The model has learnt the leaf patterns of the flower! Neural networks are able to learn even highly non-linear decision boundaries, unlike logistic regression.
Now, let's try out several hidden layer sizes.
4.6 - Tuning hidden layer size (optional/ungraded exercise)
Run the following code. It may take 1-2 minutes. You will observe different behaviors of the model for various hidden layer sizes.
# This may take about 2 minutes to run plt.figure(figsize=(16, 32)) hidden_layer_sizes = [1, 2, 3, 4, 5, 20, 50] for i, n_h in enumerate(hidden_layer_sizes): plt.subplot(5, 2, i+1) plt.title('Hidden Layer of size %d' % n_h) parameters = nn_model(X, Y, n_h, num_iterations = 5000) plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y) predictions = predict(parameters, X) accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))
Accuracy for 1 hidden units: 67.5 % Accuracy for 2 hidden units: 67.25 % Accuracy for 3 hidden units: 90.75 % Accuracy for 4 hidden units: 90.5 % Accuracy for 5 hidden units: 91.25 % Accuracy for 20 hidden units: 90.0 % Accuracy for 50 hidden units: 90.25 %
output_56_1.png
Interpretation:
- The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
- The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to fits the data well without also incurring noticable overfitting.
- You will also learn later about regularization, which lets you use very large models (such as n_h = 50) without much overfitting.
Optional questions:
Note: Remember to submit the assignment but clicking the blue "Submit Assignment" button at the upper-right.
Some optional/ungraded questions that you can explore if you wish:
- What happens when you change the tanh activation for a sigmoid activation or a ReLU activation?
- Play with the learning_rate. What happens?
- What if we change the dataset? (See part 5 below!)
You've learnt to:
- Build a complete neural network with a hidden layer
- Make a good use of a non-linear unit
- Implemented forward propagation and backpropagation, and trained a neural network
- See the impact of varying the hidden layer size, including overfitting.
Nice work!
5) Performance on other datasets
If you want, you can rerun the whole notebook (minus the dataset part) for each of the following datasets.
# Datasets noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets() datasets = {"noisy_circles": noisy_circles, "noisy_moons": noisy_moons, "blobs": blobs, "gaussian_quantiles": gaussian_quantiles} ### START CODE HERE ### (choose your dataset) dataset = "noisy_moons" ### END CODE HERE ### X, Y = datasets[dataset] X, Y = X.T, Y.reshape(1, Y.shape[0]) # make blobs binary if dataset == "blobs": Y = Y%2 # Visualize the data plt.scatter(X[0, :], X[1, :], c=Y, s=40, cmap=plt.cm.Spectral);
output_63_0.png
Congrats on finishing this Programming Assignment!
Reference: