Before
This homework is intended to give you an introduction to building, training, and testing neural network models. You will not only be exposed to using Python packages to build a neural network from scratch, but also the mathematical aspects of backpropagation and gradient descent. While in practical scenarios, you won’t necessarily have to implement neural networks from scratch (as you will see in future labs and assignments), this assignment aims at giving you a rudimentary idea of what goes on under the hood in packages such as TensorFlow and Keras. In this assignment, you will use the MNIST handwritten digits dataset to train a simple classification neural network using batch learning and evaluate your model.
中文(Chinese): http://t.csdn.cn/2AyUX
Link to this article .pdf file
---------------------------------------------------------------------------------------------------------------------------------
password:v6zc
https://pan.baidu.com/s/1_qFbN0Nhc8MNYCHEyTbkLg%C2%A0
Provide finished code files:
Get Files:
link:https://pan.baidu.com/s/1Fw_7thL5PxR79zI6XbpnYQ
password:txqe
Files Structrue:
| - hw2 | - code | - Beras | - eight files extension is .py | - assignment.py | - preprocess.py | - visualize.py | - data | - mnist | - four files of minst dataset
Conceptual Questions
Please submit the Conceptuals Questions on Gradescope under hw2-mlp conceptual .
You must type your submissions and upload a PDF. We recommend using LaTeX.
Getting the Stencil
Github Classroom Link with Stencil Code!
Guide on GitHub and GitHub Classroom
Getting the Data
You can use download.sh for downloading the data. You can run a bash script with the command ./script_name.sh (ex: bash ./download.sh ). This is similar to HW1. Setup Work off of the stencil code provided, but do not change the stencil except where specified.
Doing so may cause incompatibility with the autograder. Don’t change any method signatures!
This assignment requires NumPy and Matplotlib. You should already have this from HW1. Also check the virtual environment guide to set up TensorFlow 2.5 on your local machine. Feel free to reference this guide if you have to work through colab.
Assignment Overview
In this assignment, you will be constructing a Keras mimic, Beras (haha funny name), and will make a sequential model specification that mimics the Tensorflow/Keras API. The Python notebook associated with this homework is meant for you to explore an example implementation so that you can build on it yourself! There are no TODOs for you to work on in the notebook ;
rather, the testing is done by running the main method of assignment.py .
Our stencil provides a model class with several methods and hyperparameters you need to use for your network. You will also answer conceptual questions related to the assignment and class material (don’t forget to answer the 2470-only questions if you’re a 2470 student!). You should include a brief README with your model's accuracy and any known bugs.
Before Starting
This homework assignment is due two weeks from release. Labs 1-3 provide great practice for this assignment, so you can wait a little for them if you get stuck. Specifically:
-
Implementing Callable/Diffable Components : The skills you need in order to do this
can be found by working through Lab 1 . This includes comfort with mathematical
notation, matrix operations, and the logic behind call and gradient methods.
-
Implementing Optimizers: You can implement the BasicOptimizer class just by
following the logic from the gradient_descent method in lab 1. The other optimizers
(i.e. Adam, RMSProp) will be covered in Lab 2: Optimizers .
-
Using batch_step and GradientTape: You can figure out how to use these to train your
model based on the assignment instructions and your implementations of these. With
that said, they do mimic the Keras API. You’ll learn about all this in Lab 3: Intro to
Tensorflow . If your lab is after the due date, it should be fine; just skim over the
complementary notebook associated with the lab.
Feel free to start off by doing what you can and then add onto it as you learn more about deep learning and realize that the same concepts you learn in the class can actually be used here!
Do not get discouraged, and try to have fun! Roadmap
For this assignment, we'll walk you through the pipeline of training a neural net, including the structure of the model class and the methods you will have to fill in.
1. Preprocessing the Data
Note: Code for preprocessing should be pulled in from HW1.
Before training a network, you will need to clean your data. This includes retrieving,
altering, and formatting the data into the inputs for your network. For this assignment,you will be working on the MNIST dataset. It can be downloaded through the download.sh script, but it’s also linked here (ignore that it says hw1; we’re using this dataset for hw2 this time!). The original data source is here .
You should train your network using only the training data and then test your network's accuracy on the testing data. Your program should print its accuracy over the test dataset upon completion.
2. One Hot Encoding
Before training or testing your model, you will need to “one-hot” encode your class labels so that the model can optimize towards predicting any desired class. Note that the class labels by themselves are simply categories and do not mean anything numerically. In the absence of one-hot encoding, your model might learn some natural ordering between the different class labels based on the labels (which are arbitrary).
For example, let’s say there’s a data point A which corresponds to label ‘2’ and a data point B which corresponds to label ‘7’. We don’t want the model to somehow learn that B has a higher weightage than A simply because, numerically speaking, 7 > 2.
To one-hot encode your class labels, you will
have to convert your 1-dimensional label vector
into a vector of size num_classes (where
num_classes is the total number of classes in
your dataset). For the MNIST dataset, it looks
something like the matrix on the right:
You have to fill out the following method in Beras/onehot.py
● fit() : [TODO] In this function you need to fetch all the unique labels in the
data (store this in self.uniq ) and create a dictionary with labels as the keys
and their corresponding one hot encodings as values. Hint: You might want to
look at np.eye() to get the one-hot encodings. Ultimately, you will store this
dictionary in self.uniq2oh .
● forward() : In this function, we pass a vector of all the actual labels in the
training set and call fit() to populate the uniq2oh dictionary with unique
labels and their corresponding one-hot encoding and then use it to return an
array of one-hot encoded labels for each label in the training set. This function
has already been filled out for you!
● inverse() : In the function, we reverse the one-hot encoding back to the actual
label. This has already been done for you.
For example, if we have labels X and Y with one-hot encodings of [1,0] and [0,1], we’d want to create a dictionary as follows: {X: [1,0], Y: [0,1]}. As shown in the image above,for MNIST, you will have 10 labels, so your dictionary should have ten entries!
You may notice that some classes inherit from Callable or Diffable. More on this
in the next section!
3. Core Abstractions
Consider the following abstract classes of modules. Be sure to play around with the
Python notebook associated with this homework to get a good grip of the core
abstraction modules defined for you in Beras/core.py ! The notebook is exploratory in nature (it is NOT required and all of the code is given) and will provide you with lots of insights into understanding and using these class abstractions! Note that these modules are very similar to the Tensorflow/Keras API.
Callable: A function with a well-defined forward function. These are the ones you’ll need to implement:
● CategoricalAccuracy (./metrics.py): Computes the accuracy of predicted
probabilities against a list of ground-truth labels. As accuracy is not optimized for,
there is no need to compute its gradient. Furthermore, categorical accuracy is
piecewise discontinuous, so the gradient would technically be 0 or undefined.
● OneHotEncoder (./onehot.py) : You can one-hot encode a class instance into a probability distribution to optimize for classifying into discrete options
Diffable: A callable which can also be differentiated. We can use these in our pipeline and optimize through them! Thus, most of these classes are made for use in your neural network layers. These are the ones you’ll need to implement:
Example: Consider a Dense layer instance. Let s represents the input size (source), d
represents the output size (destination), and b represents the batch size. Then:
GradientTape: This class will function exactly like tf.GradientTape() (See lab 3).
You can think of a GradientTape as a logger. Every time an operation is performed within
the scope of a GradientTape, it records which operation occurred. Then, during
backprop, we can compute the gradient for all of the operations. This allows us to
differentiate our final output with respect to any intermediate step. When operations are
computed outside the scope of GradientTape, they aren’t recorded, so your code will
have no record of them and cannot compute the gradients.
You can check out how this is implemented in core! Of course, Tensorflow’s gradient
tape implementation is a lot more complicated and involves constructing a graph.
● [TODO] Implement the gradient method, which returns a list of gradients corresponding to the list of trainable weights in the network. Details in the code.
4. Layers
For the purposes of this assignment, you will implement the Dense layer to use in your
sequential model in Beras/layers.py . You have to fill in the following methods.
● forward() : [TODO] Implement the forward pass and return the outputs.
● weight_gradients() : [TODO] Calculate the gradients with respect to the
weights and the biases. This will be used to optimize the layer .
● input_gradients() : [TODO] Calculate the gradients with respect to the
layer inputs. This will be used to propagate the gradient to previous layers.
● _initialize_weight() : [TODO] Initialize the dense layer’s weight values.
By default, initialize all the weights to zero (usually a bad idea, by the way). You
are also required to allow for more sophisticated options (when the initializer is
set to normal, xavier, and kaiming). Follow Keras math assumptions!
○ Normal: Pretty self-explanatory, a unit normal distribution.
○ Xavier Normal: Based on keras.GlorotNormal .
○ Kaiming He Normal: Based on Keras.HeNormal .
You may find np.random.normal helpful while implementing these. The TODOs
provide some justification for why these different initialization methods are necessary but
for more detail, check out this website ! Feel free to add more initializer options!
5. Activation Functions
In this assignment, you will be implementing two major activation functions, namely,
LeakyReLU and Softmax in Beras/activations.py . Since ReLU is a special case
of LeakyReLU , we have already provided you with the code for it.
● LeakyReLU()
○ forward() : [TODO] Given input x , compute & return LeakyReLU(x) .
○ input_gradients() : [TODO] Compute & return the partial with
respect to inputs by differentiating LeakyReLU .
● Softmax() : (2470 ONLY)
forward(): [TODO] Given input x , compute & return Softmax(x) .
Make sure you use stable softmax where you subtract max of all entries
to prevent overflow/undvim erflow issues.
○ input_gradients() : [TODO] Partial w.r.t. inputs of Softmax() .
6. Filling in the model
With these abstractions in mind, let’s create a pipeline for our sequential deep learning
model. You can find the SequentialModel class in assignment.py where you will
initialize your neural network’s layers, parameters (weights and biases), and
hyperparameters (optimizer, loss function, learning rate, accuracy function, etc.). The
SequentialModel class inherits from Beras/model.py , where you’ll find many
useful methods. This will also contain functions that fit the model to your data and
evaluate the performance of your model:
● compile() : Initialize the model optimizer, loss function, & accuracy function,
which are fed in as arguments, for your SequentialModel instance to use.
● fit() : Trains your model to associate input to outputs. Training is repeated for
each epoch, and the data is batched based on argument. It also computes
batch_metrics , epoch_metrics , and the aggregated agg_metrics that
can be used to track the training progress of your model.
● evaluate() : [TODO] Evaluate the performance of the final model using the
metrics mentioned above during the testing phase. It’s almost identical to the
fit() function; think about what would change between training and testing).
● call() : [TODO] Hint: what does it mean to call a sequential model? Remember
that a sequential model is a stack of layers where each layer has exactly one
input vector and one output vector. You can find this function within the
SequentialModel class in assignment.py .
● batch_step() : [TODO] You will observe that fit() calls this function for each
batch. You will first compute your model predictions for the input batch. In the
training phase, you will need to compute gradients and update your weights
according to the optimizer you are using. For backpropagation during training,
you will use GradientTape from the core abstractions ( core.py ) to record
operations and intermediate values. You will then use the model's optimizer to
apply the gradients to your model's trainable variables. Finally, compute and
return the loss and accuracy for the batch. You can find this function within the
SequentialModel class in assignment.py .
We encourage you to check out keras.SequentialModel in the intro notebook
(under Exploring a possible modular implementation: TensorFlow/Keras ) and refer
to Lab 3 to get a feel for how we can work with gradient tapes in deep learning.
7. Loss Function
This is one of the most crucial aspects of model training. In this assignment, we will
implement the MSE or mean-squared error loss function. You can find your loss function
in Beras/losses.py .
● forward() : [TODO] Write a function that computes and returns the mean
squared error given the predicted and actual labels.
Hint: What is MSE?
Given the predicted and actual labels, MSE is the average of the squares of the
differences between predicted and actual values.
● input_gradients() : [TODO] Compute and return the gradients. Use
differentiation to derive the formula for these gradients.
8. Optimizers
In the Beras/optimizers.py file make sure to implement the optimization for each of
the different types of optimizers. Lab 2 should help with this, so good luck!
● BasicOptimizer : [TODO] A simple optimizer strategy as seen in Lab 1.
● RMSProp : [TODO] Root mean squared error propagation.
● Adam : [TODO] A common adaptive motion estimation-based optimizer.
9. Accuracy metrics
Finally, to evaluate the performance of your model, you need to use appropriate
accuracy metrics. In this assignment, you will implement categorical accuracy in
Beras/metrics.py :
● forward() : [TODO] Return the categorical accuracy of your model given the
predicted probabilities and true labels. You should be returning the proportion of
predicted labels equal to the true labels, where the predicted label for an image is
the label corresponding to the highest probability. Refer to the internet or lecture
slides for categorical accuracy math!
10. Train and Test
Finally, using all the above primitives, you are required to build two models in
assignment.py:
● A simple model in get_simple_model() with at most one Diffable layer (e.g.,
Dense - ./layers.py ) and one activation function (look for them in
./activation.py ). This one is provided for you by default, though you can
change it if you’d like. The autograder will evaluate the original one though!
● A slightly more complex model in get_advanced_model() with two or more
Diffable layers and two or more activation functions. We recommend using Adam
optimizer for this model with a decently low learning rate.
For any hyperparameters you use (layer sizes, learning rate, epoch size, batch size,
etc.), please hardcode these values in the get_simple_model() and
get_advanced_model() functions. Do NOT store them under the main handler.
Once everything is implemented, you can use python3 assignment.py to run your
model and see the loss/accuracy!
11. Visualizing Results
We provided the visualize_metrics method for you to visualize how your loss and
accuracy changes after each batch using matplotlib. DO NOT EDIT THIS FUNCTION.
You should call this function in your main method after you store the loss and accuracy
per batch in an array, which would be passed into this function. This should plot line
graphs where the horizontal axis is the i'th batch and the vertical axis is the
loss/accuracy value of the batch. Calling this is OPTIONAL!
We've also provided the visualize_images method for you to visualize your
predictions against the true labels with matplotlib. This method is currently written with
the labels having a shape of [number of images, 1]. DO NOT EDIT THIS FUNCTION .
You should call this function with all your inputs and labels after training your model. The
function will randomly pick 500 samples from your input and will plot 10 correct and 10
incorrect classifications to help you visually interpret your model’s predictions! You
should do this last, after you have met the benchmark for test accuracy.
CS1470 Students
- Complete and Submit HW2 Conceptual
- Implement Beras per specifications and make a SequentialModel in assignment.py
- Test the model inside of main
- Get test accuracy >=85% on MNIST with default get_simple_model_components .
- Complete the Exploration notebook and export it to a PDF.
- The “HW2 Intro to Beras” notebook is just for your reference.
CS2470 Students
- Same as 1470 except:
- Implement Softmax activation function (forward pass and input_gradients )
- Get testing accuracy >95% on MNIST model from get_advanced_model_components .
- You will need to specify a multi-layered model, will have to explore
hyperparameter options, and may want to add additional features.
- Additional features may include regularization, other weight initialization
schemes, aggregation layers, dropout, rate scheduling, or skip connections. If
you have other ideas, feel free to ask publicly on Ed and we’ll let you know if they
are also ok.
- When implementing these features, try to mimic the Keras API as much as
possible. This will help significantly with your Exploration notebook.
- Finish 2470 components for Exploration notebook and conceptual questions.
Grading and Autograder Compatibility
Conceptual : You will be primarily graded on correctness, thoughtfulness, and clarity.
Code: You will be primarily graded on functionality. Your model should have an accuracy that is
at least greater than the threshold on the testing data. For 1470, this can be achieved with the
simple model parameterization provided. For 2470, you may need to experiment with
hyperparameters or develop some custom components.
Although you will not be graded on code style, you should not have an excessive number of
print statements in your final submission.
IMPORTANT! Please use vectorized operations when possible and limit the number of for loops
you use. While there is no strict time limit for running this assignment, it should typically be less
than 3 minutes. The autograder will automatically time out after 10 minutes. You will not receive
any credit for methods that use Tensorflow or Keras functions within them.
Notebook: The exploration notebook will be graded manually and should be submitted as a pdf
file. Feel free to use the “Notebooks to Latex PDFs.ipynb” notebook! Handing In
You should submit the assignment via Gradescope under the corresponding project assignment
by zipping up your hw2 folder (the path on Gradescope MUST be hw2/code/filename.py) or
through GitHub ( recommended ). To submit through GitHub, commit and push all changes to
your repository to GitHub. You can do this by running the following three commands ( this is a
good resource for learning more about them):
1. git add file1 file2 file3 (or -A)
2. git commit -m “commit message”
3. git push
After committing and pushing your changes to your repo (which you can check online if you're unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.
IMPORTANT!
1. Please make sure all your files are in hw2/code . Otherwise, the autograder will fail!
2. Delete the data folder before zipping up your code.
3. 2470 STUDENTS : Add a blank file named 2470student in the hw2/code directory!
The file should have no extension, and is used as a flag to grade 2470-specific requirements. If
you don’t do this, YOU WILL LOSE POINTS!
Thanks!