HW2: Numpy for Multi-Layer Neural Network


Introduction


This homework is intended to give you an introduction to building, training, and testing neural network models. You will not only be exposed to using Python packages to build a neural network from scratch, but also to the mathematical aspects of backpropagation and gradient descent. While in practical scenarios you won't necessarily have to implement neural networks from scratch (as you will see in future labs and assignments), this assignment aims to give you a rudimentary idea of what goes on under the hood in packages such as TensorFlow and Keras. In this assignment, you will use the MNIST handwritten digits dataset to train a simple classification neural network using batch learning and evaluate your model.


Chinese version: http://t.csdn.cn/2AyUX


Link to this article's .pdf file:

link: https://pan.baidu.com/s/1_qFbN0Nhc8MNYCHEyTbkLg
password: v6zc


Completed code files:

link: https://pan.baidu.com/s/1Fw_7thL5PxR79zI6XbpnYQ
password: txqe


File Structure:


| - hw2
        | - code
                | - Beras
                        | - eight .py files
                | - assignment.py
                | - preprocess.py
                | - visualize.py
        | - data
                | - mnist
                        | - four files of the MNIST dataset


Conceptual Questions


Please submit the Conceptual Questions on Gradescope under hw2-mlp conceptual.


You must type your submissions and upload a PDF. We recommend using LaTeX.


Getting the Stencil


Github Classroom Link with Stencil Code!


Guide on GitHub and GitHub Classroom


Getting the Data


You can use download.sh to download the data. You can run a bash script with the command ./script_name.sh (e.g., bash ./download.sh). This is similar to HW1.


Setup


Work off of the stencil code provided, but do not change the stencil except where specified.


Doing so may cause incompatibility with the autograder. Don’t change any method signatures!


This assignment requires NumPy and Matplotlib. You should already have these from HW1. Also check the virtual environment guide to set up TensorFlow 2.5 on your local machine. Feel free to reference this guide if you have to work through Colab.


Assignment Overview


In this assignment, you will be constructing a Keras mimic, Beras (haha funny name), and will make a sequential model specification that mimics the Tensorflow/Keras API. The Python notebook associated with this homework is meant for you to explore an example implementation so that you can build on it yourself! There are no TODOs for you to work on in the notebook; rather, the testing is done by running the main method of assignment.py.


Our stencil provides a model class with several methods and hyperparameters you need to use for your network. You will also answer conceptual questions related to the assignment and class material (don’t forget to answer the 2470-only questions if you’re a 2470 student!). You should include a brief README with your model's accuracy and any known bugs.


Before Starting


This homework assignment is due two weeks from release. Labs 1-3 provide great practice for this assignment, so you can wait a little for them if you get stuck. Specifically:


- Implementing Callable/Diffable Components: The skills you need in order to do this can be found by working through Lab 1. This includes comfort with mathematical notation, matrix operations, and the logic behind call and gradient methods.

- Implementing Optimizers: You can implement the BasicOptimizer class just by following the logic from the gradient_descent method in Lab 1. The other optimizers (i.e., Adam, RMSProp) will be covered in Lab 2: Optimizers.

- Using batch_step and GradientTape: You can figure out how to use these to train your model based on the assignment instructions and your implementations of these. With that said, they do mimic the Keras API. You'll learn about all this in Lab 3: Intro to Tensorflow. If your lab is after the due date, it should be fine; just skim over the complementary notebook associated with the lab.


Feel free to start off by doing what you can and then add onto it as you learn more about deep learning and realize that the same concepts you learn in the class can actually be used here!


Do not get discouraged, and try to have fun!


Roadmap


For this assignment, we'll walk you through the pipeline of training a neural net, including the structure of the model class and the methods you will have to fill in.


1. Preprocessing the Data


Note: Code for preprocessing should be pulled in from HW1.


Before training a network, you will need to clean your data. This includes retrieving, altering, and formatting the data into the inputs for your network. For this assignment, you will be working with the MNIST dataset. It can be downloaded through the download.sh script, but it's also linked here (ignore that it says hw1; we're using this dataset for hw2 this time!). The original data source is here.

You should train your network using only the training data and then test your network's accuracy on the testing data. Your program should print its accuracy over the test dataset upon completion.
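As a refresher from HW1, here is a minimal preprocessing sketch. It assumes the standard gzipped IDX files fetched by the download script; the get_data(inputs_path, labels_path, num_examples) signature is an assumption and may differ from your stencil's:

```python
import gzip
import numpy as np

def get_data(inputs_path, labels_path, num_examples):
    # Read the gzipped IDX image file, skipping its 16-byte header.
    with gzip.open(inputs_path, "rb") as f:
        f.read(16)
        inputs = np.frombuffer(f.read(), dtype=np.uint8)
    # Read the gzipped IDX label file, skipping its 8-byte header.
    with gzip.open(labels_path, "rb") as f:
        f.read(8)
        labels = np.frombuffer(f.read(), dtype=np.uint8)
    # Flatten each 28x28 image and normalize pixel values to [0, 1].
    inputs = inputs.reshape(num_examples, 784).astype(np.float32) / 255.0
    return inputs, labels
```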


2. One Hot Encoding


Before training or testing your model, you will need to “one-hot” encode your class labels so that the model can optimize towards predicting any desired class. Note that the class labels by themselves are simply categories and do not mean anything numerically. In the absence of one-hot encoding, your model might learn some natural ordering between the different class labels based on the labels (which are arbitrary).


For example, let's say there's a data point A which corresponds to label '2' and a data point B which corresponds to label '7'. We don't want the model to somehow learn that B carries more weight than A simply because, numerically speaking, 7 > 2.


To one-hot encode your class labels, you will have to convert your 1-dimensional label vector into a vector of size num_classes (where num_classes is the total number of classes in your dataset). For the MNIST dataset, it looks something like the matrix below:

[Figure: one-hot encodings for the ten MNIST digit labels]

You have to fill out the following methods in Beras/onehot.py:


● fit(): [TODO] In this function you need to fetch all the unique labels in the data (store these in self.uniq) and create a dictionary with labels as the keys and their corresponding one-hot encodings as values. Hint: you might want to look at np.eye() to get the one-hot encodings. Ultimately, you will store this dictionary in self.uniq2oh.

● forward(): In this function, we pass in a vector of all the actual labels in the training set, call fit() to populate the uniq2oh dictionary with unique labels and their corresponding one-hot encodings, and then use it to return an array of one-hot encoded labels for each label in the training set. This function has already been filled out for you!

● inverse(): In this function, we reverse the one-hot encoding back to the actual label. This has already been done for you.


For example, if we have labels X and Y with one-hot encodings of [1,0] and [0,1], we'd want to create a dictionary as follows: {X: [1,0], Y: [0,1]}. As shown in the image above, for MNIST, you will have 10 labels, so your dictionary should have ten entries!
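Here is a minimal sketch of the fit() TODO. The self.uniq and self.uniq2oh attribute names come from the instructions above; everything else is one possible implementation, not the only acceptable one:

```python
import numpy as np

def fit(self, data):
    # All unique labels in the data, e.g. the digits 0-9 for MNIST.
    self.uniq = np.unique(data)
    # Row i of the identity matrix is the one-hot vector for the i-th label.
    eye = np.eye(len(self.uniq))
    self.uniq2oh = {label: eye[i] for i, label in enumerate(self.uniq)}
```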


You may notice that some classes inherit from Callable or Diffable. More on this in the next section!


3. Core Abstractions


Consider the following abstract classes of modules. Be sure to play around with the Python notebook associated with this homework to get a good grip on the core abstraction modules defined for you in Beras/core.py! The notebook is exploratory in nature (it is NOT required, and all of the code is given) and will provide you with lots of insights into understanding and using these class abstractions! Note that these modules are very similar to the Tensorflow/Keras API.


Callable: A function with a well-defined forward function. These are the ones you’ll need to implement:


● CategoricalAccuracy (./metrics.py): Computes the accuracy of predicted probabilities against a list of ground-truth labels. As accuracy is not optimized for, there is no need to compute its gradient. Furthermore, categorical accuracy is piecewise discontinuous, so the gradient would technically be 0 or undefined.


● OneHotEncoder (./onehot.py): You can one-hot encode a class instance into a probability distribution to optimize for classifying into discrete options.

Diffable: A callable which can also be differentiated. We can use these in our pipeline and optimize through them! Thus, most of these classes are made for use in your neural network layers. These are the ones you’ll need to implement:

[Figure: table of the Diffable classes you'll need to implement]

Example: Consider a Dense layer instance. Let s represent the input size (source), d represent the output size (destination), and b represent the batch size. Then:

[Figure: shape breakdown for the Dense layer example]

GradientTape: This class will function exactly like tf.GradientTape() (see Lab 3). You can think of a GradientTape as a logger. Every time an operation is performed within the scope of a GradientTape, it records which operation occurred. Then, during backprop, we can compute the gradient for all of the operations. This allows us to differentiate our final output with respect to any intermediate step. When operations are computed outside the scope of GradientTape, they aren't recorded, so your code will have no record of them and cannot compute the gradients.

You can check out how this is implemented in core! Of course, Tensorflow's gradient tape implementation is a lot more complicated and involves constructing a graph.

● [TODO] Implement the gradient method, which returns a list of gradients corresponding to the list of trainable weights in the network. Details in the code.
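Since the Beras tape is meant to behave like tf.GradientTape, this tiny TensorFlow snippet (a sketch for intuition, not part of the stencil) shows the record-then-replay pattern your implementation should reproduce:

```python
import tensorflow as tf

w = tf.Variable([[1.0, 2.0]])   # a "trainable weight"
x = tf.constant([[3.0], [4.0]])

with tf.GradientTape() as tape:
    y = tf.reduce_sum(w @ x)    # recorded: it happens inside the tape's scope

# Replaying the recorded ops backwards gives dy/dw = x^T = [[3., 4.]]
print(tape.gradient(y, w))
```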


4. Layers


For the purposes of this assignment, you will implement the Dense layer to use in your sequential model in Beras/layers.py. You have to fill in the following methods:

● forward(): [TODO] Implement the forward pass and return the outputs.

● weight_gradients(): [TODO] Calculate the gradients with respect to the weights and the biases. This will be used to optimize the layer.

● input_gradients(): [TODO] Calculate the gradients with respect to the layer inputs. This will be used to propagate the gradient to previous layers.

● _initialize_weight(): [TODO] Initialize the dense layer's weight values. By default, initialize all the weights to zero (usually a bad idea, by the way). You are also required to allow for more sophisticated options (when the initializer is set to normal, xavier, and kaiming). Follow Keras math assumptions!
        ○ Normal: Pretty self-explanatory, a unit normal distribution.
        ○ Xavier Normal: Based on keras.GlorotNormal.
        ○ Kaiming He Normal: Based on keras.HeNormal.

You may find np.random.normal helpful while implementing these. The TODOs provide some justification for why these different initialization methods are necessary, but for more detail, check out this website! Feel free to add more initializer options!
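To make the math concrete, here is a sketch of the four methods as free functions, using (b, s) inputs and (s, d) weights as in the shape example above. Whether the upstream gradient is folded in here or by Diffable's composition logic depends on the stencil, and Keras's GlorotNormal/HeNormal technically draw from a truncated normal; a plain normal is used below for brevity:

```python
import numpy as np

def dense_forward(x, w, b):
    # x: (batch, s), w: (s, d), b: (1, d) -> outputs: (batch, d)
    return x @ w + b

def dense_weight_gradients(x, upstream):
    # dL/dw = x^T @ upstream; dL/db sums the upstream over the batch.
    return x.T @ upstream, upstream.sum(axis=0, keepdims=True)

def dense_input_gradients(w, upstream):
    # dL/dx = upstream @ w^T, passed back to the previous layer.
    return upstream @ w.T

def initialize_weight(initializer, s, d):
    if initializer == "zero":
        return np.zeros((s, d))
    if initializer == "normal":    # unit normal distribution
        return np.random.normal(0.0, 1.0, (s, d))
    if initializer == "xavier":    # Glorot: std = sqrt(2 / (fan_in + fan_out))
        return np.random.normal(0.0, np.sqrt(2.0 / (s + d)), (s, d))
    if initializer == "kaiming":   # He: std = sqrt(2 / fan_in)
        return np.random.normal(0.0, np.sqrt(2.0 / s), (s, d))
    raise ValueError(f"unknown initializer: {initializer}")
```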


5. Activation Functions


In this assignment, you will be implementing two major activation functions, namely LeakyReLU and Softmax, in Beras/activations.py. Since ReLU is a special case of LeakyReLU, we have already provided you with the code for it.

● LeakyReLU()
        ○ forward(): [TODO] Given input x, compute & return LeakyReLU(x).
        ○ input_gradients(): [TODO] Compute & return the partial with respect to inputs by differentiating LeakyReLU.

● Softmax(): (2470 ONLY)
        ○ forward(): [TODO] Given input x, compute & return Softmax(x). Make sure you use stable softmax, where you subtract the max of all entries to prevent overflow/underflow issues.
        ○ input_gradients(): [TODO] Partial w.r.t. inputs of Softmax().
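Below is a sketch of the forward passes and gradients, assuming (batch, features) inputs and a leak coefficient alpha (Keras defaults LeakyReLU's alpha to 0.3; check the stencil for its choice). The softmax Jacobian shown is the standard diag(s) - s s^T per row:

```python
import numpy as np

def leaky_relu(x, alpha=0.3):
    return np.where(x > 0, x, alpha * x)

def leaky_relu_input_gradients(x, alpha=0.3):
    # Elementwise derivative: 1 where x > 0, alpha elsewhere.
    return np.where(x > 0, 1.0, alpha)

def softmax(x):
    # Subtracting the row max doesn't change the result mathematically,
    # but keeps np.exp from overflowing on large logits.
    exps = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exps / np.sum(exps, axis=-1, keepdims=True)

def softmax_input_gradients(x):
    # Per-row Jacobian J = diag(s) - s s^T, with shape (batch, n, n).
    s = softmax(x)
    n = s.shape[-1]
    return np.einsum("bi,ij->bij", s, np.eye(n)) - np.einsum("bi,bj->bij", s, s)
```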


6. Filling in the model


With these abstractions in mind, let's create a pipeline for our sequential deep learning model. You can find the SequentialModel class in assignment.py, where you will initialize your neural network's layers, parameters (weights and biases), and hyperparameters (optimizer, loss function, learning rate, accuracy function, etc.). The SequentialModel class inherits from Beras/model.py, where you'll find many useful methods. This will also contain functions that fit the model to your data and evaluate the performance of your model:

● compile(): Initialize the model optimizer, loss function, & accuracy function, which are fed in as arguments, for your SequentialModel instance to use.

● fit(): Trains your model to associate inputs with outputs. Training is repeated for each epoch, and the data is batched based on the argument. It also computes batch_metrics, epoch_metrics, and the aggregated agg_metrics that can be used to track the training progress of your model.

● evaluate(): [TODO] Evaluate the performance of the final model using the metrics mentioned above during the testing phase. It's almost identical to the fit() function; think about what would change between training and testing.

● call(): [TODO] Hint: what does it mean to call a sequential model? Remember that a sequential model is a stack of layers where each layer has exactly one input vector and one output vector. You can find this function within the SequentialModel class in assignment.py.

● batch_step(): [TODO] You will observe that fit() calls this function for each batch. You will first compute your model predictions for the input batch. In the training phase, you will need to compute gradients and update your weights according to the optimizer you are using. For backpropagation during training, you will use GradientTape from the core abstractions (core.py) to record operations and intermediate values. You will then use the model's optimizer to apply the gradients to your model's trainable variables. Finally, compute and return the loss and accuracy for the batch. You can find this function within the SequentialModel class in assignment.py. (A sketch follows the note below.)

We encourage you to check out keras.SequentialModel in the intro notebook (under Exploring a possible modular implementation: TensorFlow/Keras) and refer to Lab 3 to get a feel for how we can work with gradient tapes in deep learning.
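Putting the pieces together, a sketch of batch_step might look like the following. The helper names (self.compiled_loss, self.compiled_acc, apply_gradients) are assumptions that mirror the Keras API; use whatever names your stencil actually defines:

```python
def batch_step(self, x, y, training=True):
    # Forward pass inside the tape so every operation is recorded.
    with GradientTape() as tape:
        predictions = self.call(x)
        loss = self.compiled_loss(predictions, y)   # assumed helper name
    if training:
        # Replay the tape to get gradients, then let the optimizer update.
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(self.trainable_variables, grads)
    acc = self.compiled_acc(predictions, y)          # assumed helper name
    return {"loss": loss, "acc": acc}
```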


7. Loss Function


This is one of the most crucial aspects of model training. In this assignment, we will implement the MSE, or mean-squared error, loss function. You can find your loss function in Beras/losses.py.

● forward(): [TODO] Write a function that computes and returns the mean squared error given the predicted and actual labels. Hint: what is MSE? Given the predicted and actual labels, MSE is the average of the squares of the differences between predicted and actual values.

● input_gradients(): [TODO] Compute and return the gradients. Use differentiation to derive the formula for these gradients.
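A sketch of both methods, assuming the loss averages over every element of the (batch, num_classes) arrays; double-check the reduction convention your stencil expects:

```python
import numpy as np

def mse_forward(y_pred, y_true):
    # Mean of squared differences over all elements.
    return np.mean((y_pred - y_true) ** 2)

def mse_input_gradients(y_pred, y_true):
    # d/dy_pred of mean((y_pred - y_true)^2) = 2 * (y_pred - y_true) / N,
    # where N is the number of elements being averaged over.
    return 2.0 * (y_pred - y_true) / y_pred.size
```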


8. Optimizers


In the Beras/optimizers.py file, make sure to implement the optimization step for each of the different types of optimizers. Lab 2 should help with this, so good luck!

● BasicOptimizer: [TODO] A simple optimizer strategy, as seen in Lab 1.

● RMSProp: [TODO] Root mean square propagation.

● Adam: [TODO] A common adaptive moment estimation-based optimizer.
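The update rules themselves are short. Here they are as per-weight sketches (w is a weight array, g its gradient); the stencil's optimizer classes will carry the running m/v state and hyperparameters as attributes rather than arguments:

```python
import numpy as np

def basic_step(w, g, lr=0.01):
    # Plain gradient descent, as in Lab 1.
    return w - lr * g

def rmsprop_step(w, g, v, lr=0.001, beta=0.9, eps=1e-7):
    # Running average of squared gradients scales the step per weight.
    v = beta * v + (1 - beta) * g ** 2
    return w - lr * g / (np.sqrt(v) + eps), v

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)           # bias correction; t is the 1-indexed step
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```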


9. Accuracy metrics


Finally, to evaluate the performance of your model, you need to use appropriate accuracy metrics. In this assignment, you will implement categorical accuracy in Beras/metrics.py:

● forward(): [TODO] Return the categorical accuracy of your model given the predicted probabilities and true labels. You should be returning the proportion of predicted labels equal to the true labels, where the predicted label for an image is the label corresponding to the highest probability. Refer to the internet or lecture slides for the categorical accuracy math!
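A sketch, assuming probs has shape (batch, num_classes) and the labels are one-hot with the same shape (as produced by your OneHotEncoder):

```python
import numpy as np

def categorical_accuracy(probs, labels):
    predicted = np.argmax(probs, axis=-1)   # class with the highest probability
    actual = np.argmax(labels, axis=-1)     # position of the 1 in each one-hot row
    return np.mean(predicted == actual)     # fraction of correct predictions
```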


10. Train and Test


Finally, using all the above primitives, you are required to build two models in assignment.py:

● A simple model in get_simple_model(), with at most one Diffable layer (e.g., Dense from ./layers.py) and one activation function (look for them in ./activation.py). This one is provided for you by default, though you can change it if you'd like. The autograder will evaluate the original one, though!

● A slightly more complex model in get_advanced_model(), with two or more Diffable layers and two or more activation functions. We recommend using the Adam optimizer for this model with a decently low learning rate.

For any hyperparameters you use (layer sizes, learning rate, epoch size, batch size, etc.), please hardcode these values in the get_simple_model() and get_advanced_model() functions. Do NOT store them under the main handler.

Once everything is implemented, you can use python3 assignment.py to run your model and see the loss/accuracy!
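For intuition, an advanced model might be wired up roughly like this. The layer sizes and the exact constructor and compile() signatures are illustrative assumptions; follow your stencil's actual API:

```python
def get_advanced_model():
    # Hypothetical wiring: 784 -> 256 -> 10 with LeakyReLU and Softmax.
    model = SequentialModel([
        Dense(784, 256, initializer="kaiming"),
        LeakyReLU(),
        Dense(256, 10, initializer="kaiming"),
        Softmax(),   # 2470 only
    ])
    model.compile(
        optimizer=Adam(0.001),        # a "decently low learning rate"
        loss_fn=MeanSquaredError(),
        acc_fn=CategoricalAccuracy(),
    )
    return model
```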


11. Visualizing Results


We provided the visualize_metrics method for you to visualize how your loss and accuracy change after each batch using matplotlib. DO NOT EDIT THIS FUNCTION. You should call this function in your main method after you store the loss and accuracy per batch in an array, which would be passed into this function. This should plot line graphs where the horizontal axis is the i-th batch and the vertical axis is the loss/accuracy value of the batch. Calling this is OPTIONAL!

We've also provided the visualize_images method for you to visualize your predictions against the true labels with matplotlib. This method is currently written with the labels having a shape of [number of images, 1]. DO NOT EDIT THIS FUNCTION. You should call this function with all your inputs and labels after training your model. The function will randomly pick 500 samples from your input and will plot 10 correct and 10 incorrect classifications to help you visually interpret your model's predictions! You should do this last, after you have met the benchmark for test accuracy.


CS1470 Students


- Complete and submit HW2 Conceptual
- Implement Beras per specifications and make a SequentialModel in assignment.py
- Test the model inside of main
- Get test accuracy >= 85% on MNIST with the default get_simple_model_components.
- Complete the Exploration notebook and export it to a PDF.
- The "HW2 Intro to Beras" notebook is just for your reference.


CS2470 Students


- Same as 1470, except:
- Implement the Softmax activation function (forward pass and input_gradients)
- Get testing accuracy >95% on MNIST with the model from get_advanced_model_components.
- You will need to specify a multi-layered model, will have to explore hyperparameter options, and may want to add additional features.
- Additional features may include regularization, other weight initialization schemes, aggregation layers, dropout, rate scheduling, or skip connections. If you have other ideas, feel free to ask publicly on Ed and we'll let you know if they are also ok.
- When implementing these features, try to mimic the Keras API as much as possible. This will help significantly with your Exploration notebook.
- Finish the 2470 components of the Exploration notebook and the conceptual questions.

Grading and Autograder Compatibility


Conceptual: You will be primarily graded on correctness, thoughtfulness, and clarity.

Code: You will be primarily graded on functionality. Your model should have an accuracy that is at least greater than the threshold on the testing data. For 1470, this can be achieved with the simple model parameterization provided. For 2470, you may need to experiment with hyperparameters or develop some custom components.

Although you will not be graded on code style, you should not have an excessive number of print statements in your final submission.

IMPORTANT! Please use vectorized operations when possible and limit the number of for loops you use. While there is no strict time limit for running this assignment, it should typically be less than 3 minutes. The autograder will automatically time out after 10 minutes. You will not receive any credit for methods that use Tensorflow or Keras functions within them.

Notebook: The Exploration notebook will be graded manually and should be submitted as a PDF file. Feel free to use the "Notebooks to Latex PDFs.ipynb" notebook!


Handing In


You should submit the assignment via Gradescope under the corresponding project assignment by zipping up your hw2 folder (the path on Gradescope MUST be hw2/code/filename.py) or through GitHub (recommended). To submit through GitHub, commit and push all changes to your repository. You can do this by running the following three commands (this is a good resource for learning more about them):

1. git add file1 file2 file3 (or -A)
2. git commit -m "commit message"
3. git push

After committing and pushing your changes to your repo (which you can check online if you're unsure whether it worked), you can now just upload the repo to Gradescope! If you're testing out code on multiple branches, you have the option to pick whichever one you want.




IMPORTANT!


1. Please make sure all your files are in hw2/code. Otherwise, the autograder will fail!

2. Delete the data folder before zipping up your code.

3. 2470 STUDENTS: Add a blank file named 2470student in the hw2/code directory! The file should have no extension; it is used as a flag to grade 2470-specific requirements. If you don't do this, YOU WILL LOSE POINTS!

Thanks!


Reference answer: http://t.csdn.cn/9sdBN


