[Li Hongyi Machine Learning CP4] (task2) Regression + Python Basics with Numpy


Part 1: A regression example

P.S. The CP3 material is covered in the previous note, [Li Hongyi Machine Learning] CP1-3.

1. Problem statement

Suppose we have 10 pairs x_data and y_data, related by the linear model

y_data = b + w · x_data

where b and w are parameters to be learned. Let's practice finding b and w with gradient descent.
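The loss we will minimize is the mean squared error over the 10 points (this is exactly what the grid code below fills into Z):

$$L(b, w)=\frac{1}{N}\sum_{n=1}^{N}\bigl(y_n-(b+w\,x_n)\bigr)^{2},\qquad N=10$$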

import numpy as np
import matplotlib.pyplot as plt
from pylab import mpl
# matplotlib has no Chinese font by default; configure one dynamically
plt.rcParams['font.sans-serif'] = ['Simhei']  # display Chinese characters
mpl.rcParams['axes.unicode_minus'] = False  # avoid the minus sign rendering as a box in saved figures
x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]
x_d = np.asarray(x_data)
y_d = np.asarray(y_data)

numpy.arange(start, stop, step, dtype=None) generates an array.

The values are generated on the half-open interval [start, stop), i.e. start is included but stop is excluded, and an ndarray is returned.
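A minimal check of the half-open behavior:

import numpy as np
print(np.arange(-5, 5, 2))  # [-5 -3 -1  1  3]  (the stop value 5 is excluded)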

x = np.arange(-200, -100, 1)
y = np.arange(-5, 5, 0.1)
Z = np.zeros((len(x), len(y)))
X, Y = np.meshgrid(x, y)

2. Finding the optimal parameters with gradient descent

# loss
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        Z[j][i] = 0  # meshgrid output: y indexes rows, x indexes columns
        for n in range(len(x_data)):
            Z[j][i] += (y_data[n] - b - w * x_data[n]) ** 2
        Z[j][i] /= len(x_data)

First give b and w initial values (here b = -120, w = -4), compute the partial derivatives with respect to b and w, and update b and w iteratively so that gradient descent moves toward the minimum of the loss.
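The updates inside the loop use the partial derivatives of this loss, matching grad_b and grad_w in the code:

$$\frac{\partial L}{\partial b}=-\frac{2}{N}\sum_{n}\bigl(y_n-b-w\,x_n\bigr),\qquad \frac{\partial L}{\partial w}=-\frac{2}{N}\sum_{n}\bigl(y_n-b-w\,x_n\bigr)\,x_n$$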

# linear regression
b = -120
w = -4
#b=-2
#w=0.01
lr = 0.0000001
iteration = 100000
b_history = [b]
w_history = [w]
loss_history = []
import time
start = time.time()
for i in range(iteration):
    m = float(len(x_d))
    y_hat = w * x_d + b
    loss = np.dot(y_d - y_hat, y_d - y_hat) / m
    grad_b = -2.0 * np.sum(y_d - y_hat) / m
    grad_w = -2.0 * np.dot(y_d - y_hat, x_d) / m
    # update param
    b -= lr * grad_b
    w -= lr * grad_w
    b_history.append(b)
    w_history.append(w)
    loss_history.append(loss)
    if i % 10000 == 0:
        print("Step %i, w: %0.4f, b: %.4f, Loss: %.4f" % (i, w, b, loss))
end = time.time()
print("大约需要时间:",end-start)

The output is:

Step 0, w: 1.6534, b: -119.9839, Loss: 3670819.0000
Step 10000, w: 2.4781, b: -121.8628, Loss: 11428.6652
Step 20000, w: 2.4834, b: -123.6924, Loss: 11361.7161
Step 30000, w: 2.4885, b: -125.4716, Loss: 11298.3964
Step 40000, w: 2.4935, b: -127.2020, Loss: 11238.5092
Step 50000, w: 2.4983, b: -128.8848, Loss: 11181.8685
Step 60000, w: 2.5030, b: -130.5213, Loss: 11128.2983
Step 70000, w: 2.5076, b: -132.1129, Loss: 11077.6321
Step 80000, w: 2.5120, b: -133.6607, Loss: 11029.7126
Step 90000, w: 2.5164, b: -135.1660, Loss: 10984.3908
Elapsed time (s): 1.8699753284454346
# plot the figure
plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))  # filled contour plot of the loss
# mark the optimal solution with an orange X
plt.plot([-188.4], [2.67], 'x', ms=12, mew=3, color="orange")
# plot the gradient-descent trajectory in black
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)  # set the axis range of the plot
plt.ylim(-5, 5)
plt.xlabel(r'$b$')
plt.ylabel(r'$w$')
plt.title("线性回归")
plt.show()

The resulting figure is shown below.

[Figure: loss contours in the (b, w) plane with the gradient-descent trajectory, lr = 0.0000001]

The horizontal axis is b and the vertical axis is w; the orange × marks the optimal solution. Clearly this run does not reach the optimum; the trajectory ends far from it. So we increase the learning rate to lr = 0.000001 (10× larger) and get the figure below, and find the result is no better than with the original lr.

[Figure: trajectory with lr = 0.000001]

Increasing the learning rate again to lr = 0.00001 (another 10×) gives the figure below: the trajectory gets closer to the optimum, but it oscillates violently up and down around the vertical line b = -120 as the iterations proceed:

[Figure: trajectory with lr = 0.00001, showing violent oscillation]

So this learning rate is too large, and the result is still poor.


3. Giving b and w their own learning rates

So we give b and w separate, adaptive learning rates:
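This is essentially the Adagrad update: each parameter's step divides a base learning rate by the square root of its accumulated squared gradients,

$$b^{t+1}=b^{t}-\frac{\eta}{\sqrt{\sum_{i=0}^{t}\bigl(g_b^{i}\bigr)^{2}}}\,g_b^{t},\qquad w^{t+1}=w^{t}-\frac{\eta}{\sqrt{\sum_{i=0}^{t}\bigl(g_w^{i}\bigr)^{2}}}\,g_w^{t}$$

where $g^{t}$ denotes the gradient at step $t$ and $\eta$ is the lr in the code below.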

# linear regression
b = -120
w = -4
lr = 1
iteration = 100000
b_history = [b]
w_history = [w]
lr_b=0
lr_w=0
import time
start = time.time()
for i in range(iteration):
    b_grad=0.0
    w_grad=0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    lr_b=lr_b+b_grad**2
    lr_w=lr_w+w_grad**2
    # update param
    b -= lr/np.sqrt(lr_b) * b_grad
    w -= lr /np.sqrt(lr_w) * w_grad
    b_history.append(b)
    w_history.append(w)
# plot the figure
plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))  # filled contour plot of the loss
plt.plot([-188.4], [2.67], 'x', ms=12, mew=3, color="orange")
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$')
plt.ylabel(r'$w$')
plt.title("线性回归")
plt.show()

[Figure: trajectory with per-parameter adaptive learning rates, reaching the optimum]

With these per-parameter adaptive learning rates, the descent reaches the optimum within the 100,000 iterations.
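As an aside, the per-sample inner loop above can be vectorized with the x_d / y_d arrays defined earlier; this is only a sketch of the same update, not the course's original code:

b, w = -120.0, -4.0
lr = 1.0
lr_b, lr_w = 0.0, 0.0
for i in range(100000):
    err = y_d - (b + w * x_d)      # residuals on all 10 points at once
    b_grad = -2.0 * np.sum(err)
    w_grad = -2.0 * np.dot(err, x_d)
    lr_b += b_grad ** 2            # accumulate squared gradients
    lr_w += w_grad ** 2
    b -= lr / np.sqrt(lr_b) * b_grad
    w -= lr / np.sqrt(lr_w) * w_grad
print(b, w)                        # should end up near the marked optimum (about -188.4, 2.67)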


Part 2: Python Basics with Numpy

This part comes from the NumPy exercise in task2 of Andrew Ng's course.


Avoid using for-loops and while-loops unless you are explicitly told to do so: in other words, use as few explicit loops as possible to keep the running time down.

Functions in the math library generally take scalar (real-number) inputs, whereas numpy functions take matrices or vectors, which makes numpy better suited for deep learning.

In an Anaconda Jupyter notebook, run a cell with Shift+Enter. To look up documentation, e.g. the details of np.exp, open a new cell and type np.exp?, or read the official docs: https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html

0. Goals:

Be able to use numpy functions and numpy matrix/vector operations

Understand broadcasting

Be able to vectorize code

1. Building basic functions with numpy

1) The sigmoid function

First implement the sigmoid with math.exp(). The sigmoid is a nonlinear logistic function (formula below), used not only in logistic regression but also as an activation function in deep learning.


The sigmoid is used for hidden-unit outputs; its range is (0, 1), so it maps any real number into (0, 1) and can be used for binary classification.


Pros: smooth and easy to differentiate.

Cons: the activation is relatively expensive to compute; backpropagating its gradient involves a division; and it saturates easily, causing vanishing gradients that can prevent deep networks from training.

$$\operatorname{sigmoid}(x)=\frac{1}{1+e^{-x}}$$

[Figure: the sigmoid curve]

  • P.S. A package's function is called as package_name.function(), e.g. math.exp().
import math
# from public_tests import *
# GRADED FUNCTION: basic_sigmoid
def basic_sigmoid(x):
    """
    Compute sigmoid of x.
    Arguments:
    x -- A scalar
    Return:
    s -- sigmoid(x)
    """
    # (≈ 1 line of code)
    # s = 
    # YOUR CODE STARTS HERE
    s = 1/(1 + math.exp(-x))
    # YOUR CODE ENDS HERE
    return s
print("basic_sigmoid(1) = " + str(basic_sigmoid(1)))
# prints: basic_sigmoid(1) = 0.7310585786300049

numpy functions usually take matrices or vectors as input (scalars work too); math.exp, by contrast, fails on a list:

### One reason why we use "numpy" instead of "math" in Deep Learning ###
x = [1, 2, 3]
basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-8ccefa5bf989> in <module>()
      1 ### One reason why we use "numpy" instead of "math" in Deep Learning ###
      2 x = [1, 2, 3]
----> 3 basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector.
<ipython-input-3-28be2b65a92e> in basic_sigmoid(x)
     15 
     16     ### START CODE HERE ### (≈ 1 line of code)
---> 17     s = 1/(1 + math.exp(-x))
     18     ### END CODE HERE ###
     19 
TypeError: bad operand type for unary -: 'list'

This time we use np.exp() instead of math.exp(): for a vector $x=(x_1, x_2, \ldots, x_n)$, np.exp(x) applies the exponential to every component, returning $(e^{x_1}, e^{x_2}, \ldots, e^{x_n})$.

import numpy as np
# example of np.exp
t_x = np.array([1, 2, 3])
print(np.exp(t_x)) # result is (exp(1), exp(2), exp(3))
# output: [ 2.71828183  7.3890561  20.08553692]

Now the numpy version of the sigmoid, applied elementwise to a vector $x\in\mathbb{R}^n$:

$$\operatorname{sigmoid}(x)=\left(\frac{1}{1+e^{-x_1}},\;\frac{1}{1+e^{-x_2}},\;\ldots,\;\frac{1}{1+e^{-x_n}}\right)$$

# GRADED FUNCTION: sigmoid
def sigmoid(x):
    """
    Compute the sigmoid of x
    Arguments:
    x -- A scalar or numpy array of any size
    Return:
    s -- sigmoid(x)
    """
    # (≈ 1 line of code)
    # s = 
    # YOUR CODE STARTS HERE
    s = 1/(1 + np.exp(-x))
    # YOUR CODE ENDS HERE
    return s
t_x = np.array([1, 2, 3])
print("sigmoid(t_x) = " + str(sigmoid(t_x)))
# sigmoid_test(sigmoid)
# output: sigmoid(t_x) = [0.73105858 0.88079708 0.95257413]

2)sigmoid gradient

We compute gradients by backpropagation in order to optimize loss functions.

For example, the gradient of the sigmoid is

$$\operatorname{sigmoid\_derivative}(x)=\sigma(x)\,\bigl(1-\sigma(x)\bigr)$$
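This follows directly from differentiating $\sigma(x)=(1+e^{-x})^{-1}$:

$$\sigma'(x)=\frac{e^{-x}}{(1+e^{-x})^{2}}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=\sigma(x)\bigl(1-\sigma(x)\bigr)$$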

# GRADED FUNCTION: sigmoid_derivative
def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
    Arguments:
    x -- A scalar or numpy array
    Return:
    ds -- Your computed gradient.
    """
    #(≈ 2 lines of code)
    # s = 
    # ds = 
    # YOUR CODE STARTS HERE
    s = sigmoid(x)
    ds = s * (1 - s)
    # YOUR CODE ENDS HERE
    return ds
t_x = np.array([1, 2, 3])
print ("sigmoid_derivative(t_x) = " + str(sigmoid_derivative(t_x)))
# prints: sigmoid_derivative(t_x) = [0.19661193 0.10499359 0.04517666]
#sigmoid_derivative_test(sigmoid_derivative)

3) Reshaping multi-dimensional arrays: image2vector

The tools here are numpy's shape attribute and reshape function. When an image, a 3-D array of shape (length, height, depth=3), is fed into an algorithm, it usually has to be reshaped to (length * height * 3, 1), i.e., the 3-D tensor is unrolled into a column vector.

[Figure: unrolling an image tensor into a column vector]

(1) Reshape an array v of shape (a, b, c) into an array of shape (a*b, c); to unroll into a single column you can also use v = v.reshape(-1, 1).

v = v.reshape((v.shape[0] * v.shape[1], v.shape[2])) 
# v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c

Back to converting an image into a vector:

# GRADED FUNCTION:image2vector
def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)
    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    # (≈ 1 line of code)
    # v =
    # YOUR CODE STARTS HERE
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)    
    # YOUR CODE ENDS HERE
    return v
# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
t_image = np.array([[[ 0.67826139,  0.29380381],
                     [ 0.90714982,  0.52835647],
                     [ 0.4215251 ,  0.45017551]],
                   [[ 0.92814219,  0.96677647],
                    [ 0.85304703,  0.52351845],
                    [ 0.19981397,  0.27417313]],
                   [[ 0.60659855,  0.00533165],
                    [ 0.10820313,  0.49978937],
                    [ 0.34144279,  0.94630077]]])
print ("image2vector(image) = " + str(image2vector(t_image)))

The output is:

image2vector(image) = [[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]

4) Normalization

1. Norms

Norms come in vector and matrix flavors. A vector norm measures the magnitude of a vector; a matrix norm measures how much the matrix can change a vector: for example, in Ax = b, how large the change from x to b can be.

For example, the Euclidean (L2) norm of a vector $x=(x_1,\ldots,x_n)$ is $\|x\|_2=\sqrt{\sum_i x_i^2}$.

The ord parameter of np.linalg.norm (used below) selects which matrix norm to compute; a small example follows this list:

ord=1: the maximum absolute column sum

ord=2: the spectral norm $\|A\|_2=\sqrt{\lambda_{\max}(A^{\top}A)}$, i.e., compute the eigenvalues of $A^{\top}A$ and take the square root of the largest one

ord=np.inf: the maximum absolute row sum
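A small sketch of these ord options on a 2×2 matrix:

import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
print(np.linalg.norm(A, ord=1))       # 6.0  -> max column sum (1+3 vs 2+4)
print(np.linalg.norm(A, ord=2))       # ~5.465 -> largest singular value of A
print(np.linalg.norm(A, ord=np.inf))  # 7.0  -> max row sum (3+4)
print(np.linalg.norm(A))              # ~5.477 -> Frobenius norm (the default for matrices)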


2.np.linalg.norm

Normalizing the data makes gradient descent converge faster.

Row normalization divides each row vector by its norm. For example:

$$x=\begin{bmatrix}0 & 3 & 4\\ 2 & 6 & 4\end{bmatrix}$$

Computing the row norms with x_norm = np.linalg.norm(x, axis=1, keepdims=True) gives a (2, 1) array: $\begin{bmatrix}5\\ \sqrt{56}\end{bmatrix}$.

Setting keepdims=True keeps the result two-dimensional, so it broadcasts correctly against the original x.

axis=1 takes the norm along each row (so after dividing, every row becomes a unit vector); axis=0 would normalize by columns instead.

numpy.linalg.norm also accepts the ord parameter described above for choosing the norm; see also: https://numpy.org/doc/stable/reference/routines.array-creation.html

The normalization itself is x / x_norm, which divides each row by its own norm:

$$\frac{x}{\|x\|}=\begin{bmatrix}0 & \frac{3}{5} & \frac{4}{5}\\ \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}}\end{bmatrix}$$

# GRADED FUNCTION: normalize_rows
def normalize_rows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).
    Argument:
    x -- A numpy matrix of shape (n, m)
    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    #(≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    # x_norm =
    # Divide x by its norm.
    # x =
    # YOUR CODE STARTS HERE
    x_norm = np.linalg.norm(x, axis = 1, keepdims = True)
    x = x / x_norm
    print(x.shape)
    print(x_norm.shape)
    # YOUR CODE ENDS HERE
    return x
x = np.array([[0, 3, 4],
              [1, 6, 4]])
print("normalizeRows(x) = " + str(normalize_rows(x)))
# normalizeRows_test(normalize_rows)

Note: do not use x /= x_norm above. For this matrix division numpy has to broadcast x_norm, which the in-place operator /= does not support. As the shapes printed below show, x_norm is (2, 1), so the division x / x_norm relies on broadcasting (explained in the next section).

The output:

(2, 3)
(2, 1)
normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]

5) Broadcasting and the softmax function

Broadcasting is a trick used constantly in scientific computing: it keeps operations vectorized and fast without allocating extra memory. NumPy's broadcasting rules are as follows (a small example follows the list):

(1) All inputs are aligned to the array with the longest shape; missing dimensions are padded with 1 on the left.

(2) In each dimension, two arrays are compatible if their lengths are equal or one of them is 1; otherwise they cannot be combined.

(3) Where an input has length 1 in a dimension, its values are (virtually) replicated along that dimension to match the other array's shape.
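A minimal sketch of these rules in action:

import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,), treated as (1, 3)
print(a + b)                     # b is stretched along the rows:
# [[10 21 32]
#  [13 24 35]]

col = np.array([[1.], [2.]])     # shape (2, 1)
print(a * col)                   # col is stretched along the columns:
# [[ 0.  1.  2.]
#  [ 6.  8. 10.]]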


For a row vector $x\in\mathbb{R}^{1\times n}$:

$$\operatorname{softmax}(x)=\left(\frac{e^{x_1}}{\sum_j e^{x_j}},\;\frac{e^{x_2}}{\sum_j e^{x_j}},\;\ldots,\;\frac{e^{x_n}}{\sum_j e^{x_j}}\right)$$

Softmax can be viewed as a normalizing function used for classification (Andrew Ng's second course covers it in detail).

Stanford's CS224n course also discusses the interpretation of softmax.

[Figure: softmax explanation from CS224n]

For an (m, n) matrix, softmax is applied row by row: entry $x_{ij}$ becomes $\frac{e^{x_{ij}}}{\sum_k e^{x_{ik}}}$.

Notes: later on, m will denote the number of training examples, each training example will sit in its own column of the matrix, and each feature will sit in its own row (each row holds data for one feature).


Softmax should be performed for all features of each training example, so softmax would be performed on the columns (once we switch to that representation later in this course).

# GRADED FUNCTION: softmax
def softmax(x):
    """Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (m,n).
    Argument:
    x -- A numpy matrix of shape (m,n)
    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (m,n)
    """
    # YOUR CODE STARTS HERE
     #Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis = 1,keepdims = True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp /  x_sum
    # YOUR CODE ENDS HERE
    return s
t_x = np.array([[9, 2, 5, 0, 0],
                [7, 5, 0, 0 ,0]])
print("softmax(x) = " + str(softmax(t_x)))
# softmax_test(softmax)

Here x_sum is a (2, 1) array, while x_exp and s are both (2, 5); the division works through broadcasting. The output:

softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04
  1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04
  8.01252314e-04]]

Summary: building basic functions with numpy

image2vector is commonly used in deep learning.

Using np.reshape to keep matrix/vector dimensions straight removes a lot of bugs.

The broadcasting mechanism.

2. Vectorization

Vectorization drastically reduces computation time; make sure you understand the difference between the dot product, the outer product, and elementwise multiplication.

First, the classic for-loop implementations:

import time
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")
### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1), len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i] * x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")
### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")
### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j] * x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

The results of the loop implementations are below. The reported computation times are all 0.0 ms, which looks wrong at first glance; the vectors here only have 15 elements, so the elapsed time falls below the timer's resolution (see the note after the vectorized version):

dot = 278
 ----- Computation time = 0.0ms
outer = [[81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [63. 14. 14. 63.  0. 63. 14. 35.  0.  0. 63. 14. 35.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [81. 18. 18. 81.  0. 81. 18. 45.  0.  0. 81. 18. 45.  0.  0.]
 [18.  4.  4. 18.  0. 18.  4. 10.  0.  0. 18.  4. 10.  0.  0.]
 [45. 10. 10. 45.  0. 45. 10. 25.  0.  0. 45. 10. 25.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
 ----- Computation time = 0.0ms
elementwise multiplication = [81.  4. 10.  0.  0. 63. 10.  0.  0.  0. 81.  4. 25.  0.  0.]
 ----- Computation time = 0.0ms
gdot = [29.69127752 19.45650093 30.14580312]
 ----- Computation time = 0.0ms

Now the vectorized versions:

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")
### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")
### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

The output:

dot = 278
 ----- Computation time = 0.0ms
outer = [[81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [63 14 14 63  0 63 14 35  0  0 63 14 35  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [81 18 18 81  0 81 18 45  0  0 81 18 45  0  0]
 [18  4  4 18  0 18  4 10  0  0 18  4 10  0  0]
 [45 10 10 45  0 45 10 25  0  0 45 10 25  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]]
 ----- Computation time = 0.0ms
elementwise multiplication = [81  4 10  0  0 63 10  0  0  0 81  4 25  0  0]
 ----- Computation time = 0.0ms
gdot = [29.69127752 19.45650093 30.14580312]
 ----- Computation time = 0.0ms
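Both versions report 0.0 ms only because the inputs are tiny. A quick sketch (not part of the original assignment) with larger random vectors makes the gap between the explicit loop and np.dot visible:

import numpy as np
import time

a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic = time.process_time()
dot = 0.0
for i in range(len(a)):          # explicit Python loop
    dot += a[i] * b[i]
toc = time.process_time()
print("loop dot: " + str(1000 * (toc - tic)) + "ms")

tic = time.process_time()
dot = np.dot(a, b)               # vectorized
toc = time.process_time()
print("np.dot:   " + str(1000 * (toc - tic)) + "ms")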

np.dot() performs matrix-matrix or matrix-vector multiplication; it is different from both np.multiply() and the * operator, which multiply elementwise.
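A quick sketch of the difference:

import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))       # 32 -> 1*4 + 2*5 + 3*6, a scalar
print(np.multiply(u, v))  # [ 4 10 18] -> elementwise product
print(u * v)              # [ 4 10 18] -> same as np.multiply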

1)L1和L2 loss function

The two losses are defined as

$$L_1(\hat{y},y)=\sum_{i}\bigl|y^{(i)}-\hat{y}^{(i)}\bigr|,\qquad L_2(\hat{y},y)=\sum_{i}\bigl(y^{(i)}-\hat{y}^{(i)}\bigr)^{2}$$

# GRADED FUNCTION: L1
def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the L1 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.sum(np.abs(yhat - y))
    ### END CODE HERE ###
    return loss
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat,y)))
>>>
L1 = 1.1

The squares in the L2 loss can be computed with np.dot(): if $x=(x_1,\ldots,x_n)$, then np.dot(x, x) $=\sum_j x_j^{2}$, so the whole loss is a single dot product of the error vector with itself.

# GRADED FUNCTION: L2
def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)
    Returns:
    loss -- the value of the L2 loss function defined above
    """
    ### START CODE HERE ### (≈ 1 line of code)
    loss = np.dot((yhat - y),(yhat - y).T)
    ### END CODE HERE ###
    return loss
yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat,y)))
>>>
L2 = 0.43

Reference

Datawhale course
