1 Derivative of a matrix \(Y=f(x)\) with respect to a scalar \(x\)
The matrix \(Y\) is \(m\times n\); differentiating it with respect to the scalar \(x\) amounts to differentiating each element of the matrix with respect to \(x\):
\[\frac{dY}{dx}=\begin{bmatrix}\dfrac{df_{11}(x)}{dx} & \ldots & \dfrac{df_{1n}(x)}{dx} \\ \vdots & \ddots &\vdots \\ \dfrac{df_{m1}(x)}{dx} & \ldots & \dfrac{df_{mn}(x)}{dx} \end{bmatrix}\]
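As a quick sanity check, here is a minimal NumPy sketch of this elementwise definition. The concrete \(Y(x)\) used below is an illustrative choice, not taken from the text; its analytic derivative is compared against a central finite difference.

```python
import numpy as np

def Y(x):
    # illustrative 2x2 matrix-valued function of the scalar x (hypothetical example)
    return np.array([[x**2,      np.sin(x)],
                     [np.exp(x), 3.0 * x]])

def dY_dx(x):
    # elementwise derivative: each entry of Y differentiated with respect to x
    return np.array([[2.0 * x,   np.cos(x)],
                     [np.exp(x), 3.0]])

x0, h = 0.7, 1e-6
numeric = (Y(x0 + h) - Y(x0 - h)) / (2.0 * h)  # central finite difference
print(np.allclose(numeric, dY_dx(x0), atol=1e-6))  # expected: True
```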
2 Derivative of a scalar \(y=f(X)\) with respect to a matrix \(X\)
Note the difference from the previous case: the entries are now partial derivatives. \(X\) is an \(m\times n\) matrix, and the function \(y=f(X)\) is differentiated with respect to each element of \(X\); the derivative with respect to an \(m\times n\) matrix is again an \(m\times n\) matrix:
\[\frac{dy}{dX} = \begin{bmatrix}\dfrac{\partial f}{\partial x_{11}} & \ldots & \dfrac{\partial f}{\partial x_{1n}}\\ \vdots & \ddots & \vdots \\\dfrac{\partial f}{\partial x_{m1}} & \ldots & \dfrac{\partial f}{\partial x_{mn}}\end{bmatrix}\]
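A small numerical check of this layout, using the illustrative choice \(f(X)=\sum_{i,j}x_{ij}^2\) (so each partial derivative is \(2x_{ij}\) and the gradient matrix is \(2X\)):

```python
import numpy as np

def f(X):
    # illustrative scalar-valued function of the matrix X
    return np.sum(X ** 2)

def grad_f(X):
    # partial derivative with respect to each x_ij is 2*x_ij,
    # arranged as an m x n matrix, matching the layout above
    return 2.0 * X

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))
h = 1e-6
numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = h
        numeric[i, j] = (f(X + E) - f(X - E)) / (2.0 * h)
print(np.allclose(numeric, grad_f(X), atol=1e-5))  # expected: True
```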
3 Derivative of a matrix function \(Y\) with respect to a matrix \(X\)
The matrix \(Y=F(X)\) is differentiated with respect to every element of \(X\), which produces a large block ("super") matrix. With
\[F(X)=\begin{bmatrix}f_{11}(X) & \ldots & f_{1n}(X)\\ \vdots & \ddots &\vdots \\ f_{m1}(X) & \ldots & f_{mn}(X) \end{bmatrix}\]
\[X=\begin{bmatrix}x_{11} & \ldots & x_{1s}\\ \vdots & \ddots &\vdots \\ x_{r1} & \ldots & x_{rs}\end{bmatrix}\]
\[\frac{dF}{dX} = \begin{bmatrix}\dfrac{\partial F}{\partial x_{11}} & \ldots & \dfrac{\partial F}{\partial x_{1s}}\\ \vdots & \ddots & \vdots \\\dfrac{\partial F}{\partial x_{r1}} & \ldots & \dfrac{\partial F}{\partial x_{rs}}\end{bmatrix}\]
where
\[\frac{\partial F}{\partial x_{ij}} = \begin{bmatrix}\dfrac{\partial f_{11}}{\partial x_{ij}} & \ldots & \dfrac{\partial f_{1n}}{\partial x_{ij}}\\ \vdots & \ddots & \vdots \\\dfrac{\partial f_{m1}}{\partial x_{ij}} & \ldots & \dfrac{\partial f_{mn}}{\partial x_{ij}}\end{bmatrix}\]
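To make the block layout concrete, the sketch below assumes the illustrative choice \(F(X)=XX\) with a \(2\times 2\) matrix \(X\); each block \(\partial F/\partial x_{ij}\) is approximated by finite differences and the blocks are assembled in the arrangement shown above.

```python
import numpy as np

def F(X):
    # illustrative 2x2 matrix-valued function of a 2x2 matrix X (hypothetical example)
    return X @ X

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 2))
h = 1e-6
r, s = X.shape          # X is r x s
m, n = F(X).shape       # F is m x n

# dF/dX has r x s blocks; block (i, j) is the m x n matrix dF/dx_ij
blocks = [[None] * s for _ in range(r)]
for i in range(r):
    for j in range(s):
        E = np.zeros_like(X)
        E[i, j] = h
        blocks[i][j] = (F(X + E) - F(X - E)) / (2.0 * h)
dF_dX = np.block(blocks)
print(dF_dX.shape)  # (r*m, s*n) = (4, 4)
```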
4 Vector derivatives
Let \(y=[y_1,y_2,\ldots,y_m]^T\) be an \(m\times 1\) vector function, where \(y_1,y_2,\ldots,y_m\) are scalar functions of the vector \(x\), and let \(x\) be an \(n\times 1\) vector. Then
\[\frac{\partial y}{\partial x^T} = \begin{bmatrix}\dfrac{\partial y_1}{\partial x_1} & \ldots & \dfrac{\partial y_1}{\partial x_n}\\ \vdots & \ddots & \vdots \\\dfrac{\partial y_m}{\partial x_1} & \ldots & \dfrac{\partial y_m}{\partial x_n}\end{bmatrix}\]
This is an \(m\times n\) matrix, called the Jacobian matrix of the vector function \(y\).
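A minimal numerical check of the Jacobian layout (the vector function below is an arbitrary illustrative choice): row \(i\) should hold the partial derivatives of \(y_i\) with respect to \(x_1,\ldots,x_n\).

```python
import numpy as np

def y(x):
    # illustrative vector function from R^3 to R^2 (hypothetical example)
    return np.array([x[0] * x[1], np.sin(x[2]) + x[0] ** 2])

def jacobian(x):
    # analytic Jacobian: row i holds the partials of y_i with respect to x_1..x_n
    return np.array([[x[1],       x[0], 0.0],
                     [2.0 * x[0], 0.0,  np.cos(x[2])]])

x0 = np.array([0.3, -1.2, 0.8])
h = 1e-6
numeric = np.zeros((2, 3))
for j in range(3):
    e = np.zeros(3)
    e[j] = h
    numeric[:, j] = (y(x0 + e) - y(x0 - e)) / (2.0 * h)  # column j of the Jacobian
print(np.allclose(numeric, jacobian(x0), atol=1e-6))  # expected: True
```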
In particular, if \(y=x=[x_1,x_2,\ldots,x_n]^T\), then
\[\frac{\partial{x^T}}{\partial{x}}=I\]
where \(I\) is the identity matrix.
If \(A\) and \(y\) are both independent of the vector \(x\), then
\[\frac{\partial{x^TAy}}{\partial{x}}=\frac{\partial{x^T}}{\partial{x}}Ay=Ay \tag{$1$}\]
Note that \(y^TAx=\langle A^Ty,x\rangle=\langle x,A^Ty\rangle=x^TA^Ty\) (by the vector inner-product identity), so
\[\frac{\partial{y^TAx}}{\partial{x}}=\frac{\partial{x^TA^Ty}}{\partial{x}}=A^Ty \tag{$2$}\]
Since \(x^TAx=\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}A_{ij}x_ix_j\),
the \(k\)-th component of the gradient \(\frac{\partial{x^TAx}}{\partial{x}}\) is
\[\bigg[\frac{\partial{x^TAx}}{\partial{x}}\bigg]_k=\frac{\partial}{\partial{x_k}}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}A_{ij}x_ix_j=\sum\limits_{i=1}^nA_{ik}x_i+\sum\limits_{j=1}^{n}A_{kj}x_j\]
which gives the formula
\[\frac{\partial{x^TAx}}{\partial{x}}=Ax+A^Tx \tag{$3$}\]
In particular, if \(A\) is a symmetric matrix, then \(\frac{\partial{x^TAx}}{\partial{x}}=2Ax\).
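The following sketch verifies formulas (1), (2), and (3) numerically for random \(A\), \(x\), \(y\) (the sizes and the finite-difference helper are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
y = rng.standard_normal(n)
h = 1e-6

def num_grad(f, v0):
    # central finite-difference gradient of a scalar function f at v0
    g = np.zeros_like(v0)
    for k in range(v0.size):
        e = np.zeros_like(v0)
        e[k] = h
        g[k] = (f(v0 + e) - f(v0 - e)) / (2.0 * h)
    return g

# (1): d(x^T A y)/dx = A y
print(np.allclose(num_grad(lambda v: v @ A @ y, x), A @ y, atol=1e-5))
# (2): d(y^T A x)/dx = A^T y
print(np.allclose(num_grad(lambda v: y @ A @ v, x), A.T @ y, atol=1e-5))
# (3): d(x^T A x)/dx = A x + A^T x
print(np.allclose(num_grad(lambda v: v @ A @ v, x), A @ x + A.T @ x, atol=1e-5))
```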
Using the three formulas above, we can obtain several more commonly used gradient formulas for a real-valued function \(f(x)\) with respect to the column vector \(x\):
If \(f(x)=c\) is a constant, the gradient is \(\frac{\partial{c}}{\partial{x}}=0\).
Linearity: if \(f(x)\) and \(g(x)\) are real-valued functions of the vector \(x\) and \(c_1\), \(c_2\) are real constants, then
\[\frac{\partial{[c_1f(x)+c_2g(x)]}}{\partial{x}}=c_1\frac{\partial{f(x)}}{\partial{x}}+c_2\frac{\partial{g(x)}}{\partial{x}}\]
Product rule: if \(f(x)\) and \(g(x)\) are both real-valued functions of the vector \(x\), then
\[\frac{\partial{f(x)g(x)}}{\partial{x}}=g(x)\frac{\partial{f(x)}}{\partial{x}}+f(x)\frac{\partial{g(x)}}{\partial{x}}\]
If \(f(x)\), \(g(x)\), and \(h(x)\) are all real-valued functions of the vector \(x\), then
\[\frac{\partial{f(x)g(x)h(x)}}{\partial{x}}=g(x)h(x)\frac{\partial{f(x)}}{\partial{x}}+f(x)h(x)\frac{\partial{g(x)}}{\partial{x}}+f(x)g(x)\frac{\partial{h(x)}}{\partial{x}}\]
Quotient rule: if \(g(x)\neq0\), then
\[\frac{\partial{f(x)/g(x)}}{\partial{x}}=\frac{1}{g^2(x)}\big[g(x)\frac{\partial{f(x)}}{\partial{x}}-f(x)\frac{\partial{g(x)}}{\partial{x}}\big]\]
Chain rule: if \(y(x)\) is a vector-valued function of \(x\), then
\[\frac{\partial{f(y(x))}}{\partial{x}}=\frac{\partial{y^T(x)}}{\partial{x}}\frac{\partial{f(y)}}{\partial{y}}\]
where \(\frac{\partial{y^T(x)}}{\partial{x}}\) is an \(n\times m\) matrix (in particular \(n\times n\) when \(y(x)\) is \(n\times 1\)).
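A numerical check of the chain rule in this layout, using arbitrary illustrative functions \(y:\mathbb{R}^2\to\mathbb{R}^2\) and a scalar \(f\): the gradient of the composition should equal \(\frac{\partial{y^T(x)}}{\partial{x}}\frac{\partial{f(y)}}{\partial{y}}\).

```python
import numpy as np

def y(x):
    # illustrative vector function from R^2 to R^2 (hypothetical example)
    return np.array([x[0] ** 2 + x[1], np.sin(x[0]) * x[1]])

def f(u):
    # illustrative scalar function of y
    return u[0] * u[1] + u[0] ** 2

x0 = np.array([0.4, -0.9])
h = 1e-6

# left-hand side: numerical gradient of the composition f(y(x)) with respect to x
lhs = np.zeros(2)
for k in range(2):
    e = np.zeros(2)
    e[k] = h
    lhs[k] = (f(y(x0 + e)) - f(y(x0 - e))) / (2.0 * h)

# right-hand side: (d y^T / d x) (d f / d y); column j of dyT_dx is the gradient of y_j
u0 = y(x0)
dyT_dx = np.array([[2.0 * x0[0], np.cos(x0[0]) * x0[1]],
                   [1.0,         np.sin(x0[0])]])
df_dy = np.array([u0[1] + 2.0 * u0[0], u0[0]])
print(np.allclose(lhs, dyT_dx @ df_dy, atol=1e-5))  # expected: True
```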
If the constant vector \(\alpha\) (of the same dimension as \(y(x)\)) does not depend on \(x\), then
\[\frac{\partial{\alpha^Ty(x)}}{\partial{x}}=\frac{\partial{y^T(x)}}{\partial{x}}\alpha\]
\[\frac{\partial{y^T(x)\alpha}}{\partial{x}}=\frac{\partial{y^T(x)}}{\partial{x}}\alpha\]
Let \(x\) be an \(n\times 1\) vector, \(\alpha\) an \(m\times 1\) constant vector, and \(A\) and \(B\) constant matrices of size \(m\times n\) and \(m\times m\), respectively, with \(B\) symmetric. Then
\[\frac{\partial{(\alpha-Ax)^TB(\alpha-Ax)}}{\partial{x}}=-2A^TB(\alpha-Ax)\]
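A numerical check of this last formula with randomly generated \(\alpha\), \(A\), \(B\), and \(x\) (sizes are illustrative; \(B\) is symmetrized to satisfy the assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
alpha = rng.standard_normal(m)
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
B = (B + B.T) / 2.0          # make B symmetric, as the formula assumes
x = rng.standard_normal(n)
h = 1e-6

def f(v):
    r = alpha - A @ v
    return r @ B @ r         # (alpha - Av)^T B (alpha - Av)

numeric = np.zeros(n)
for k in range(n):
    e = np.zeros(n)
    e[k] = h
    numeric[k] = (f(x + e) - f(x - e)) / (2.0 * h)

analytic = -2.0 * A.T @ B @ (alpha - A @ x)
print(np.allclose(numeric, analytic, atol=1e-5))  # expected: True
```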
5 Gradient matrices of trace functions
A quadratic objective function can be rewritten using the matrix trace, since a scalar can be viewed as a \(1\times 1\) matrix whose trace equals the scalar itself; together with the cyclic property of the trace, this gives
\[f(x)=x^TAx=tr(x^TAx)=tr(Axx^T)\]
\[\frac{\partial{tr(A)}}{\partial{A}}=I \tag{$1$}\]
\[\frac{\partial{tr(AB)}}{\partial{A}}=B^T \tag{$2$}\]
Since \(tr(xy^T)=tr(yx^T)=x^Ty\), we have
\[\frac{\partial{tr(xy^T)}}{\partial{x}}=\frac{\partial{tr(yx^T)}}{\partial{x}}=y \tag{$3$} \]
When the \(m\times m\) matrix \(W\) is invertible,
\[\frac{\partial{tr(W^{-1})}}{\partial{W}}=-(W^{-2})^T \tag{$4$}\]
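The sketch below checks formulas (2) and (4) numerically with random matrices (the sizes and the shift that keeps \(W\) well conditioned are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4
A = rng.standard_normal((m, m))
B = rng.standard_normal((m, m))
W = rng.standard_normal((m, m)) + m * np.eye(m)  # shifted to keep W invertible
h = 1e-6

def num_grad(f, X):
    # entrywise central finite-difference gradient of a scalar function of a matrix
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2.0 * h)
    return G

# (2): d tr(AB)/dA = B^T
print(np.allclose(num_grad(lambda M: np.trace(M @ B), A), B.T, atol=1e-5))
# (4): d tr(W^{-1})/dW = -(W^{-2})^T
Winv = np.linalg.inv(W)
print(np.allclose(num_grad(lambda M: np.trace(np.linalg.inv(M)), W),
                  -(Winv @ Winv).T, atol=1e-4))
```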
A few more formulas:
\[\frac{\partial{f(A)}}{\partial{A^T}}=(\frac{\partial{f(A)}}{\partial{A}})^T\]
\[\frac{\partial{tr(ABA^TC)}}{\partial{A}}= CAB + C^TAB^T \]
\[\frac{\partial{|A|}}{\partial{A}}=|A|(A^{-1})^T\]
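And a numerical check of the determinant formula (the random \(A\) below is shifted toward the identity purely to keep it comfortably invertible):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # keep A comfortably invertible
h = 1e-6

numeric = np.zeros_like(A)
for i in range(n):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = h
        numeric[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2.0 * h)

analytic = np.linalg.det(A) * np.linalg.inv(A).T  # |A| (A^{-1})^T
print(np.allclose(numeric, analytic, atol=1e-4))  # expected: True
```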