# Neural Architecture Search: Differentiable Search (SGAS)

### Motivation

NAS methods share a common weakness: validation accuracy is high during the search, but the actual test accuracy falls short. DARTS, the classical gradient-based search method, stacks blocks into a larger supernet; because candidate architectures are only shallowly evaluated during the search, a gap opens between search-time validation accuracy and final test accuracy. Measured by the Kendall tau coefficient (see the figure in the SGAS paper), the ranking of architectures by DARTS search accuracy deviates considerably from their ranking after full training.
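
The Kendall tau coefficient mentioned above measures how well two rankings agree (+1 for identical rankings, -1 for fully reversed). A minimal pure-Python sketch; the accuracy numbers are made up for illustration:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

# hypothetical search-time (validation) vs. fully-trained (test) accuracies
val_acc = [88.1, 87.5, 89.0, 86.9, 88.4]
test_acc = [93.2, 94.0, 93.5, 92.8, 93.9]
print(kendall_tau(val_acc, test_acc))  # 0.2: weak rank agreement
```

A coefficient near zero, as in this toy example, is exactly the failure mode SGAS targets: the search's own ranking tells you little about which architecture will actually test best.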

### Method

#### Three metrics

##### Edge importance

$$S_{EI}^{(i, j)}=\sum_{o \in \mathcal{O}, o \neq \text{zero}} \frac{\exp \left(\alpha_{o}^{(i, j)}\right)}{\sum_{o^{\prime} \in \mathcal{O}} \exp \left(\alpha_{o^{\prime}}^{(i, j)}\right)}$$

```python
import torch
import torch.nn.functional as F
from torch.autograd import Variable

# one 8-dim alpha vector per edge; a cell with 4 intermediate nodes
# has 2 + 3 + 4 + 5 = 14 edges
alphas = []
for i in range(4):
    for n in range(2 + i):
        alphas.append(Variable(1e-3 * torch.randn(8)))
# after the alphas have been trained:
mat = F.softmax(torch.stack(alphas, dim=0), dim=-1).detach()  # 14x8 matrix, softmax-normalized per edge
EI = torch.sum(mat[:, 1:], dim=-1)  # 14 values: per-edge sum of the 7 non-`none` op weights
```

##### Selection certainty

$$\begin{array}{c} p_{o}^{(i, j)}=\frac{\exp \left(\alpha_{o}^{(i, j)}\right)}{S_{EI}^{(i, j)} \sum_{o^{\prime} \in \mathcal{O}} \exp \left(\alpha_{o^{\prime}}^{(i, j)}\right)}, \quad o \in \mathcal{O}, o \neq \text{zero} \\ S_{SC}^{(i, j)}=1-\frac{-\sum_{o \in \mathcal{O}, o \neq \text{zero}} p_{o}^{(i, j)} \log \left(p_{o}^{(i, j)}\right)}{\log (|\mathcal{O}|-1)} \end{array}$$

```python
import math

import torch.distributions.categorical as cate

# re-normalized distribution over the 7 non-`none` ops per edge, then
# its entropy divided by the maximum possible entropy log(|O| - 1)
probs = mat[:, 1:] / EI[:, None]
entropy = cate.Categorical(probs=probs).entropy() / math.log(probs.size()[1])
SC = 1 - entropy
```

##### Selection stability

$$S_{SS}^{(i, j)}=\frac{1}{K} \sum_{t=T-K}^{T-1} \sum_{o_{t} \in \mathcal{O}, o_{t} \neq \text{zero}} \min \left(p_{o_{t}}^{(i, j)}, p_{o_{T}}^{(i, j)}\right)$$

```python
import numpy as np
import torch

def histogram_intersection(a, b):
    # per-edge histogram intersection: sum of the element-wise minima of
    # two op distributions; 1.0 means the distributions are identical
    c = np.minimum(a.cpu().numpy(), b.cpu().numpy())
    c = torch.from_numpy(c).cuda()
    sums = c.sum(dim=1)
    return sums

def histogram_average(history, probs):
    # average intersection between the current probs and the last K snapshots
    histogram_inter = torch.zeros(probs.shape[0], dtype=torch.float).cuda()
    if not history:
        return histogram_inter
    for hist in history:
        histogram_inter += histogram_intersection(hist, probs)
    histogram_inter /= len(history)
    return histogram_inter

probs_history = []

# inside the search loop, once per epoch:
probs_history.append(probs)
if len(probs_history) > args.history_size:
    probs_history.pop(0)

histogram_inter = histogram_average(probs_history, probs)

SS = histogram_inter
```

#### Two selection criteria

##### Criterion 1

$$S_{1}^{(i, j)}=\text { normalize }\left(S_{E I}^{(i, j)}\right) * \text { normalize }\left(S_{S C}^{(i, j)}\right)$$

```python
def normalize(v):
    # min-max normalize to [0, 1]; all zeros if the scores are constant
    min_v = torch.min(v)
    range_v = torch.max(v) - min_v
    if range_v > 0:
        normalized_v = (v - min_v) / range_v
    else:
        normalized_v = torch.zeros(v.size()).cuda()
    return normalized_v

score = normalize(EI) * normalize(SC)
```

##### Criterion 2

$$S_{2}^{(i, j)}=S_{1}^{(i, j)} * \text { normalize }\left(S_{S S}^{(i, j)}\right)$$

```python
score = normalize(EI) * normalize(SC) * normalize(SS)
```
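
These scores drive SGAS's sequential greedy step: at each decision interval, the undecided edge with the highest score is fixed to its best non-zero operation and excluded from further search. A minimal sketch of that selection in plain Python; the function name and toy numbers are illustrative, not from the paper's code:

```python
def select_edge(score, probs, decided):
    """Pick the highest-scoring edge not yet decided, then fix it to its
    strongest operation. `probs` rows are distributions over the non-`none`
    ops (the `none` column has already been dropped, as in mat[:, 1:])."""
    best_edge = max(
        (e for e in range(len(score)) if e not in decided),
        key=lambda e: score[e],
    )
    best_op = max(range(len(probs[best_edge])), key=lambda o: probs[best_edge][o])
    return best_edge, best_op

score = [0.2, 0.9, 0.5, 0.7]       # hypothetical per-edge scores
probs = [[0.1, 0.6, 0.3]] * 4      # hypothetical per-edge op weights
print(select_edge(score, probs, decided={1}))  # (3, 1): edge 1 is already fixed
```

Greedily committing one edge at a time is what narrows the gap described in the motivation: each decision is made while the remaining supernet is still being trained, so later choices account for earlier ones.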

### References

[1] Li, Guohao, et al. "SGAS: Sequential Greedy Architecture Search." CVPR 2020.
