Tutorial: Triplet Loss Layer Design for CNN


 


Xiao Wang  2016.05.02

 

  A triplet loss layer can be a useful trick for further improving the accuracy of a CNN. In this tutorial I walk through the whole process and show the code. The material is drawn mainly from these blog posts:

  http://blog.csdn.net/tangwei2014/article/details/46812153 

  http://blog.csdn.net/tangwei2014/article/details/46788025

  and from the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering".

 

  First, let's see how to add the layer to Caffe and test whether it works. After that, we discuss the paper and where the triplet loss comes from. In recent versions of the Caffe framework, adding a new layer consists mainly of these steps:

  step 1. add the parameter message for the layer in ./src/caffe/proto/caffe.proto ;

  step 2. add the declaration of the layer in ./include/caffe/***layers.hpp ;

  step 3. add the corresponding .cpp and .cu files in ./src/caffe/layers/ and implement the new layer there;

  step 4. add test code for the new layer in ./src/caffe/test/, and test its forward and backward propagation and its computation speed.

 

  Let's do it step by step. 

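  First, we add the triplet loss layer parameters to the caffe.proto file:

  Around line 101 of caffe.proto there is a comment that reads "SolverParameter next available ID: 40 (last added: momentum2)", so we take ID 40 for the new entry. Note that this comment tracks SolverParameter. Because the layer code reads its settings through this->layer_param_.rank_param(), the new rank_param field has to be declared in LayerParameter, with an ID that is still unused there. The parameter message itself is: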

        message RankParameter {
          optional uint32 neg_num = 1 [default = 1];
          optional uint32 pair_size = 2 [default = 1];
          optional float hard_ratio = 3;
          optional float rand_ratio = 4;
          optional float margin = 5 [default = 0.5];
        }

 

    

    

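  Second, we add the declaration of the triplet loss layer in ./include/caffe/TripletLoss_layers.hpp. Judging from the .cpp file below, the class RankHardLossLayer derives from LossLayer and declares Reshape, set_mask, Forward_cpu, Forward_gpu, Backward_cpu and Backward_gpu, plus the Blob members diff_, dis_ and mask_. The .cpp file below includes caffe/vision_layers.hpp, so adjust that include to wherever the declaration actually lives.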

  

 

  Third, we write the .cpp and .cu files for the triplet loss layer.

  First, the .cpp file:

#include <algorithm>
#include <cfloat>
#include <cmath>
#include <utility>
#include <vector>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
// Include whichever header holds the RankHardLossLayer declaration (step 2).
#include "caffe/vision_layers.hpp"

using std::make_pair;
using std::max;
using std::min;
using std::pair;
using std::sort;
using std::vector;

namespace caffe {

// Random index generator for std::random_shuffle, backed by Caffe's RNG.
int myrandom(int i) { return caffe_rng_rand() % i; }

template <typename Dtype>
void RankHardLossLayer<Dtype>::Reshape(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
    LossLayer<Dtype>::Reshape(bottom, top);

    const int num = bottom[0]->num();
    diff_.ReshapeLike(*bottom[0]);
    // dis_ and mask_ are num x num matrices over all sample pairs in the batch.
    dis_.Reshape(num, num, 1, 1);
    mask_.Reshape(num, num, 1, 1);
}

// Fills dis_ with pairwise distances and mask_ with the selected negatives.
// The batch is assumed to be laid out in groups of pair_size samples, with
// the anchor at index i and its positive at index i + 1.
template <typename Dtype>
void RankHardLossLayer<Dtype>::set_mask(const vector<Blob<Dtype>*>& bottom) {
    const RankParameter& rank_param = this->layer_param_.rank_param();
    const int neg_num = rank_param.neg_num();
    const int pair_size = rank_param.pair_size();
    const float margin = rank_param.margin();
    // Both products are truncated toward zero.
    const int hard_num = static_cast<int>(neg_num * rank_param.hard_ratio());
    const int rand_num = static_cast<int>(neg_num * rank_param.rand_ratio());

    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    const int num = bottom[0]->num();
    const int dim = bottom[0]->count() / num;
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for (int i = 0; i < num * num; ++i) {
        dis_data[i] = 0;
        mask_data[i] = 0;
    }

    // Pairwise distance is the negative dot product, so a smaller value
    // means the two features are more similar.
    for (int i = 0; i < num; ++i) {
        for (int j = i + 1; j < num; ++j) {
            const Dtype* fea1 = bottom_data + i * dim;
            const Dtype* fea2 = bottom_data + j * dim;
            Dtype ts = 0;
            for (int k = 0; k < dim; ++k) {
                ts += fea1[k] * fea2[k];
            }
            dis_data[i * num + j] = -ts;
            dis_data[j * num + i] = -ts;
        }
    }

    // Select negatives for every anchor.
    vector<pair<float, int> > negpairs;  // (distance, index) of candidates
    vector<int> sid1;                    // the neg_num closest (hardest) negatives
    vector<int> sid2;                    // the remaining candidates

    for (int i = 0; i < num; i += pair_size) {
        negpairs.clear();
        sid1.clear();
        sid2.clear();
        const Dtype pos_dis = dis_data[i * num + i + 1];
        for (int j = 0; j < num; ++j) {
            if (label[j] == label[i]) continue;
            const Dtype tloss =
                max(Dtype(0), pos_dis - dis_data[i * num + j] + Dtype(margin));
            if (tloss == 0) continue;  // margin already satisfied, not active
            negpairs.push_back(make_pair(dis_data[i * num + j], j));
        }

        const int neg_size = static_cast<int>(negpairs.size());
        if (neg_size <= neg_num) {
            // Too few active negatives: use all of them.
            for (int j = 0; j < neg_size; ++j) {
                mask_data[i * num + negpairs[j].second] = 1;
            }
            continue;
        }

        // Ascending distance: the hardest negatives come first.
        sort(negpairs.begin(), negpairs.end());
        for (int j = 0; j < neg_num; ++j) {
            sid1.push_back(negpairs[j].second);
        }
        for (int j = neg_num; j < neg_size; ++j) {
            sid2.push_back(negpairs[j].second);
        }

        // Keep hard_num of the hardest negatives, chosen at random.
        std::random_shuffle(sid1.begin(), sid1.end(), myrandom);
        for (int j = 0; j < min(hard_num, static_cast<int>(sid1.size())); ++j) {
            mask_data[i * num + sid1[j]] = 1;
        }
        // The unused hard negatives rejoin the pool for random sampling.
        for (int j = hard_num; j < static_cast<int>(sid1.size()); ++j) {
            sid2.push_back(sid1[j]);
        }
        std::random_shuffle(sid2.begin(), sid2.end(), myrandom);
        for (int j = 0; j < min(rand_num, static_cast<int>(sid2.size())); ++j) {
            mask_data[i * num + sid2[j]] = 1;
        }
    }
}

template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
    const int num = bottom[0]->num();

    const RankParameter& rank_param = this->layer_param_.rank_param();
    const int neg_num = rank_param.neg_num();
    const int pair_size = rank_param.pair_size();
    const float margin = rank_param.margin();

    set_mask(bottom);
    const Dtype* dis_data = dis_.cpu_data();
    const Dtype* mask_data = mask_.cpu_data();

    Dtype loss = 0;
    // Normalizer: two hinge terms per selected negative, neg_num per anchor.
    const int cnt = neg_num * num / pair_size * 2;

    for (int i = 0; i < num; i += pair_size) {
        const Dtype pos_dis = dis_data[i * num + i + 1];
        for (int j = 0; j < num; ++j) {
            if (mask_data[i * num + j] == 0) continue;
            // Anchor vs. negative, and positive vs. negative.
            const Dtype tloss1 =
                max(Dtype(0), pos_dis - dis_data[i * num + j] + Dtype(margin));
            const Dtype tloss2 =
                max(Dtype(0), pos_dis - dis_data[(i + 1) * num + j] + Dtype(margin));
            loss += tloss1 + tloss2;
        }
    }

    top[0]->mutable_cpu_data()[0] = loss / cnt;
}

// NOTE: unlike Caffe's built-in loss layers, this backward pass does not
// scale by top[0]->cpu_diff()[0] (the loss weight) and ignores propagate_down.
template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
    const Dtype* bottom_data = bottom[0]->cpu_data();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    const int count = bottom[0]->count();
    const int num = bottom[0]->num();
    const int dim = count / num;

    const RankParameter& rank_param = this->layer_param_.rank_param();
    const int neg_num = rank_param.neg_num();
    const int pair_size = rank_param.pair_size();
    const float margin = rank_param.margin();

    const Dtype* dis_data = dis_.cpu_data();
    const Dtype* mask_data = mask_.cpu_data();

    for (int i = 0; i < count; ++i) {
        bottom_diff[i] = 0;
    }

    const int cnt = neg_num * num / pair_size * 2;

    for (int i = 0; i < num; i += pair_size) {
        const Dtype* fori = bottom_data + i * dim;        // anchor
        const Dtype* fpos = bottom_data + (i + 1) * dim;  // positive
        Dtype* fori_diff = bottom_diff + i * dim;
        Dtype* fpos_diff = bottom_diff + (i + 1) * dim;
        const Dtype pos_dis = dis_data[i * num + i + 1];

        for (int j = 0; j < num; ++j) {
            if (mask_data[i * num + j] == 0) continue;
            const Dtype tloss1 =
                max(Dtype(0), pos_dis - dis_data[i * num + j] + Dtype(margin));
            const Dtype tloss2 =
                max(Dtype(0), pos_dis - dis_data[(i + 1) * num + j] + Dtype(margin));

            const Dtype* fneg = bottom_data + j * dim;  // negative
            Dtype* fneg_diff = bottom_diff + j * dim;
            if (tloss1 > 0) {
                // Gradient of -a.p + a.n + margin
                for (int k = 0; k < dim; ++k) {
                    fori_diff[k] += fneg[k] - fpos[k];
                    fpos_diff[k] += -fori[k];
                    fneg_diff[k] += fori[k];
                }
            }
            if (tloss2 > 0) {
                // Gradient of -a.p + p.n + margin
                for (int k = 0; k < dim; ++k) {
                    fori_diff[k] += -fpos[k];
                    fpos_diff[k] += fneg[k] - fori[k];
                    fneg_diff[k] += fpos[k];
                }
            }
        }
    }

    for (int i = 0; i < count; ++i) {
        bottom_diff[i] = bottom_diff[i] / cnt;
    }
}

#ifdef CPU_ONLY
STUB_GPU(RankHardLossLayer);
#endif

INSTANTIATE_CLASS(RankHardLossLayer);
REGISTER_LAYER_CLASS(RankHardLoss);

}  // namespace caffe
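
  The updates in Backward_cpu can be read off from the two hinge terms computed in Forward_cpu. The following is a derivation sketch of what the code implements, writing a, p and n for the anchor, positive and negative features, m for the margin, and using the distance d(x, y) = -x^T y from set_mask:

```latex
\begin{aligned}
L_1 &= \left[ -a^\top p + a^\top n + m \right]_+ , &
L_2 &= \left[ -a^\top p + p^\top n + m \right]_+ \\
\frac{\partial L_1}{\partial a} &= n - p, &
\frac{\partial L_1}{\partial p} &= -a, &
\frac{\partial L_1}{\partial n} &= a \\
\frac{\partial L_2}{\partial a} &= -p, &
\frac{\partial L_2}{\partial p} &= n - a, &
\frac{\partial L_2}{\partial n} &= p
\end{aligned}
```

  Each gradient applies only while its hinge term is positive. The code accumulates these contributions over all selected negatives and then divides by cnt = 2 * neg_num * num / pair_size.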

  and the .cu file, which contains no CUDA kernels and simply falls back to the CPU implementation:

#include <vector>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

// The GPU forward pass delegates to the CPU implementation.
template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Forward_cpu(bottom, top);
}

// The GPU backward pass delegates to the CPU implementation.
template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  Backward_cpu(top, propagate_down, bottom);
}

INSTANTIATE_LAYER_GPU_FUNCS(RankHardLossLayer);

}  // namespace caffe

  

  Finally, we rebuild Caffe and check that the new layer compiles without errors.

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                   

  Let's continue to talk about the triplet loss:

  As the figure above shows, a triplet has three components: the anchor, the positive, and the negative. The goal is to reduce the distance between the anchor and the positive (a sample of the same class) and to push the negative (a sample of a different class) away from the anchor.

  Thus, the overall loss can be written as follows:
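
  For reference, the FaceNet paper writes the loss over N triplets as follows, where f is the embedding, x_i^a, x_i^p and x_i^n are the anchor, positive and negative of the i-th triplet, \alpha is the margin, and [\cdot]_+ denotes max(0, \cdot):

```latex
L = \sum_{i=1}^{N} \left[ \left\| f(x_i^a) - f(x_i^p) \right\|_2^2
      - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right]_+
```

  The layer code in this tutorial uses the negative dot product as its distance instead. Assuming the input features are L2-normalized (the bottom blob name "norml2" in the layer definition at the end of this tutorial suggests this, but it is an assumption), the two forms agree up to a factor of 2, because \|a - b\|_2^2 = 2 - 2 a^\top b for unit vectors:

```latex
\left\| a - p \right\|_2^2 - \left\| a - n \right\|_2^2
  = 2 \left( a^\top n - a^\top p \right)
```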

  Selecting triplets purely at random may lead to slow convergence of the network. We need to find hard triplets, that is, triplets that are active (they still violate the margin) and can therefore contribute to improving the model. The following section explains the approach.

  Triplet Selection:

  There are two approaches to generating triplets:

  1. Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data.

  2. Generate the triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-batch.
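
  The online approach can be sketched in standalone C++ as follows. For one anchor, the function keeps only the negatives that still violate the margin and returns the closest ones first. The names neg_dot and hard_negatives, the assumption that the positive sits right after the anchor, and the use of the negative dot product as the distance are choices made for this sketch. They follow the layer code above in spirit and are not the exact FaceNet procedure.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Distance used in this sketch: the negative dot product.
double neg_dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;
  for (std::size_t k = 0; k < a.size(); ++k) s += a[k] * b[k];
  return -s;
}

// For the anchor at index `anchor` (its positive is assumed to sit at
// anchor + 1), return up to max_neg indices of negatives that violate the
// margin, ordered from hardest (closest to the anchor) to easiest.
std::vector<int> hard_negatives(
    const std::vector<std::vector<double> >& feats,
    const std::vector<int>& labels, int anchor, int max_neg, double margin) {
  const double d_pos = neg_dot(feats[anchor], feats[anchor + 1]);
  std::vector<std::pair<double, int> > candidates;
  for (int j = 0; j < static_cast<int>(feats.size()); ++j) {
    if (labels[j] == labels[anchor]) continue;      // same class: not a negative
    const double d_neg = neg_dot(feats[anchor], feats[j]);
    if (d_pos - d_neg + margin <= 0.0) continue;    // margin satisfied: inactive
    candidates.push_back(std::make_pair(d_neg, j));
  }
  // Ascending distance puts the hardest negatives first.
  std::sort(candidates.begin(), candidates.end());
  std::vector<int> picked;
  for (std::size_t k = 0;
       k < candidates.size() && static_cast<int>(picked.size()) < max_neg; ++k) {
    picked.push_back(candidates[k].second);
  }
  return picked;
}
```

  A larger margin makes more negatives active, so the candidate list grows. The max_neg cap plays the role that neg_num plays in the layer.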

 

  The paper uses all anchor-positive pairs in a mini-batch while still selecting the hard negatives. The all anchor-positive method was more stable and converged slightly faster at the beginning of training.
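
  To make "all anchor-positive pairs" concrete, the sketch below enumerates every pair of samples in a mini-batch that share a label. The function name anchor_positive_pairs is hypothetical and the code is only an illustration of the idea. The layer code in this tutorial does something simpler: it takes one fixed positive per anchor, at index i + 1.

```cpp
#include <utility>
#include <vector>

// Returns every index pair (i, j) with i < j whose labels match.
std::vector<std::pair<int, int> > anchor_positive_pairs(
    const std::vector<int>& labels) {
  std::vector<std::pair<int, int> > pairs;
  const int n = static_cast<int>(labels.size());
  for (int i = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      if (labels[i] == labels[j]) pairs.push_back(std::make_pair(i, j));
    }
  }
  return pairs;
}
```

  For the labels {0, 0, 1, 0, 1} this yields four pairs: (0, 1), (0, 3), (1, 3) and (2, 4).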

 

  The code is available on GitHub: https://github.com/wangxiao5791509/caffe-video_triplet

  The layer is then declared in the network definition like this:

  

layer {     
    name: "loss"    
    type: "RankHardLoss"    
    rank_param{     
        neg_num: 4  
        pair_size: 2    
        hard_ratio: 0.5     
        rand_ratio: 0.5     
        margin: 1   
    }   
    bottom: "norml2"    
    bottom: "label"     
}
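
  The sketch below works out the quantities that the layer code derives from these settings. The struct RankSettings and the three function names are hypothetical. They repeat the integer arithmetic used in set_mask and Forward_cpu. The batch size of 8 in the usage note is an assumed value, since the tutorial does not state one.

```cpp
// Mirrors the rank_param fields that affect the counts below.
struct RankSettings {
  int neg_num;
  int pair_size;
  float hard_ratio;
  float rand_ratio;
};

// Number of negatives drawn from the hardest candidates (truncated).
int hard_count(const RankSettings& s) {
  return static_cast<int>(s.neg_num * s.hard_ratio);
}

// Number of negatives drawn at random from the remaining candidates.
int rand_count(const RankSettings& s) {
  return static_cast<int>(s.neg_num * s.rand_ratio);
}

// Normalizer applied to the loss and the gradients (integer arithmetic,
// evaluated left to right as in the layer code).
int loss_normalizer(const RankSettings& s, int batch_size) {
  return s.neg_num * batch_size / s.pair_size * 2;
}
```

  With neg_num: 4, pair_size: 2 and both ratios at 0.5, each anchor gets 2 hard and 2 random negatives. For an assumed batch of 8 samples the normalizer is 4 * 8 / 2 * 2 = 32.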