Tutorial: Triplet Loss Layer Design for CNN

简介: Tutorial:  Triplet Loss Layer Design for CNN Xiao Wang  2016.05.02     Triplet Loss Layer could be a trick for further improving the accuracy of CNN.

 

Tutorial:  Triplet Loss Layer Design for CNN

Xiao Wang  2016.05.02

 

  Triplet Loss Layer could be a trick for further improving the accuracy of CNN. Today, I will introduce the whole process, and display the code for you. This tutorial mainly from the blog: 

  http://blog.csdn.net/tangwei2014/article/details/46812153 

  http://blog.csdn.net/tangwei2014/article/details/46788025

  and the paper: <FaceNet: A Unified Embedding for Face Recognition and Clustering>.

 

  First, Let's talk about how to add the layer into caffe and make test this layer to check whether it works or not. And then, we will discuss the paper and introduce the process of how the triplet loss come from. In the new version of caffe framework, it mainly consists of these steps for add a new layer i.e. 

  step 1. add the paprameter message in the corresponding layer, which located in ./src/caffe/proto/caffe.proto ;

  step 2. add the declaration information of the layer in ./include/caffe/***layers.hpp ;

  step 3. add the corresponding .cpp and .cu files in ./src/caffe/layers/, realize the function of the new added layer;

  step 4. add test code of new added layers in ./src/caffe/gtest/, test its foreward and back propagation and its computation speed. 

 

  Let's do it step by step. 

  First, we add triplet loss layer in caffe.proto file:

  we could found that in line 101, it said: SolverParameter next available ID: 40 (last added: momentum2), thus we add the ID: 40 as the new added information :  

        message RankParameter {
        optional uint32 neg_num = 1 [default = 1];
        optional uint32 pair_size = 2 [default = 1];
        optional float hard_ratio = 3;
        optional float rand_ratio = 4;
        optional float margin = 5 [default = 0.5];
        }

 

    

    

  Second, we add the declearation information about triplet loss layer in ./include/caffe/TripletLoss_layers.hpp 

  

 

  Third, We compile the triplet loss layer of .cpp and .cu file 

  First of all is the .cpp file 

  1 #include <vector>
  2 
  3 #include <algorithm>
  4 #include <cmath>
  5 #include <cfloat>
  6 
  7 #include "caffe/layer.hpp"
  8 #include "caffe/util/io.hpp"
  9 #include "caffe/util/math_functions.hpp"
 10 #include "caffe/vision_layers.hpp"
 11 
 12 using std::max;
 13 using namespace std;
 14 using namespace cv;
 15 
 16 namespace caffe {
 17 
 18 int myrandom (int i) { return caffe_rng_rand()%i;}
 19 
 20 
 21 template <typename Dtype>
 22 void RankHardLossLayer<Dtype>::Reshape(
 23   const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
 24   LossLayer<Dtype>::Reshape(bottom, top);
 25 
 26   diff_.ReshapeLike(*bottom[0]);
 27   dis_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);
 28   mask_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);
 29 }
 30 
 31 
 32 template <typename Dtype>
 33 void RankHardLossLayer<Dtype>::set_mask(const vector<Blob<Dtype>*>& bottom)
 34 {
 35 
 36     RankParameter rank_param = this->layer_param_.rank_param();
 37     int neg_num = rank_param.neg_num();
 38     int pair_size = rank_param.pair_size();
 39     float hard_ratio = rank_param.hard_ratio();
 40     float rand_ratio = rank_param.rand_ratio();
 41     float margin = rank_param.margin();
 42 
 43     int hard_num = neg_num * hard_ratio;
 44     int rand_num = neg_num * rand_ratio;
 45 
 46     const Dtype* bottom_data = bottom[0]->cpu_data();
 47     const Dtype* label = bottom[1]->cpu_data();
 48     int count = bottom[0]->count();
 49     int num = bottom[0]->num();
 50     int dim = bottom[0]->count() / bottom[0]->num();
 51     Dtype* dis_data = dis_.mutable_cpu_data();
 52     Dtype* mask_data = mask_.mutable_cpu_data();
 53 
 54     for(int i = 0; i < num * num; i ++)
 55     {
 56         dis_data[i] = 0;
 57         mask_data[i] = 0;
 58     }
 59 
 60     // calculate distance
 61     for(int i = 0; i < num; i ++)
 62     {
 63         for(int j = i + 1; j < num; j ++)
 64         {
 65             const Dtype* fea1 = bottom_data + i * dim;
 66             const Dtype* fea2 = bottom_data + j * dim;
 67             Dtype ts = 0;
 68             for(int k = 0; k < dim; k ++)
 69             {
 70               ts += (fea1[k] * fea2[k]) ;   
 71             }                                
 72             dis_data[i * num + j] = -ts;    
 73             dis_data[j * num + i] = -ts;    
 74         }
 75     }
 76 
 77     //select samples
 78 
 79     vector<pair<float, int> >negpairs;
 80     vector<int> sid1;
 81     vector<int> sid2;
 82 
 83 
 84     for(int i = 0; i < num; i += pair_size)
 85     {
 86         negpairs.clear();
 87         sid1.clear();
 88         sid2.clear();
 89         for(int j = 0; j < num; j ++)
 90         {
 91             if(label[j] == label[i])
 92                 continue;
 93             Dtype tloss = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
 94             if(tloss == 0) continue;
 95 
 96             negpairs.push_back(make_pair(dis_data[i * num + j], j));
 97         }
 98         if(negpairs.size() <= neg_num)
 99         {
100             for(int j = 0; j < negpairs.size(); j ++)
101             {
102                 int id = negpairs[j].second;
103                 mask_data[i * num + id] = 1;
104             }
105             continue;
106         }
107         sort(negpairs.begin(), negpairs.end());
108 
109         for(int j = 0; j < neg_num; j ++)
110         {
111             sid1.push_back(negpairs[j].second);
112         }
113         for(int j = neg_num; j < negpairs.size(); j ++)
114         {
115             sid2.push_back(negpairs[j].second);
116         }
117         std::random_shuffle(sid1.begin(), sid1.end(), myrandom);
118         for(int j = 0; j < min(hard_num, (int)(sid1.size()) ); j ++)
119         {
120             mask_data[i * num + sid1[j]] = 1;
121         }
122         for(int j = hard_num; j < sid1.size(); j++)
123         {
124             sid2.push_back(sid1[j]);
125         }
126         std::random_shuffle(sid2.begin(), sid2.end(), myrandom);
127         for(int j = 0; j < min( rand_num, (int)(sid2.size()) ); j ++)
128         {
129             mask_data[i * num + sid2[j]] = 1;
130         }
131 
132     }
133 
134 
135 }
136 
137 
138 
139 
140 template <typename Dtype>
141 void RankHardLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
142     const vector<Blob<Dtype>*>& top) {
143 
144     const Dtype* bottom_data = bottom[0]->cpu_data();
145     const Dtype* label = bottom[1]->cpu_data();
146     int count = bottom[0]->count();
147     int num = bottom[0]->num();
148     int dim = bottom[0]->count() / bottom[0]->num();
149 
150 
151     RankParameter rank_param = this->layer_param_.rank_param();
152     int neg_num = rank_param.neg_num();      // 4
153     int pair_size = rank_param.pair_size();  // 5
154     float hard_ratio = rank_param.hard_ratio();
155     float rand_ratio = rank_param.rand_ratio();
156     float margin = rank_param.margin();
157     Dtype* dis_data = dis_.mutable_cpu_data();
158     Dtype* mask_data = mask_.mutable_cpu_data();
159 
160     set_mask(bottom);
161     Dtype loss = 0;
162     int cnt = neg_num * num / pair_size * 2;
163 
164     for(int i = 0; i < num; i += pair_size)
165     {
166         for(int j = 0; j < num; j++)
167         {
168             if(mask_data[i * num + j] == 0) 
169                 continue;
170             Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
171             Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));
172             loss += tloss1 + tloss2;
173         }
174     }
175 
176     loss = loss / cnt;
177     top[0]->mutable_cpu_data()[0] = loss;
178 }
179 
180 
181 
182 
183 template <typename Dtype>
184 void RankHardLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
185     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
186 
187 
188     const Dtype* bottom_data = bottom[0]->cpu_data();
189     const Dtype* label = bottom[1]->cpu_data();
190     Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
191     int count = bottom[0]->count();
192     int num = bottom[0]->num();
193     int dim = bottom[0]->count() / bottom[0]->num();
194 
195     RankParameter rank_param = this->layer_param_.rank_param();
196     int neg_num = rank_param.neg_num();
197     int pair_size = rank_param.pair_size();
198     float hard_ratio = rank_param.hard_ratio();
199     float rand_ratio = rank_param.rand_ratio();
200     float margin = rank_param.margin();
201 
202     Dtype* dis_data = dis_.mutable_cpu_data();
203     Dtype* mask_data = mask_.mutable_cpu_data();
204 
205     for(int i = 0; i < count; i ++ )
206         bottom_diff[i] = 0;
207 
208     int cnt = neg_num * num / pair_size * 2;
209 
210     for(int i = 0; i < num; i += pair_size)
211     {
212         const Dtype* fori = bottom_data + i * dim;
213         const Dtype* fpos = bottom_data + (i + 1) * dim;
214 
215         Dtype* fori_diff = bottom_diff + i * dim;
216         Dtype* fpos_diff = bottom_diff + (i + 1) * dim;
217         for(int j = 0; j < num; j ++)
218         {
219             if(mask_data[i * num + j] == 0) continue;
220             Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
221             Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));
222 
223             const Dtype* fneg = bottom_data + j * dim;
224             Dtype* fneg_diff  = bottom_diff + j * dim;
225             if(tloss1 > 0)
226             {
227                 for(int k = 0; k < dim; k ++)
228                 {
229                     fori_diff[k] += (fneg[k] - fpos[k]); // / (pairNum * 1.0 - 2.0);
230                     fpos_diff[k] += -fori[k]; // / (pairNum * 1.0 - 2.0);
231                     fneg_diff[k] +=  fori[k];
232                 }
233             }
234             if(tloss2 > 0)
235             {
236                 for(int k = 0; k < dim; k ++)
237                 {
238                     fori_diff[k] += -fpos[k]; // / (pairNum * 1.0 - 2.0);
239                     fpos_diff[k] += fneg[k]-fori[k]; // / (pairNum * 1.0 - 2.0);
240                     fneg_diff[k] += fpos[k];
241                 }
242             }
243 
244         }
245     }
246 
247     for (int i = 0; i < count; i ++)
248     {
249         bottom_diff[i] = bottom_diff[i] / cnt;
250     }
251 
252 }
253 
254 #ifdef CPU_ONLY
255 STUB_GPU(RankHardLossLayer);
256 #endif
257 
258 INSTANTIATE_CLASS(RankHardLossLayer);
259 REGISTER_LAYER_CLASS(RankHardLoss);
260 
261 }  // namespace caffe
View Code

  and the .cu file 

 1 #include <vector>
 2 
 3 #include "caffe/layer.hpp"
 4 #include "caffe/util/io.hpp"
 5 #include "caffe/util/math_functions.hpp"
 6 #include "caffe/vision_layers.hpp"
 7 
 8 namespace caffe {
 9 
10 template <typename Dtype>
11 void RankHardLossLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
12     const vector<Blob<Dtype>*>& top) {
13   Forward_cpu(bottom, top);
14 }
15 
16 template <typename Dtype>
17 void RankHardLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
18     const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
19    Backward_cpu(top, propagate_down, bottom);
20 }
21 
22 INSTANTIATE_LAYER_GPU_FUNCS(RankHardLossLayer);
23 
24 }  // namespace caffe
View Code

  

  Finally, we make the caffe file and check whether have some mistakes about it.

 

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                   

  Let's continue to talk about the triplet loss:

  Just like the above figure showns,  the triplet loss usually have three components, i.e. the anchors, the positive, and the negative. What we are going to do is try to reduce the distance between the archor and the same, and push the different from the anchors.

  Thus, the whole loss could be described as following:

  Only select triplets randomly may lead to slow converage of the network, and we need to find those hard triplets, that are active and can therefore contribute to improving the model. The following section will give you an explanination about the approach.

  Triplet Selection:

  There are two appproaches for generate triplets, i.e.

  1. Generate triplets offline every n steps, using the most recent newwork checkpoint and computing the argmin and argmax on a subset of the data.

  2. Generate the triplets online. This can be done by selecting the hard positive/negative exemplars form within a mini-batch.

 

  This paper use all anchor-positive pairs in a mini-batch while still selecting the hard negatives. the all anchor-positive method was more stable and converaged slightly faster at the begining of training.

 

  The code could refer the github page: https://github.com/wangxiao5791509/caffe-video_triplet

  

layer {     
    name: "loss"    
    type: "RankHardLoss"    
    rank_param{     
        neg_num: 4  
        pair_size: 2    
        hard_ratio: 0.5     
        rand_ratio: 0.5     
        margin: 1   
    }   
    bottom: "norml2"    
    bottom: "label"     
}

 

 

  

 

 

 

 

 

 

 

 

 

 

  

   

 

相关实践学习
基于阿里云DeepGPU实例,用AI画唯美国风少女
本实验基于阿里云DeepGPU实例,使用aiacctorch加速stable-diffusion-webui,用AI画唯美国风少女,可提升性能至高至原性能的2.6倍。
相关文章
|
1天前
|
机器学习/深度学习 数据挖掘 API
[FastText in Text Classification]论文实现:Bag of Tricks for Efficient Text Classification
[FastText in Text Classification]论文实现:Bag of Tricks for Efficient Text Classification
12 2
|
1天前
|
机器学习/深度学习 缓存 数据可视化
[Linformer]论文实现:Linformer: Self-Attention with Linear Complexity
[Linformer]论文实现:Linformer: Self-Attention with Linear Complexity
14 1
[Linformer]论文实现:Linformer: Self-Attention with Linear Complexity
|
9月前
|
机器学习/深度学习 人工智能 自然语言处理
OneIE:A Joint Neural Model for Information Extraction with Global Features论文解读
大多数现有的用于信息抽取(IE)的联合神经网络模型使用局部任务特定的分类器来预测单个实例(例如,触发词,关系)的标签,而不管它们之间的交互。
111 0
|
9月前
|
机器学习/深度学习 编解码 自然语言处理
DeIT:Training data-efficient image transformers & distillation through attention论文解读
最近,基于注意力的神经网络被证明可以解决图像理解任务,如图像分类。这些高性能的vision transformer使用大量的计算资源来预训练了数亿张图像,从而限制了它们的应用。
251 0
|
11月前
|
数据可视化 数据挖掘
【论文解读】Dual Contrastive Learning:Text Classification via Label-Aware Data Augmentation
北航出了一篇比较有意思的文章,使用标签感知的数据增强方式,将对比学习放置在有监督的环境中 ,下游任务为多类文本分类,在低资源环境中进行实验取得了不错的效果
248 0
|
机器学习/深度学习 存储 数据挖掘
【文本分类】Bag of Tricks for Efficient Text Classification
【文本分类】Bag of Tricks for Efficient Text Classification
【文本分类】Bag of Tricks for Efficient Text Classification
|
机器学习/深度学习 数据挖掘
【文本分类】ACT: an Attentive Convolutional Transformer for Efficient Text Classification
【文本分类】ACT: an Attentive Convolutional Transformer for Efficient Text Classification
155 0
【文本分类】ACT: an Attentive Convolutional Transformer for Efficient Text Classification
|
机器学习/深度学习 编解码 数据挖掘
A Semisupervised CRF Model for CNN-Based Semantic Segmentation With Sparse Ground Truth
A Semisupervised CRF Model for CNN-Based Semantic Segmentation With Sparse Ground Truth
85 0
A Semisupervised CRF Model for CNN-Based Semantic Segmentation With Sparse Ground Truth
|
机器学习/深度学习 Ubuntu PyTorch
RoNIN: Robust Neural Inertial Navigation预训练模型测试
RoNIN: Robust Neural Inertial Navigation预训练模型测试
113 0
RoNIN: Robust Neural Inertial Navigation预训练模型测试
|
机器学习/深度学习 算法 数据挖掘
【论文泛读】 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
【论文泛读】 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
【论文泛读】 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift