Tutorial: Triplet Loss Layer Design for CNN


Xiao Wang  2016.05.02

 

  The triplet loss layer can be a trick for further improving the accuracy of a CNN. In this tutorial I will walk through the whole process and show the code. It is mainly based on these blog posts:

  http://blog.csdn.net/tangwei2014/article/details/46812153 

  http://blog.csdn.net/tangwei2014/article/details/46788025

  and the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering".

 

  First, let's talk about how to add the layer to Caffe and test it to check whether it works. Then we will discuss the paper and explain where the triplet loss comes from. In the new version of the Caffe framework, adding a new layer mainly consists of these steps:

  step 1. add the parameter message for the layer in ./src/caffe/proto/caffe.proto;

  step 2. add the declaration of the layer in ./include/caffe/***layers.hpp;

  step 3. add the corresponding .cpp and .cu files in ./src/caffe/layers/ and implement the new layer;

  step 4. add test code for the new layer in ./src/caffe/gtest/, testing its forward and backward propagation and its computation speed.

 

  Let's do it step by step. 

  First, we add the triplet loss layer parameters in the caffe.proto file.

  Near line 101 of caffe.proto there is a comment such as "SolverParameter next available ID: 40 (last added: momentum2)"; these comments track the next free field IDs, and new fields should follow the same convention. We define a new parameter message for the layer:

        message RankParameter {
        optional uint32 neg_num = 1 [default = 1];
        optional uint32 pair_size = 2 [default = 1];
        optional float hard_ratio = 3;
        optional float rand_ratio = 4;
        optional float margin = 5 [default = 0.5];
        }
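
  Besides defining the message, the new parameter also has to be referenced from the LayerParameter message so that the layer can read it via this->layer_param_.rank_param(). A minimal sketch of that addition (the field ID 150 is only a placeholder; use whatever ID your copy of caffe.proto lists as next available):

        message LayerParameter {
          // ... existing fields ...
          optional RankParameter rank_param = 150;  // placeholder ID, pick the next available one
        }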

 

    

    

  Second, we add the declaration of the triplet loss layer in ./include/caffe/TripletLoss_layers.hpp (the original post shows the header only as a screenshot; a sketch is given below).
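
  A minimal sketch of what the declaration needs to contain, inferred from the .cpp implementation further down (the member names diff_, dis_, mask_ and the set_mask() helper come from that code; the exact include paths may differ between Caffe versions):

    #ifndef CAFFE_RANK_HARD_LOSS_LAYER_HPP_
    #define CAFFE_RANK_HARD_LOSS_LAYER_HPP_

    #include <vector>
    #include "caffe/blob.hpp"
    #include "caffe/layer.hpp"
    #include "caffe/proto/caffe.pb.h"
    #include "caffe/loss_layers.hpp"   // or vision_layers.hpp in older Caffe versions

    namespace caffe {

    template <typename Dtype>
    class RankHardLossLayer : public LossLayer<Dtype> {
     public:
      explicit RankHardLossLayer(const LayerParameter& param)
          : LossLayer<Dtype>(param) {}
      virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual inline const char* type() const { return "RankHardLoss"; }

     protected:
      virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
          const vector<Blob<Dtype>*>& top);
      virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
      virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
          const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

      void set_mask(const vector<Blob<Dtype>*>& bottom);

      Blob<Dtype> diff_;   // scratch blob, same shape as the input features
      Blob<Dtype> dis_;    // num x num pairwise distance matrix
      Blob<Dtype> mask_;   // num x num mask marking the selected negatives
    };

    }  // namespace caffe

    #endif  // CAFFE_RANK_HARD_LOSS_LAYER_HPP_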

  

 

  Third, we write the .cpp and .cu files for the triplet loss layer in ./src/caffe/layers/.

  First comes the .cpp file:

#include <vector>

#include <algorithm>
#include <cmath>
#include <cfloat>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

using std::max;
using namespace std;

namespace caffe {

// Random functor for std::random_shuffle, backed by Caffe's own RNG.
int myrandom(int i) { return caffe_rng_rand() % i; }


template <typename Dtype>
void RankHardLossLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);

  // dis_ holds the num x num pairwise distance matrix,
  // mask_ marks which negatives were selected for each anchor.
  diff_.ReshapeLike(*bottom[0]);
  dis_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);
  mask_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);
}


template <typename Dtype>
void RankHardLossLayer<Dtype>::set_mask(const vector<Blob<Dtype>*>& bottom)
{
    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float hard_ratio = rank_param.hard_ratio();
    float rand_ratio = rank_param.rand_ratio();
    float margin = rank_param.margin();

    int hard_num = neg_num * hard_ratio;
    int rand_num = neg_num * rand_ratio;

    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for (int i = 0; i < num * num; i++)
    {
        dis_data[i] = 0;
        mask_data[i] = 0;
    }

    // Calculate pairwise distances as the negative dot product of the two
    // (L2-normalized) features, so smaller values mean more similar samples.
    for (int i = 0; i < num; i++)
    {
        for (int j = i + 1; j < num; j++)
        {
            const Dtype* fea1 = bottom_data + i * dim;
            const Dtype* fea2 = bottom_data + j * dim;
            Dtype ts = 0;
            for (int k = 0; k < dim; k++)
            {
                ts += fea1[k] * fea2[k];
            }
            dis_data[i * num + j] = -ts;
            dis_data[j * num + i] = -ts;
        }
    }

    // Select negatives for each anchor. Samples arrive in (anchor, positive)
    // pairs, so anchors sit at index i and their positives at index i + 1.
    vector<pair<float, int> > negpairs;
    vector<int> sid1;
    vector<int> sid2;

    for (int i = 0; i < num; i += pair_size)
    {
        negpairs.clear();
        sid1.clear();
        sid2.clear();
        for (int j = 0; j < num; j++)
        {
            if (label[j] == label[i])
                continue;
            // Keep only negatives that currently violate the margin.
            Dtype tloss = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            if (tloss == 0) continue;

            negpairs.push_back(make_pair(dis_data[i * num + j], j));
        }
        if ((int)negpairs.size() <= neg_num)
        {
            // Not enough violating negatives: use all of them.
            for (int j = 0; j < (int)negpairs.size(); j++)
            {
                int id = negpairs[j].second;
                mask_data[i * num + id] = 1;
            }
            continue;
        }
        sort(negpairs.begin(), negpairs.end());

        // sid1: the neg_num hardest negatives; sid2: the remaining ones.
        for (int j = 0; j < neg_num; j++)
        {
            sid1.push_back(negpairs[j].second);
        }
        for (int j = neg_num; j < (int)negpairs.size(); j++)
        {
            sid2.push_back(negpairs[j].second);
        }
        // Pick hard_num negatives from the hardest set ...
        std::random_shuffle(sid1.begin(), sid1.end(), myrandom);
        for (int j = 0; j < min(hard_num, (int)(sid1.size())); j++)
        {
            mask_data[i * num + sid1[j]] = 1;
        }
        for (int j = hard_num; j < (int)sid1.size(); j++)
        {
            sid2.push_back(sid1[j]);
        }
        // ... and rand_num negatives at random from the rest.
        std::random_shuffle(sid2.begin(), sid2.end(), myrandom);
        for (int j = 0; j < min(rand_num, (int)(sid2.size())); j++)
        {
            mask_data[i * num + sid2[j]] = 1;
        }
    }
}


template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {

    int num = bottom[0]->num();

    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float margin = rank_param.margin();
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    set_mask(bottom);
    Dtype loss = 0;
    int cnt = neg_num * num / pair_size * 2;

    for (int i = 0; i < num; i += pair_size)
    {
        for (int j = 0; j < num; j++)
        {
            if (mask_data[i * num + j] == 0)
                continue;
            // Two hinge terms per selected negative j:
            //   tloss1 treats i as the anchor and i + 1 as the positive,
            //   tloss2 treats i + 1 as the anchor and i as the positive.
            Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));
            loss += tloss1 + tloss2;
        }
    }

    loss = loss / cnt;
    top[0]->mutable_cpu_data()[0] = loss;
}


template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {

    const Dtype* bottom_data = bottom[0]->cpu_data();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int count = bottom[0]->count();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();

    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float margin = rank_param.margin();

    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for (int i = 0; i < count; i++)
        bottom_diff[i] = 0;

    int cnt = neg_num * num / pair_size * 2;

    for (int i = 0; i < num; i += pair_size)
    {
        const Dtype* fori = bottom_data + i * dim;        // anchor feature
        const Dtype* fpos = bottom_data + (i + 1) * dim;  // positive feature

        Dtype* fori_diff = bottom_diff + i * dim;
        Dtype* fpos_diff = bottom_diff + (i + 1) * dim;
        for (int j = 0; j < num; j++)
        {
            if (mask_data[i * num + j] == 0) continue;
            Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));

            const Dtype* fneg = bottom_data + j * dim;    // negative feature
            Dtype* fneg_diff  = bottom_diff + j * dim;
            // Accumulate the gradients of the dot-product hinge terms
            // w.r.t. the anchor, positive, and negative features.
            if (tloss1 > 0)
            {
                for (int k = 0; k < dim; k++)
                {
                    fori_diff[k] += (fneg[k] - fpos[k]);
                    fpos_diff[k] += -fori[k];
                    fneg_diff[k] +=  fori[k];
                }
            }
            if (tloss2 > 0)
            {
                for (int k = 0; k < dim; k++)
                {
                    fori_diff[k] += -fpos[k];
                    fpos_diff[k] += fneg[k] - fori[k];
                    fneg_diff[k] += fpos[k];
                }
            }
        }
    }

    for (int i = 0; i < count; i++)
    {
        bottom_diff[i] = bottom_diff[i] / cnt;
    }
}

#ifdef CPU_ONLY
STUB_GPU(RankHardLossLayer);
#endif

INSTANTIATE_CLASS(RankHardLossLayer);
REGISTER_LAYER_CLASS(RankHardLoss);

}  // namespace caffe

  and the .cu file 

#include <vector>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

// The GPU versions simply fall back to the CPU implementations.
template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Forward_cpu(bottom, top);
}

template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  Backward_cpu(top, propagate_down, bottom);
}

INSTANTIATE_LAYER_GPU_FUNCS(RankHardLossLayer);

}  // namespace caffe

  

  Finally, we rebuild Caffe (make all, and optionally make test && make runtest) and check that everything compiles without errors.

 

----------------------------------------------------------------------------------------------------

                   

  Let's continue to talk about the triplet loss:

  As illustrated in the FaceNet paper, a triplet has three components: the anchor, a positive sample (same identity as the anchor), and a negative sample (a different identity). The goal is to reduce the distance between the anchor and the positive while pushing the negative away from the anchor.

  Thus, the whole loss can be described as follows:
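
  In the notation of the FaceNet paper, with f(·) the learned embedding, (x_i^a, x_i^p, x_i^n) the anchor, positive, and negative of the i-th triplet, and α the margin:

      L = \sum_i \big[ \, \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 \; - \; \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 \; + \; \alpha \, \big]_+

  (For L2-normalized features, ||a - b||^2 = 2 - 2 a·b, so the negative dot product used as the "distance" in the layer above is equivalent to the squared Euclidean distance up to a constant.)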

  Selecting triplets purely at random may lead to slow convergence of the network, so we need to find hard triplets, those that are active and can therefore contribute to improving the model. The following section explains the approach.

  Triplet Selection:

  There are two approaches to generating triplets:

  1. Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data.

  2. Generate the triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-batch.

 

  The paper uses all anchor-positive pairs in a mini-batch while still selecting the hard negatives; the all-anchor-positive strategy was more stable and converged slightly faster at the beginning of training. To avoid collapsing to bad local optima, FaceNet further restricts itself to semi-hard negatives: negatives that are farther from the anchor than the positive, but that still lie inside the margin. A toy sketch of that selection rule is given below.
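
  This selection rule is not what the RankHardLoss layer above implements (that layer mixes hard and random negatives), but a minimal sketch of semi-hard negative mining, assuming squared Euclidean distances have already been computed, could look like this:

    #include <vector>
    #include <cstddef>

    // Given the squared distance from one anchor to its positive (d_ap) and to a
    // set of candidate negatives (d_an), pick a semi-hard negative in the FaceNet
    // sense: farther from the anchor than the positive, but still inside the
    // margin. Returns the index of the chosen negative, or -1 if none exists.
    int pick_semi_hard_negative(float d_ap,
                                const std::vector<float>& d_an,
                                float margin) {
      int best = -1;
      float best_dist = 0.f;
      for (std::size_t j = 0; j < d_an.size(); ++j) {
        // semi-hard: d_ap < d_an[j], but d_an[j] < d_ap + margin
        // (the triplet still produces a non-zero loss).
        if (d_an[j] > d_ap && d_an[j] < d_ap + margin) {
          if (best == -1 || d_an[j] < best_dist) {  // keep the closest such negative
            best = static_cast<int>(j);
            best_dist = d_an[j];
          }
        }
      }
      return best;
    }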

 

  The complete code is available on GitHub: https://github.com/wangxiao5791509/caffe-video_triplet. An example of using the layer in a network prototxt:

  

layer {     
    name: "loss"    
    type: "RankHardLoss"    
    rank_param{     
        neg_num: 4  
        pair_size: 2    
        hard_ratio: 0.5     
        rand_ratio: 0.5     
        margin: 1   
    }   
    bottom: "norml2"    
    bottom: "label"     
}
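
  Reading the parameters against the layer code above: samples are assumed to arrive in anchor/positive pairs (pair_size: 2, with the positive stored right after its anchor), neg_num is how many negatives are kept per anchor, hard_ratio and rand_ratio split those negatives between the hardest ones and randomly chosen ones (here 2 hard + 2 random), and margin is the hinge margin of the ranking loss. The layer takes the L2-normalized features ("norml2") and the labels as its two bottom blobs.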

 

 

  

 

 

 

 

 

 

 

 

 

 

  

   

 
