Tutorial: Triplet Loss Layer Design for CNN


Xiao Wang  2016.05.02

 

  A triplet loss layer can be a useful trick for further improving the accuracy of a CNN. In this post I will walk through the whole process and show the code. The tutorial draws mainly on these blog posts:

  http://blog.csdn.net/tangwei2014/article/details/46812153 

  http://blog.csdn.net/tangwei2014/article/details/46788025

  and the paper "FaceNet: A Unified Embedding for Face Recognition and Clustering".

 

  First, let's talk about how to add the layer to Caffe and test it to check whether it works. Then we will discuss the paper and explain where the triplet loss comes from. In recent versions of the Caffe framework, adding a new layer mainly consists of these steps:

  step 1. add the parameter message for the layer in ./src/caffe/proto/caffe.proto ;

  step 2. add the declaration of the layer in ./include/caffe/***layers.hpp ;

  step 3. add the corresponding .cpp and .cu files in ./src/caffe/layers/ and implement the new layer;

  step 4. add test code for the new layer in ./src/caffe/test/, testing its forward and backward propagation and its computation speed.

 

  Let's do it step by step. 

  First, we add the triplet loss layer parameters to the caffe.proto file.

  In caffe.proto, each message carries a comment recording its next free field ID; around line 101, for example, it says "SolverParameter next available ID: 40 (last added: momentum2)". When extending a message, use the ID such a comment indicates. For our layer, we first define a new RankParameter message:

        message RankParameter {
          optional uint32 neg_num = 1 [default = 1];
          optional uint32 pair_size = 2 [default = 1];
          optional float hard_ratio = 3;
          optional float rand_ratio = 4;
          optional float margin = 5 [default = 0.5];
        }
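
  Defining the message by itself is not enough: LayerParameter also needs a field of the new type so that rank_param can appear in a prototxt. Below is a minimal sketch, assuming ID 150 is still unused in LayerParameter in your copy of caffe.proto (check its "next available ID" comment and adjust accordingly):

        message LayerParameter {
          // ... existing fields unchanged ...
          optional RankParameter rank_param = 150;  // 150 is an assumed free ID
        }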


  Second, we add the declaration of the triplet loss layer in ./include/caffe/TripletLoss_layers.hpp.
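
  The original post does not reproduce the header itself, so here is a minimal sketch of what the declaration could look like, reconstructed from the .cpp file shown in the next step (the class name RankHardLossLayer, the diff_/dis_/mask_ buffers, and the set_mask() helper all appear there); treat it as an illustration rather than the exact file from the repository:

#ifndef CAFFE_TRIPLET_LOSS_LAYERS_HPP_
#define CAFFE_TRIPLET_LOSS_LAYERS_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/loss_layers.hpp"  // for LossLayer (location varies across Caffe versions)

namespace caffe {

// Triplet ranking loss with online hard-negative mining.
// bottom[0]: feature blob (num x dim), bottom[1]: label blob.
template <typename Dtype>
class RankHardLossLayer : public LossLayer<Dtype> {
 public:
  explicit RankHardLossLayer(const LayerParameter& param)
      : LossLayer<Dtype>(param) {}
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);

  virtual inline const char* type() const { return "RankHardLoss"; }

 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);

  void set_mask(const vector<Blob<Dtype>*>& bottom);

  Blob<Dtype> diff_;  // scratch space, same shape as the features
  Blob<Dtype> dis_;   // num x num pairwise distance matrix
  Blob<Dtype> mask_;  // num x num mask of selected negatives
};

}  // namespace caffe

#endif  // CAFFE_TRIPLET_LOSS_LAYERS_HPP_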

  

 

  Third, we implement the layer in the corresponding .cpp and .cu files.

  First comes the .cpp file:

#include <vector>

#include <algorithm>
#include <cmath>
#include <cfloat>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

using std::max;
using namespace std;

namespace caffe {

// Random functor for std::random_shuffle that uses Caffe's RNG.
int myrandom(int i) { return caffe_rng_rand() % i; }


template <typename Dtype>
void RankHardLossLayer<Dtype>::Reshape(
  const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  LossLayer<Dtype>::Reshape(bottom, top);

  diff_.ReshapeLike(*bottom[0]);
  dis_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);   // pairwise distances
  mask_.Reshape(bottom[0]->num(), bottom[0]->num(), 1, 1);  // selected negatives
}


template <typename Dtype>
void RankHardLossLayer<Dtype>::set_mask(const vector<Blob<Dtype>*>& bottom)
{
    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float hard_ratio = rank_param.hard_ratio();
    float rand_ratio = rank_param.rand_ratio();
    float margin = rank_param.margin();

    int hard_num = neg_num * hard_ratio;
    int rand_num = neg_num * rand_ratio;

    const Dtype* bottom_data = bottom[0]->cpu_data();
    const Dtype* label = bottom[1]->cpu_data();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for(int i = 0; i < num * num; i ++)
    {
        dis_data[i] = 0;
        mask_data[i] = 0;
    }

    // Calculate pairwise distances as negative inner products, so a
    // smaller value means a more similar pair.
    for(int i = 0; i < num; i ++)
    {
        for(int j = i + 1; j < num; j ++)
        {
            const Dtype* fea1 = bottom_data + i * dim;
            const Dtype* fea2 = bottom_data + j * dim;
            Dtype ts = 0;
            for(int k = 0; k < dim; k ++)
            {
                ts += fea1[k] * fea2[k];
            }
            dis_data[i * num + j] = -ts;
            dis_data[j * num + i] = -ts;
        }
    }

    // Select negatives for each anchor; sample i + 1 is the positive of anchor i.
    vector<pair<float, int> > negpairs;
    vector<int> sid1;  // candidate hard negatives
    vector<int> sid2;  // candidate random negatives

    for(int i = 0; i < num; i += pair_size)
    {
        negpairs.clear();
        sid1.clear();
        sid2.clear();
        for(int j = 0; j < num; j ++)
        {
            if(label[j] == label[i])
                continue;
            // Keep only negatives that violate the margin (non-zero loss).
            Dtype tloss = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            if(tloss == 0) continue;

            negpairs.push_back(make_pair(dis_data[i * num + j], j));
        }
        if((int)negpairs.size() <= neg_num)
        {
            for(int j = 0; j < (int)negpairs.size(); j ++)
            {
                int id = negpairs[j].second;
                mask_data[i * num + id] = 1;
            }
            continue;
        }
        // Sort by distance so the hardest (closest) negatives come first.
        sort(negpairs.begin(), negpairs.end());

        for(int j = 0; j < neg_num; j ++)
        {
            sid1.push_back(negpairs[j].second);
        }
        for(int j = neg_num; j < (int)negpairs.size(); j ++)
        {
            sid2.push_back(negpairs[j].second);
        }
        // Pick hard_num hard negatives at random from the hardest group...
        std::random_shuffle(sid1.begin(), sid1.end(), myrandom);
        for(int j = 0; j < min(hard_num, (int)(sid1.size())); j ++)
        {
            mask_data[i * num + sid1[j]] = 1;
        }
        for(int j = hard_num; j < (int)sid1.size(); j ++)
        {
            sid2.push_back(sid1[j]);
        }
        // ...and rand_num negatives at random from the remaining candidates.
        std::random_shuffle(sid2.begin(), sid2.end(), myrandom);
        for(int j = 0; j < min(rand_num, (int)(sid2.size())); j ++)
        {
            mask_data[i * num + sid2[j]] = 1;
        }
    }
}


template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {

    int num = bottom[0]->num();

    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float margin = rank_param.margin();
    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    set_mask(bottom);
    Dtype loss = 0;
    // Normalizer: up to neg_num negatives per pair, two hinge terms each.
    int cnt = neg_num * num / pair_size * 2;

    for(int i = 0; i < num; i += pair_size)
    {
        for(int j = 0; j < num; j ++)
        {
            if(mask_data[i * num + j] == 0)
                continue;
            // Hinge terms for the anchor (tloss1) and its positive (tloss2)
            // against the selected negative j.
            Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));
            loss += tloss1 + tloss2;
        }
    }

    loss = loss / cnt;
    top[0]->mutable_cpu_data()[0] = loss;
}


template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {

    const Dtype* bottom_data = bottom[0]->cpu_data();
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();
    int count = bottom[0]->count();
    int num = bottom[0]->num();
    int dim = bottom[0]->count() / bottom[0]->num();

    RankParameter rank_param = this->layer_param_.rank_param();
    int neg_num = rank_param.neg_num();
    int pair_size = rank_param.pair_size();
    float margin = rank_param.margin();

    Dtype* dis_data = dis_.mutable_cpu_data();
    Dtype* mask_data = mask_.mutable_cpu_data();

    for(int i = 0; i < count; i ++)
        bottom_diff[i] = 0;

    int cnt = neg_num * num / pair_size * 2;

    for(int i = 0; i < num; i += pair_size)
    {
        const Dtype* fori = bottom_data + i * dim;        // anchor feature
        const Dtype* fpos = bottom_data + (i + 1) * dim;  // positive feature

        Dtype* fori_diff = bottom_diff + i * dim;
        Dtype* fpos_diff = bottom_diff + (i + 1) * dim;
        for(int j = 0; j < num; j ++)
        {
            if(mask_data[i * num + j] == 0) continue;
            Dtype tloss1 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[i * num + j] + Dtype(margin));
            Dtype tloss2 = max(Dtype(0), dis_data[i * num + i + 1] - dis_data[(i + 1) * num + j] + Dtype(margin));

            const Dtype* fneg = bottom_data + j * dim;    // negative feature
            Dtype* fneg_diff  = bottom_diff + j * dim;
            // Gradients of the dot-product hinge terms (see the derivation
            // after the code listing).
            if(tloss1 > 0)
            {
                for(int k = 0; k < dim; k ++)
                {
                    fori_diff[k] += (fneg[k] - fpos[k]);
                    fpos_diff[k] += -fori[k];
                    fneg_diff[k] +=  fori[k];
                }
            }
            if(tloss2 > 0)
            {
                for(int k = 0; k < dim; k ++)
                {
                    fori_diff[k] += -fpos[k];
                    fpos_diff[k] += fneg[k] - fori[k];
                    fneg_diff[k] += fpos[k];
                }
            }
        }
    }

    for (int i = 0; i < count; i ++)
    {
        bottom_diff[i] = bottom_diff[i] / cnt;
    }
}

#ifdef CPU_ONLY
STUB_GPU(RankHardLossLayer);
#endif

INSTANTIATE_CLASS(RankHardLossLayer);
REGISTER_LAYER_CLASS(RankHardLoss);

}  // namespace caffe
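  Because similarity is measured with negative inner products, each selected negative j contributes hinge terms of the form below, and the updates in Backward_cpu are exactly their (sub)gradients. Writing f_a, f_p, f_n for the anchor, positive, and negative features, one term is, in LaTeX:

    \ell(f_a, f_p, f_n) = \max\left(0,\; -f_a^{\top} f_p + f_a^{\top} f_n + m\right)

  and, whenever the hinge is active,

    \frac{\partial \ell}{\partial f_a} = f_n - f_p, \qquad
    \frac{\partial \ell}{\partial f_p} = -f_a, \qquad
    \frac{\partial \ell}{\partial f_n} = f_a,

  which matches the fori_diff / fpos_diff / fneg_diff updates for tloss1 (tloss2 is the symmetric term with the positive sample playing the anchor's role).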

  and then the .cu file, which simply forwards to the CPU implementation:

#include <vector>

#include "caffe/layer.hpp"
#include "caffe/util/io.hpp"
#include "caffe/util/math_functions.hpp"
#include "caffe/vision_layers.hpp"

namespace caffe {

// No dedicated GPU kernels; fall back to the CPU implementation.
template <typename Dtype>
void RankHardLossLayer<Dtype>::Forward_gpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  Forward_cpu(bottom, top);
}

template <typename Dtype>
void RankHardLossLayer<Dtype>::Backward_gpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
  Backward_cpu(top, propagate_down, bottom);
}

INSTANTIATE_LAYER_GPU_FUNCS(RankHardLossLayer);

}  // namespace caffe

  

  Finally, we rebuild Caffe with make and check that the new layer compiles without errors.
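
  For step 4, a simple forward test can be added under ./src/caffe/test/. The following is a minimal sketch in the style of Caffe's existing layer tests, assuming the RankParameter and RankHardLossLayer definitions above; the batch is laid out so that samples 2k and 2k+1 share a label, matching pair_size: 2. It only checks the forward pass (the hard-negative mining step makes a full numeric gradient check unreliable):

#include <vector>

#include "gtest/gtest.h"

#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/filler.hpp"
#include "caffe/vision_layers.hpp"  // or wherever RankHardLossLayer is declared

#include "caffe/test/test_caffe_main.hpp"

namespace caffe {

template <typename TypeParam>
class RankHardLossLayerTest : public MultiDeviceTest<TypeParam> {
  typedef typename TypeParam::Dtype Dtype;

 protected:
  RankHardLossLayerTest()
      : blob_bottom_data_(new Blob<Dtype>(8, 4, 1, 1)),
        blob_bottom_label_(new Blob<Dtype>(8, 1, 1, 1)),
        blob_top_loss_(new Blob<Dtype>()) {
    // Random features; labels arranged so samples 2k and 2k+1 form an
    // anchor/positive pair, as required by pair_size == 2.
    FillerParameter filler_param;
    GaussianFiller<Dtype> filler(filler_param);
    filler.Fill(blob_bottom_data_);
    for (int i = 0; i < 8; ++i) {
      blob_bottom_label_->mutable_cpu_data()[i] = i / 2;
    }
    blob_bottom_vec_.push_back(blob_bottom_data_);
    blob_bottom_vec_.push_back(blob_bottom_label_);
    blob_top_vec_.push_back(blob_top_loss_);
  }
  virtual ~RankHardLossLayerTest() {
    delete blob_bottom_data_;
    delete blob_bottom_label_;
    delete blob_top_loss_;
  }

  Blob<Dtype>* const blob_bottom_data_;
  Blob<Dtype>* const blob_bottom_label_;
  Blob<Dtype>* const blob_top_loss_;
  vector<Blob<Dtype>*> blob_bottom_vec_;
  vector<Blob<Dtype>*> blob_top_vec_;
};

TYPED_TEST_CASE(RankHardLossLayerTest, TestDtypesAndDevices);

TYPED_TEST(RankHardLossLayerTest, TestForward) {
  typedef typename TypeParam::Dtype Dtype;
  LayerParameter layer_param;
  RankParameter* rank_param = layer_param.mutable_rank_param();
  rank_param->set_neg_num(2);
  rank_param->set_pair_size(2);
  rank_param->set_hard_ratio(0.5);
  rank_param->set_rand_ratio(0.5);
  rank_param->set_margin(1.0);
  RankHardLossLayer<Dtype> layer(layer_param);
  layer.SetUp(this->blob_bottom_vec_, this->blob_top_vec_);
  layer.Forward(this->blob_bottom_vec_, this->blob_top_vec_);
  // The averaged hinge loss must be non-negative.
  EXPECT_GE(this->blob_top_loss_->cpu_data()[0], 0);
}

}  // namespace caffe

  After rebuilding, the tests can be compiled and executed with make test and make runtest.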

 

------------------------------------------------------------

                   

  Let's continue to talk about the triplet loss:

  [Figure from the FaceNet paper: training pulls the anchor-positive pair together and pushes the anchor-negative pair apart.]

  As the figure illustrates, a triplet has three components: the anchor, the positive (a sample with the same identity as the anchor), and the negative (a sample with a different identity). The goal is to reduce the distance between the anchor and the positive while pushing the negative away from the anchor.

  Thus, the whole loss can be written as follows:
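  In the notation of the FaceNet paper, where f(x) is the embedding, (x_i^a, x_i^p, x_i^n) is the i-th anchor/positive/negative triplet, and \alpha is the margin, the loss (rendered as an image in the original post) is:

    L = \sum_{i=1}^{N} \left[ \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right]_{+}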

  Selecting triplets purely at random can lead to slow convergence of the network, so we need to find hard triplets, which are active and can therefore contribute to improving the model. The following section explains the approach.

  Triplet Selection:

  There are two approaches for generating triplets:

  1. Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data.

  2. Generate triplets online, by selecting the hard positive/negative exemplars from within a mini-batch.

 

  The paper uses all anchor-positive pairs in a mini-batch while still selecting the hard negatives; the all-anchor-positive method was more stable and converged slightly faster at the beginning of training.

 

  The complete code is available on GitHub: https://github.com/wangxiao5791509/caffe-video_triplet

  

layer {
    name: "loss"
    type: "RankHardLoss"
    rank_param {
        neg_num: 4      # negatives mined per anchor/positive pair
        pair_size: 2    # samples i and i+1 form an anchor/positive pair
        hard_ratio: 0.5 # fraction of negatives taken from the hardest candidates
        rand_ratio: 0.5 # fraction of negatives sampled at random
        margin: 1       # hinge margin
    }
    bottom: "norml2"    # feature blob (the name suggests L2-normalized features)
    bottom: "label"
}
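
  Note the batch layout this layer assumes: with pair_size: 2, sample i + 1 is treated as the positive of anchor i (this is where the dis_data[i * num + i + 1] term in the loss code comes from), so the data layer must emit each anchor/positive pair as consecutive samples with the same label. neg_num negatives are then mined per pair, split between the hardest candidates (hard_ratio) and random margin violators (rand_ratio).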
