Person Re-Identification (ReID), a New Facial Recognition Record

Summary: Person Re-Identification (ReID) has been a research focus in computer vision in recent years. It is the process of retrieving images of a person across devices based on a given image of that person.

In recent years, increasingly mature facial recognition technology has come to significantly outperform humans at identifying faces, and it has been widely used in projects such as "smart city" and "safe city" initiatives. In practical applications, however, cameras cannot always capture clear images of faces. In addition, cameras have limited range, and in real-world scenarios there are often no overlaps between the areas covered by different cameras.

Therefore, it becomes necessary to identify and locate a person using information about his or her whole body, tracking the person across cameras by using overall appearance features as an important supplement to facial information. This is why computer vision researchers have gradually turned their attention to "ReID" technology.

ReID's Enormous Practical Significance and Reliance on Manual Labor

As its name suggests, Person Re-Identification (ReID) is the re-identification of persons by establishing correspondences between images of persons captured by different cameras with non-overlapping views. When the areas covered by different cameras do not overlap, retrieval becomes much more difficult because of the lack of sequential information. Therefore, ReID emphasizes retrieving a specific person from videos captured by different cameras.

[Figure 1]

ReID compares the features of a person extracted from one image with the features of persons appearing in other images, and determines whether they are the same person.

Where person detection determines whether there is a person in an image, ReID requires a machine to recognize all images of a particular person shot by different cameras. Specifically, it is a person comparison technology that, given an image of a person, finds one or more other images of that person based on his or her overall features.
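To make the retrieval formulation concrete, here is a minimal sketch, assuming features have already been extracted (by a convolutional network in practice) and using Euclidean distance as the similarity measure; the function name and shapes are illustrative only.

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Rank gallery images from most to least similar to the query,
    using Euclidean distance between fixed-length feature vectors."""
    # gallery_feats: (num_gallery, feat_dim); query_feat: (feat_dim,)
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)  # indices of gallery images, nearest first
```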

ReID has wide applications in criminal investigation for public security and in image retrieval. In addition, ReID can help mobile phone users cluster their photos, and help retailers and supermarkets obtain customer trajectories and create commercial value. However, the accuracy of ReID is currently not high enough for it to be commercially viable, and much of the work still has to be done manually.

Breaking the Industry Record of ReID and Surpassing Human Experts for the First Time

Research on ReID is very challenging due to the uncertainty in the time and location at which images are captured. In addition, differences in lighting, viewing angle, and pose, as well as occlusion, strongly affect accuracy.

Thanks to the development of deep learning in recent years, ReID technology has matured considerably. On the two most commonly used ReID test sets, Market1501 and CUHK03, the rank-1 accuracies have reached 89.9% and 91.8%, respectively.
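For reference, rank-1 accuracy counts a query as correct when its single nearest gallery image has the same identity. A minimal sketch of that computation is below; it ignores the same-camera filtering used in the standard evaluation protocols, so treat it as illustrative only.

```python
import numpy as np

def rank1_accuracy(dist, query_ids, gallery_ids):
    """dist: (num_query, num_gallery) distance matrix; the id arrays hold
    integer identity labels for the query and gallery images."""
    top1 = np.argmin(dist, axis=1)  # nearest gallery image for each query
    return float(np.mean(gallery_ids[top1] == query_ids))
```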

However, there is still a gap between these results and what human beings can achieve.

[Figure 2]

To test the ReID ability of human beings, the researchers assembled 10 professional labelers to carry out the test. Experiments show that the rank-1 accuracy of a skilled labeler on Market1501 and CUHK03 can reach 93.5% and 95.7%, respectively, an impressive result that ReID methods at the time could not achieve.

Not long ago, Face++ (Megvii) made exciting progress in this area: in a paper titled "AlignedReID", the research team from Megvii Research Institute proposed a new approach characterized by Dynamic Alignment, Mutual Learning, and Re-Ranking, which raised the rank-1 accuracy of machines on Market1501 and CUHK03 to 94.0% and 96.1%, respectively. This is also the first time that machines have outperformed human experts in ReID, setting a new record for the industry.

[Figure 3]

Machines have now surpassed human beings not only in facial recognition but also in the more complex field of ReID. This offers a powerful technology for understanding images and videos of people.

Sun Jian, chief scientist and head of research at Megvii, said: "With the revival of deep learning methods in recent years, we have seen machines surpass human beings in solving more and more image perception problems, from facial recognition in 2014 to ImageNet image classification in 2015. I remember that, not long ago, when I talked with my mentor, Dr. Shen Xiangyang (former Global Executive Vice President of Microsoft), I boasted that most perception issues would be resolved in 5-10 years. Today, I am very pleased to see that another image perception problem, one that is difficult and has great potential for application, has been solved by the algorithm developed by the Megvii team."

Multiple Networks Automatically Learning the Alignment of Human Features and Learning from Each Other

[Figure 4]

So how did the author achieve this?
Similar to other deep-learning-based ReID methods, the author used a deep convolutional neural network to extract features, used the triplet loss with hard sample mining as the loss function, and took the Euclidean distance between features as the similarity of two images.
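A minimal sketch of the triplet loss with batch-hard sample mining is shown below, assuming a PyTorch setup; the margin value and function name are illustrative and not taken from the paper.

```python
import torch

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """For each anchor, pick the hardest positive (farthest same-identity
    sample) and hardest negative (closest different-identity sample),
    then apply the standard triplet margin loss."""
    dist = torch.cdist(features, features, p=2)           # (N, N) Euclidean distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)  # (N, N) boolean mask
    # Hardest positive: largest distance among samples sharing the label.
    pos_dist = dist.masked_fill(~same_id, float('-inf')).max(dim=1).values
    # Hardest negative: smallest distance among samples with a different label.
    neg_dist = dist.masked_fill(same_id, float('inf')).min(dim=1).values
    return torch.relu(pos_dist - neg_dist + margin).mean()
```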

[Figure 5]

The difference is that the author took the alignment of the human body into account when measuring the similarity of images. Some researchers had considered this before, for example by dividing the human body into the head, the torso, and the legs, or by estimating the human skeleton and aligning the body based on the skeleton information. However, the latter approach introduced another difficult problem or required additional annotation. The idea of AlignedReID [1] is to introduce an end-to-end approach that lets the network automatically learn how to align the human body, thereby improving performance.

In AlignedReID, a deep convolutional neural network extracts both global features and local features. The distance between every pair of local features from the two images is calculated to form a distance matrix, and the shortest path from the upper-left corner to the lower-right corner of that matrix is then found through dynamic programming. Each edge of the shortest path corresponds to the matching of a pair of local features, which gives one way of aligning the two bodies: the total distance of this alignment is the shortest possible while preserving the relative top-to-bottom order of the body parts. During training, the length of the shortest path is added to the loss function to help the network learn better overall person features.
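A minimal sketch of this dynamic-programming alignment is below, assuming each image is split into H horizontal stripes with one local feature per stripe; the element-wise distance used here is a plain Euclidean stand-in rather than the paper's exact local distance function.

```python
import numpy as np

def aligned_local_distance(local_a, local_b):
    """local_a, local_b: (H, C) arrays of per-stripe local features.
    Returns the length of the shortest path from the top-left to the
    bottom-right corner of the stripe-to-stripe distance matrix, moving
    only right or down, which preserves the top-to-bottom order of parts."""
    H = local_a.shape[0]
    # Element-wise distance between every stripe pair (H x H matrix).
    d = np.linalg.norm(local_a[:, None, :] - local_b[None, :, :], axis=-1)
    # Dynamic programming over the cumulative path cost.
    cost = np.full((H, H), np.inf)
    cost[0, 0] = d[0, 0]
    for i in range(H):
        for j in range(H):
            if i == 0 and j == 0:
                continue
            best_prev = min(cost[i - 1, j] if i > 0 else np.inf,
                            cost[i, j - 1] if j > 0 else np.inf)
            cost[i, j] = d[i, j] + best_prev
    return cost[H - 1, H - 1]
```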

[Figure 6]

As shown in the figure, some edges of this shortest path are redundant, such as the first edge in the figure. Why not look only for the truly matching edges? The author explained: "Local features should not only be matched; the alignment of the entire human body should also be taken into account. In order to match the human body from head to foot, some redundant matchings are necessary. In addition, by designing the local distance function appropriately, these redundant matchings contribute little to the length of the entire shortest path."
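One way to make such redundant matches nearly free is to squash each element-wise distance into a bounded range before running the shortest-path search. The sketch below assumes an exponential squashing into [0, 1); the exact function used in AlignedReID may differ, so this is illustrative only.

```python
import numpy as np

def squashed_stripe_distance(f_i, g_j):
    """Map a raw Euclidean stripe distance into [0, 1) so that stripes
    that already match closely add almost nothing to the path length."""
    d = np.linalg.norm(f_i - g_j)
    return (np.exp(d) - 1.0) / (np.exp(d) + 1.0)
```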

In addition to automatically aligning the body structure during training, the author also mentioned that the accuracy of the model can be effectively improved by training two networks simultaneously and letting them learn from each other. This kind of mutual training is common in classification problems, and the author made some improvements so that it can also be applied to metric learning.

[Figure 7]

In the training process shown in the figure above, each of the two networks trained at the same time includes a branch for classification and a branch for metric learning. The two classification branches learn from each other through KL divergence, while the two metric learning branches learn from each other through the metric mutual loss proposed by the author. As mentioned above, the metric learning branch consists of two sub-branches, one for global features and one for local features. Interestingly, once training is completed, both the classification branch and the local-feature sub-branch are discarded, and only the global-feature branch is retained for ReID. In other words, both the person classification task and the learning of aligned local features serve to help the network learn better global features.
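A minimal sketch of the mutual-learning signal between the two classification branches is shown below, assuming a PyTorch setup and a symmetric KL term; the metric mutual loss used on the distance branches is not reproduced here.

```python
import torch.nn.functional as F

def classification_mutual_loss(logits_a, logits_b):
    """Symmetric KL divergence between the two networks' predicted identity
    distributions, so that each network also learns from the other's
    soft predictions (the target distribution is detached)."""
    log_p_a = F.log_softmax(logits_a, dim=1)
    log_p_b = F.log_softmax(logits_b, dim=1)
    kl_b_to_a = F.kl_div(log_p_a, log_p_b.exp().detach(), reduction='batchmean')
    kl_a_to_b = F.kl_div(log_p_b, log_p_a.exp().detach(), reduction='batchmean')
    return kl_b_to_a + kl_a_to_b
```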

Finally, the author also used the previously proposed k-reciprocal encoding method to re-rank the retrieval results.
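The core idea behind that re-ranking step is the k-reciprocal neighbor test: a gallery image is trusted as a match only if the query also appears among its own nearest neighbors. A minimal sketch is below; the full method additionally expands the neighbor set and recomputes a Jaccard distance, which is omitted here.

```python
import numpy as np

def k_reciprocal_neighbors(dist, i, k):
    """Return the indices j such that j is among i's k nearest neighbors
    and i is among j's k nearest neighbors, under the distance matrix."""
    knn_i = np.argsort(dist[i])[:k + 1]  # i's k nearest (includes i itself)
    return np.array([j for j in knn_i
                     if i in np.argsort(dist[j])[:k + 1]])
```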

Conclusion

[Figure 8]

The first row in the figure above shows the persons to be searched for. The rows below are the results produced by a human tester and by the machine. Which row of images corresponds to the results generated by the machine? (The answer will be revealed at the end of this article.)

The approach presented in this article gives ReID technology markedly better performance. However, at the end of the paper, the author also points out that although machines outperform human beings on the two common datasets, it cannot be concluded that the task of ReID has been fully solved. In practical applications, human testers, especially those who are professionally trained, can evaluate images of crowds or dim environments more accurately, typically by drawing on experience, intuition, and knowledge of the environment and context. Therefore, people still hold great advantages over machines in extreme conditions, and more work is needed before ReID can be widely deployed in practice.

Zhang Chi, one of the authors of AlignedReID, said: "When we started to study ReID in 2016, a rank-1 accuracy of 60% could be considered state of the art. However, businesses normally require an accuracy of at least 90% for it to be practical. Even though we have outperformed human beings on the two common datasets, this is just our first step towards real-world application. There will be many challenges to deal with in real-world scenarios. We hope that, as it develops, ReID technology can make our society safer and smarter."

Finally, let's announce the answer to the previous question. The third row shows the results generated by the machine.
