Person Re-Identification (ReID), a New Facial Recognition Record

Introduction: Person Re-Identification (ReID) has been a research focus in computer vision in recent years. It is the process of retrieving images of a person across devices based on a given image of that person.

In recent years, increasingly mature facial recognition technologies have come to significantly outperform humans at identifying faces and have been widely used in projects such as "smart city" and "safe city". In practical applications, however, cameras cannot always capture clear images of faces. In addition, cameras have a limited range, and in real-world scenarios the areas covered by multiple cameras often do not overlap.

Therefore, it becomes necessary to identify and find a person using information about his or her whole body - tracking a person across cameras by using the overall features of the person as an important supplement to facial information. Thus, scientists in the field of computer vision have gradually begun their study on "ReID" technology.

ReID: Enormous Practical Significance, Yet Still Reliant on Manual Labor

As its name would suggest, Person Re-Identification (ReID) is the re-identification of persons by establishing correspondence between images of persons captured by different cameras that have no overlapping views. When the areas captured by different cameras do not overlap, it will be much more difficult to perform a retrieval due to a lack of sequential information. Therefore, ReID emphasizes the retrieval of a specific person in videos captured by different cameras.

ReID compares the features of a person in one image with the features of persons in other images and determines whether they are the same person.

If person detection determines whether there is a person in an image, then ReID requires a machine to recognize all images of a particular person shot by different cameras. Specifically, it is a person comparison technology based on the overall features of a person: given one image of a person, it finds one or more other images of that person.

ReID has wide applications in criminal investigation in public security and image retrieval. In addition, ReID can help mobile phone users achieve image clustering and help retailers and supermarkets get customer trajectories and create commercial value. However, the precision of ReID is not high enough to be profitable currently, and much work still needs to be done manually.

Breaking the Industry Record of ReID and Surpassing Human Experts for the First Time

Research on ReID is very challenging because the time and location at which images are captured are uncertain. In addition, differences in lighting, camera angle, and pose, as well as occlusion, strongly affect identification accuracy.

Thanks to the development of deep learning in recent years, ReID has matured considerably. On the two most commonly used ReID test sets, Market1501 and CUHK03, rank-1 identification accuracy has reached 89.9% and 91.8%, respectively.

However, there is still a gap between these results and those achievable by human beings. Experiments show that the rank-1 accuracy of a skilled labeler on Market1501 and CUHK03 can reach 93.5% and 95.7%, respectively.
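The rank-1 figures quoted above can be read as "how often is the closest gallery image the right person?". The following is a minimal sketch of that calculation, assuming a precomputed query-to-gallery distance matrix; the function name `rank1_accuracy` and the toy data are hypothetical, and real benchmark protocols typically add further rules (for example about images taken by the same camera) that are omitted here.

```python
def rank1_accuracy(dist, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery image has the same identity.

    dist[i][j] is the distance between query i and gallery image j.
    """
    hits = 0
    for i, row in enumerate(dist):
        # Index of the gallery image closest to query i.
        nearest = min(range(len(row)), key=row.__getitem__)
        if gallery_ids[nearest] == query_ids[i]:
            hits += 1
    return hits / len(dist)


# Toy data: 3 queries, 4 gallery images (all values are made up).
toy_dist = [
    [0.2, 0.9, 0.8, 0.7],  # nearest is gallery 0 (id "a"): hit
    [0.6, 0.3, 0.9, 0.5],  # nearest is gallery 1 (id "b"): hit
    [0.4, 0.8, 0.7, 0.9],  # nearest is gallery 0 (id "a"), query is "c": miss
]
toy_query_ids = ["a", "b", "c"]
toy_gallery_ids = ["a", "b", "c", "c"]
print(rank1_accuracy(toy_dist, toy_query_ids, toy_gallery_ids))
```

In this toy case two of the three queries are matched correctly at rank 1.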

To measure human performance on ReID, the researchers assembled 10 professional labelers to carry out the test. The rank-1 accuracy of 93.5% and 95.7% reached by a skilled labeler on Market1501 and CUHK03 is a result that ReID methods at the time could not achieve.

Not long ago, Face++ (Megvii) made exciting progress in this research. In a paper titled "AlignedReID", the research team at Megvii's research institute proposed a new approach characterized by Dynamic Alignment, Mutual Learning, and Re-Ranking, which brought the rank-1 accuracy of machines on Market1501 and CUHK03 to 94.0% and 96.1%, respectively. This is the first time that machines have outperformed human experts in ReID, setting a record in the industry.

In addition to facial recognition, machines have now surpassed human beings in the more complex field of ReID! This offers a powerful technology for understanding images and videos of people.

Sun Jian, chief scientist and head of Megvii, said: "With the revival of deep learning methods in recent years, we have seen machines surpass human beings in solving more and more image perception problems, from facial recognition in 2014 to ImageNet image classification in 2015. I remember that, not long ago, when I talked with my mentor, Dr. Shen Xiangyang (former Global Executive Vice President of Microsoft), I boasted that most perception problems would be resolved in 5-10 years. Today, I am very pleased to see that another image perception problem, one that is difficult and has great potential for application, has been solved by the algorithm developed by the Megvii team."

Multiple Networks Automatically Learning the Alignment of Human Features and Learning from Each Other

So how did the authors achieve this?
Like other deep-learning-based ReID methods, the authors used a deep convolutional neural network to extract features, used Triplet Loss with Hard Sample Mining as the loss function, and took the Euclidean distance between features as the measure of similarity between two images.
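To make the combination of Triplet Loss and Hard Sample Mining concrete, here is a minimal sketch of one common variant, usually called "batch-hard" mining: for each anchor, take the farthest image of the same person and the closest image of a different person. It is an illustration, not the authors' implementation; the function names and the margin value of 0.3 are assumptions, and a real system would compute this on network outputs with automatic differentiation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Mean over anchors of max(0, margin + hardest_positive - hardest_negative).

    margin=0.3 is an assumed value for illustration only.
    """
    losses = []
    for i, (f_a, y_a) in enumerate(zip(features, labels)):
        # Distances to other images of the same identity (positives).
        pos = [euclidean(f_a, f) for j, (f, y) in enumerate(zip(features, labels))
               if y == y_a and j != i]
        # Distances to images of different identities (negatives).
        neg = [euclidean(f_a, f) for f, y in zip(features, labels) if y != y_a]
        if not pos or not neg:
            continue  # no valid triplet for this anchor
        # Hardest positive is the farthest one; hardest negative is the closest one.
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: two identities, two images each (2-D features, made up).
well_separated = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
interleaved = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
toy_labels = [0, 0, 1, 1]
print(batch_hard_triplet_loss(well_separated, toy_labels))
print(batch_hard_triplet_loss(interleaved, toy_labels))
```

The loss is zero once every anchor is closer to all of its positives than to any negative by at least the margin, which is why the well-separated batch contributes nothing while the interleaved one does.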

The difference is that the authors considered the alignment of the human body when measuring the similarity of images. Others had considered this before, for example by dividing the human body into the head, the torso, and the legs, or by estimating the human skeleton and aligning on the skeleton information. However, the latter approach introduces another difficult problem or requires additional labeling. The idea of the authors of AlignedReID [1] is to introduce an end-to-end approach that allows the network to learn automatically how to align the human body, improving performance.

In AlignedReID, deep convolutional neural networks extract both global features and local features. The distance between every pair of local features in two images is calculated to generate a distance matrix. Then the shortest path from the upper left corner to the lower right corner of the matrix is calculated through dynamic programming. Each edge of the shortest path corresponds to the matching of a pair of local features, which gives a way of aligning the human body. This alignment has the shortest total distance among those that preserve the relative order of the different parts of the body. During training, the length of the shortest path is added to the loss function to aid in learning the overall features of a person.
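The shortest-path step described above can be sketched with a small dynamic program. This is a minimal illustration under stated assumptions, not the authors' code: each image is assumed to be represented by an ordered list of local feature vectors (for example, one per horizontal stripe from head to foot), the path may only move right or down, and the local distance is assumed to be a Euclidean distance squashed into [0, 1) by (e^d - 1) / (e^d + 1). The names `local_distance` and `aligned_distance` are hypothetical.

```python
import math


def local_distance(f, g):
    """Euclidean distance squashed into [0, 1); assumed form (e^d - 1) / (e^d + 1)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))
    return math.tanh(d / 2.0)  # algebraically equal to (e^d - 1) / (e^d + 1)


def aligned_distance(parts_a, parts_b):
    """Length of the shortest right/down path through the local distance matrix."""
    h, w = len(parts_a), len(parts_b)
    # d[i][j]: distance between part i of image A and part j of image B.
    d = [[local_distance(parts_a[i], parts_b[j]) for j in range(w)] for i in range(h)]
    # s[i][j]: shortest path length from the upper left corner to cell (i, j).
    s = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if i == 0 and j == 0:
                best_prev = 0.0
            elif i == 0:
                best_prev = s[i][j - 1]      # first row: can only come from the left
            elif j == 0:
                best_prev = s[i - 1][j]      # first column: can only come from above
            else:
                best_prev = min(s[i - 1][j], s[i][j - 1])
            s[i][j] = best_prev + d[i][j]
    return s[h - 1][w - 1]


# Toy example: three 1-D "parts" per image (values are made up).
parts = [[0.0], [5.0], [10.0]]
print(aligned_distance(parts, parts))
```

Because the squashed distance never exceeds 1, a badly mismatched pair that the path is forced to pass through can add at most 1 to the total, however far apart the two parts are in feature space.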

[Figure 6]

As shown in the figure, some edges of this shortest path are redundant, such as the first edge in the figure. Why not just look for the matching edges? The authors explained: "Local information should not only be matched; the alignment of the entire human body should also be taken into account. In order to match the human body from head to foot, it is necessary to have some redundant matchings. In addition, through the design of the local distance function, these redundant matchings contribute little to the length of the entire shortest path."

In addition to automatically aligning the body structure during training, the authors also mentioned that the precision of the model can be effectively improved by training two networks simultaneously and making them learn from each other. This training method is common in classification problems, and the authors made some improvements so that it can be applied to Metric Learning.

[Figure 7]

In the training process shown in the figure above, the two networks trained at the same time each include a branch for classification and a branch for Metric Learning. The two classification branches learn from each other through KL divergence; the two Metric Learning branches learn from each other through the metric mutual loss proposed by the authors. As mentioned above, the branch for Metric Learning consists of two sub-branches, one for global features and one for local features. Interestingly, once training is completed, both the classification branch and the local-feature sub-branch are discarded, and only the sub-branch for global features is retained for ReID. In other words, both person classification and the learning of local features through the alignment of the human body serve to obtain better global features of images.
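The classification half of this mutual learning can be sketched as follows. It is a minimal illustration assuming the usual mutual-learning formulation, in which each network is penalized by the KL divergence from its peer's predicted class distribution to its own; the function names are hypothetical, and the metric mutual loss is not shown because the text above does not define it.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_classification_losses(logits_1, logits_2):
    """Return the mimicry terms added to network 1's and network 2's losses.

    Assumed convention: network 1 is pulled toward network 2's prediction via
    KL(p2 || p1), and network 2 toward network 1's via KL(p1 || p2).
    """
    p1, p2 = softmax(logits_1), softmax(logits_2)
    return kl_divergence(p2, p1), kl_divergence(p1, p2)


# Toy logits for one image over three identities (values are made up).
toy_logits_1 = [2.0, 0.5, 0.1]
toy_logits_2 = [1.0, 1.2, 0.3]
print(mutual_classification_losses(toy_logits_1, toy_logits_2))
```

Both terms vanish when the two networks already agree, so the penalty only acts where their predictions differ. In training, each term would be added to that network's own classification loss.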

Finally, the authors also used the previously proposed k-reciprocal encoding for re-ranking.
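The core idea behind k-reciprocal re-ranking can be sketched briefly: two images are k-reciprocal neighbors if each appears among the other's k nearest neighbors, and images whose k-reciprocal neighbor sets overlap heavily are treated as more likely to show the same person. The following is a simplified sketch of that idea only, assuming a full pairwise distance matrix; the function names are hypothetical, and the complete method includes further steps (such as blending with the original distance) that are omitted here.

```python
def k_nearest(dist, i, k):
    """Indices of the k items closest to item i (item i itself included)."""
    order = sorted(range(len(dist)), key=lambda j: dist[i][j])
    return set(order[:k])


def k_reciprocal(dist, i, k):
    """Items j that are among i's k nearest AND have i among their k nearest."""
    return {j for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}


def jaccard_distance(dist, i, j, k):
    """1 minus the overlap ratio of the two k-reciprocal neighbor sets."""
    a, b = k_reciprocal(dist, i, k), k_reciprocal(dist, j, k)
    union = a | b
    if not union:
        return 1.0  # no neighbors at all: treat as maximally distant
    return 1.0 - len(a & b) / len(union)


# Toy data: five points on a line forming two clusters (values are made up).
toy_points = [0.0, 1.0, 2.0, 10.0, 11.0]
toy_pairwise = [[abs(p - q) for q in toy_points] for p in toy_points]
print(k_reciprocal(toy_pairwise, 3, 3))
print(jaccard_distance(toy_pairwise, 0, 1, 3))
print(jaccard_distance(toy_pairwise, 0, 3, 3))
```

In the toy data, point 2 is among the three nearest neighbors of point 3, but point 3 is not among the three nearest neighbors of point 2, so the reciprocal check filters it out of point 3's neighbor set.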

Conclusion

[Figure 8]

The first row in the figure above shows the persons to be looked for. The rows below it are the results produced by a human tester and by the machine. Which row of images corresponds to the results generated by the machine? (The answer is revealed at the end of this article.)

The approach presented in this article allows ReID technology to show better performance. However, at the end of the paper, the authors also pointed out that although machines outperform human beings on the two common datasets, it cannot be concluded that the task of ReID has been solved. In practical applications, human testers, especially those who are professionally trained, can be more accurate in evaluating images with crowds or in a dim environment. This is typically based on experience, intuition, the environment, and context. Therefore, people still have great advantages over machines in extreme conditions. More effort is needed before ReID can be put into practice.

Zhang Chi, one of the authors of AlignedReID, said: "When we started to study ReID in 2016, a rank-1 accuracy of 60% could be considered state of the art. However, businesses normally require an accuracy of at least 90% for it to be practical. Even though we have outperformed human beings on the two common datasets, this is just our first step towards real-world application. There will be many challenges to deal with in real-world scenarios. We hope that, with its development, ReID technology can make our society safer and smarter."

Finally, let's announce the answer to the previous question. The third row shows the results generated by the machine.
