Facial Recognition and Swapping


Photo sharing has become the norm on social networking sites. Even with so much happening on social media, self-portraits, or selfies, continue to dominate. The growing selfie phenomenon has led to face-related applications being embedded in cameras and social media platforms. These applications can detect and track human faces in real time, or categorize photos by the faces in them. They can also be used for verification, as in Alipay's face login, which uses a person's face as a personal ID.

These applications are based on either face detection or facial recognition technology. Face detection finds the faces in an image. Facial recognition extends it by matching the unique characteristics of a detected face in order to identify the person. Collectively, both technologies are termed face recognition technology.

Outlined below are some of the different categories of facial effect applications.

Category of Effects

Face Warp

01

Face warp distorts an image as if it were viewed through an irregular lens, reshaping or resizing specific parts of a face. This is achieved by remapping pixel coordinates.
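To make the remapping idea concrete, here is a minimal sketch of a "bulge" warp on a grayscale image. It is not taken from any of the apps mentioned in this article: the GrayImage type, the bulgeWarp function, and its radius and strength parameters are all hypothetical names chosen for illustration, and nearest-neighbor sampling is assumed for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image type, used only for this sketch.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // row-major, width * height values

    std::uint8_t at(int x, int y) const { return pixels[y * width + x]; }
};

// Bulge ("magnify") warp: every destination pixel inside the circle takes its
// value from a source position pulled toward the center (cx, cy).
// strength = 0 leaves the image unchanged; larger values magnify more.
GrayImage bulgeWarp(const GrayImage& src, double cx, double cy,
                    double radius, double strength) {
    assert(src.width > 0 && src.height > 0);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height);
    assert(radius > 0.0 && strength >= 0.0);

    GrayImage dst = src;  // pixels outside the circle are simply copied
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            const double dx = x - cx;
            const double dy = y - cy;
            const double r = std::sqrt(dx * dx + dy * dy);
            if (r >= radius) continue;

            // Remap the coordinate: shrink the offset from the center.
            const double scale = std::pow(r / radius, strength);
            int sx = static_cast<int>(std::lround(cx + dx * scale));
            int sy = static_cast<int>(std::lround(cy + dy * scale));

            // Clamp to the image bounds, then sample the nearest pixel.
            sx = std::min(std::max(sx, 0), src.width - 1);
            sy = std::min(std::max(sy, 0), src.height - 1);
            dst.pixels[y * src.width + x] = src.at(sx, sy);
        }
    }
    return dst;
}
```

The loop runs over destination pixels and looks each one up in the source. This inverse mapping avoids the holes that appear when source pixels are pushed forward to new positions.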

Facial Textures and Accessories

02

Numerous applications, such as MeiTu and Snapchat, make use of these types of effects. Once a human face has been identified, the app lets users apply different textures and accessories to the photo. Most of these effects can also be applied to real-time video.

Face Swap

03

Face swap is primarily applicable to group photos. First, the user identifies a source face and then swaps it with the target face in the same photo. The swapped faces are then processed with an image fusion technology to make the swap appear more realistic.

Face Morph

04

Similar to a face swap, face morph requires two faces but combines them into a single face. Face morph is also applicable to animated figures or animal faces.
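One common way to combine two faces, shown here only as a sketch, is to interpolate the landmark positions of both faces and then cross-dissolve the pixel colors with the same weight. The Point2d type and the blendLandmarks and blendPixel helpers are hypothetical names introduced for this illustration; the step that warps each image to the intermediate shape is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type, used only for this sketch.
struct Point2d {
    double x;
    double y;
};

// Interpolate two landmark sets of equal size.
// alpha = 0 returns the shape of face a, alpha = 1 the shape of face b.
std::vector<Point2d> blendLandmarks(const std::vector<Point2d>& a,
                                    const std::vector<Point2d>& b,
                                    double alpha) {
    assert(a.size() == b.size());
    assert(alpha >= 0.0 && alpha <= 1.0);

    std::vector<Point2d> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i].x = (1.0 - alpha) * a[i].x + alpha * b[i].x;
        out[i].y = (1.0 - alpha) * a[i].y + alpha * b[i].y;
    }
    return out;
}

// Cross-dissolve two pixel intensities with the same weight.
double blendPixel(double a, double b, double alpha) {
    return (1.0 - alpha) * a + alpha * b;
}
```

With alpha = 0.5 the result sits halfway between the two faces in both shape and color.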

Face Animation

05

This category is typically a combination of multiple face effects, such as a combination of face warp and textures. The images are then animated to enhance the effects.

Implementation Principles

06

In this example, the user's face replaces the one in the painting. The photo on the right shows the result of the replacement. Algorithmically, the process consists of six steps: face detection, key point location, lens deformation, region extraction, color transfer, and edge fusion.

Face Detection

Face detection is a technology that identifies a human face in digital images.
This example uses DLib for face detection and the code is as follows:

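// cvImg is assumed to be a BGR cv::Mat (the OpenCV default), hence bgr_pixel.
dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();
dlib::cv_image<dlib::bgr_pixel> img(cvImg);
// Each returned rectangle bounds one detected face.
std::vector<dlib::rectangle> faces = detector(img);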

The rectangle boxes (dlib::rectangle) are the results of the detection.

07

Key Point Location

Upon detecting a human face, DLib performs key point location. Key points, also known as landmarks, help in identifying the key features of the face.
DLib provides a 68-point landmark detection function:

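dlib::shape_predictor sp;

// Load the trained landmark model from disk
dlib::deserialize(LandMarksModelFile) >> sp;

// Locate the landmarks on the first detected face
// (make sure faces is not empty before indexing it)
dlib::full_object_detection shape = sp(img, faces[0]);
// Collect the points into landmarks, a container declared elsewhere
for (unsigned long i = 0; i < shape.num_parts(); i++) {
    dlib::point pt = shape.part(i);
    landmarks.push_back(pt);
}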

The 68 landmarks are coordinates of various parts of the human face stored in the following order:

{
    IdxRange jaw;       // [0 , 16]
    IdxRange rightBrow; // [17, 21]
    IdxRange leftBrow;  // [22, 26]
    IdxRange nose;      // [27, 35]
    IdxRange rightEye;  // [36, 41]
    IdxRange leftEye;   // [42, 47]
    IdxRange mouth;     // [48, 59]
    IdxRange mouth2;    // [60, 67]
}
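As a rough illustration of how such index ranges can be used, the sketch below pulls the points of one facial part out of the full landmark list. The definition of IdxRange as an inclusive pair of indices, the struct name FaceLandmarkIdx, the Point2 type (a stand-in for dlib::point), and the slice helper are all assumptions made for this example; only the index values come from the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for dlib::point.
struct Point2 {
    long x;
    long y;
};

// Assumed layout: an inclusive [first, last] range of landmark indices.
struct IdxRange {
    std::size_t first;
    std::size_t last;
};

// Hypothetical struct name; the ranges mirror the listing above.
struct FaceLandmarkIdx {
    IdxRange jaw{0, 16};
    IdxRange rightBrow{17, 21};
    IdxRange leftBrow{22, 26};
    IdxRange nose{27, 35};
    IdxRange rightEye{36, 41};
    IdxRange leftEye{42, 47};
    IdxRange mouth{48, 59};   // outer lip contour
    IdxRange mouth2{60, 67};  // inner lip contour
};

// Copy the landmarks that belong to one facial part.
std::vector<Point2> slice(const std::vector<Point2>& landmarks,
                          IdxRange range) {
    assert(range.first <= range.last && range.last < landmarks.size());
    const auto begin =
        landmarks.begin() + static_cast<std::ptrdiff_t>(range.first);
    const auto end =
        landmarks.begin() + static_cast<std::ptrdiff_t>(range.last) + 1;
    return std::vector<Point2>(begin, end);
}
```

For example, slice(landmarks, FaceLandmarkIdx{}.leftEye) returns the six points of the left eye.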

08

Lens Deformation

The lens deformation in this example is performed with a homography transformation. The homography H is a 3×3 matrix that maps the landmark positions of one face onto those of the other. It treats the face as a plane, which is only an approximation but is sufficient for aligning position and pose:

09

// Estimate the homography between the two faces from their landmarks
cv::Mat H = cv::findHomography(face1.landMarks, face2.landMarks);

// Apply the homography to the entire first photo, producing an output
// with the same size as the second photo
cv::warpPerspective(im1, warpIm1, H, im2.size());
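To show what the matrix H does to a single coordinate, here is a small sketch in plain C++ that does not depend on OpenCV. The Homography alias (nine values in row-major order), the Point2d type, and the applyHomography function are names assumed for this illustration; conceptually, cv::warpPerspective applies this kind of mapping to every pixel.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical types for this sketch: a 3x3 matrix stored row by row.
using Homography = std::array<double, 9>;

struct Point2d {
    double x;
    double y;
};

// Map a point through H using homogeneous coordinates:
//   [u, v, w]^T = H * [x, y, 1]^T,  result = (u / w, v / w)
Point2d applyHomography(const Homography& H, Point2d p) {
    const double u = H[0] * p.x + H[1] * p.y + H[2];
    const double v = H[3] * p.x + H[4] * p.y + H[5];
    const double w = H[6] * p.x + H[7] * p.y + H[8];
    assert(std::abs(w) > 1e-12);  // w == 0 would map the point to infinity
    return Point2d{u / w, v / w};
}
```

The division by w is what distinguishes a homography from an affine transformation: when the last row is (0, 0, 1), w is always 1 and the mapping reduces to rotation, scaling, shearing, and translation.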

The transformation result is shown in the figure below. The angle and pose of the transformed face are now similar to those of the face in the painting.

10

Regional Extraction

Regional extraction isolates the face itself and discards everything else, such as hair and neck. The aim is to find a mask that covers only the area spanned by the facial landmarks. To obtain the mask, a Gaussian blur is first applied to the selected region, which spreads it slightly outward. Binarization then converts the blurred result back into a binary image:

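// Blurring spreads the non-zero area of the mask outward by a few pixels;
// thresholding at 0 then sets every non-zero pixel to 255.
int blurAmount = 5;  // kernel size in pixels, must be odd
cv::Mat maskBlur;
cv::GaussianBlur(histMask, maskBlur, cv::Size(blurAmount, blurAmount), 0);
cv::threshold(maskBlur, histMask, 0, 255, cv::THRESH_BINARY);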
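The net effect of blurring and then thresholding at 0 is that the mask grows by roughly half the kernel size in every direction. The sketch below reproduces that effect directly, using a plain square dilation instead of a Gaussian blur, so it is an approximation of the step above rather than the same computation. The expandMask function and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Grow a binary mask (0 or 255, row-major) so that every set pixel also sets
// its neighbors inside a kernelSize x kernelSize square around it.
std::vector<std::uint8_t> expandMask(const std::vector<std::uint8_t>& mask,
                                     int width, int height, int kernelSize) {
    assert(width > 0 && height > 0);
    assert(kernelSize > 0 && kernelSize % 2 == 1);
    assert(mask.size() == static_cast<std::size_t>(width) * height);

    const int r = kernelSize / 2;
    std::vector<std::uint8_t> out(mask.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (mask[y * width + x] == 0) continue;
            // Spread this pixel to its neighborhood, clipped to the image.
            for (int ny = std::max(0, y - r); ny <= std::min(height - 1, y + r); ++ny) {
                for (int nx = std::max(0, x - r); nx <= std::min(width - 1, x + r); ++nx) {
                    out[ny * width + nx] = 255;
                }
            }
        }
    }
    return out;
}
```

With a kernel size of 5, a single set pixel becomes a 5×5 block, which corresponds to the two-pixel margin that a blurAmount of 5 adds around the face region.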

11

Color Transfer

The aim of color transfer is to make the color of the current face similar to that of the face it replaces. While various ways exist to achieve such a transfer, this example adopts histogram adjustment, which is comparatively easy to implement. It involves the following steps:
1) Calculate the color histograms of the current image and the target image.
2) Adjust the histogram of the current image to make it consistent with that of the target image.
3) Apply the adjusted histogram to the current image.
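The three steps can be sketched for a single 8-bit channel as follows. This is a standard histogram matching formulation based on cumulative histograms and is not necessarily the exact code used in the demo; the Channel alias and the function names are assumptions. For a color image, the same function would be applied to each channel, restricted to the pixels inside the face mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Channel = std::vector<std::uint8_t>;  // one color channel, any layout

// Step 1: histogram of the channel, accumulated and normalized to [0, 1].
std::array<double, 256> cumulativeHistogram(const Channel& pixels) {
    assert(!pixels.empty());
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t v : pixels) ++counts[v];

    std::array<double, 256> cdf{};
    std::size_t running = 0;
    for (std::size_t level = 0; level < 256; ++level) {
        running += counts[level];
        cdf[level] = static_cast<double>(running) /
                     static_cast<double>(pixels.size());
    }
    return cdf;
}

// Steps 2 and 3: build a lookup table that sends each level of the current
// image to the target level with the same cumulative share, then apply it.
Channel matchHistogram(const Channel& current, const Channel& target) {
    const std::array<double, 256> cdfCurrent = cumulativeHistogram(current);
    const std::array<double, 256> cdfTarget = cumulativeHistogram(target);

    std::array<std::uint8_t, 256> lut{};
    std::size_t t = 0;  // both CDFs are non-decreasing, so t never moves back
    for (std::size_t s = 0; s < 256; ++s) {
        while (t < 255 && cdfTarget[t] < cdfCurrent[s]) ++t;
        lut[s] = static_cast<std::uint8_t>(t);
    }

    Channel out(current.size());
    for (std::size_t i = 0; i < current.size(); ++i) out[i] = lut[current[i]];
    return out;
}
```

Matching an image against itself leaves it unchanged, which is a quick sanity check for the lookup table.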

12

Edge Fusion

After the color transfer, the extracted face is ready to be placed onto the target. However, if we copy the face directly onto the other, the seam along the edge looks abrupt. This demo therefore applies Laplacian pyramid fusion, which blends the two images separately at several scales so that the transition along the edge appears more coherent.
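The idea behind Laplacian pyramid fusion can be sketched on a one-dimensional signal. The version below is deliberately simplified and is not the demo's implementation: it uses a two-sample box filter where image libraries typically use a 5-tap Gaussian kernel, it assumes the signal length is divisible by 2^levels, and all names (Signal, downsample, upsample, laplacianPyramid, pyramidBlend) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Signal = std::vector<double>;

// Halve the resolution by averaging neighboring pairs (box filter).
Signal downsample(const Signal& v) {
    Signal out(v.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = 0.5 * (v[2 * i] + v[2 * i + 1]);
    return out;
}

// Double the resolution by repeating every sample.
Signal upsample(const Signal& v) {
    Signal out(v.size() * 2);
    for (std::size_t i = 0; i < v.size(); ++i)
        out[2 * i] = out[2 * i + 1] = v[i];
    return out;
}

// Laplacian pyramid: one detail band per level (the level minus its coarse
// approximation), followed by the coarsest low-pass residual.
std::vector<Signal> laplacianPyramid(Signal cur, int levels) {
    std::vector<Signal> pyr;
    for (int l = 0; l < levels; ++l) {
        const Signal coarse = downsample(cur);
        const Signal approx = upsample(coarse);
        Signal detail(cur.size());
        for (std::size_t i = 0; i < cur.size(); ++i)
            detail[i] = cur[i] - approx[i];
        pyr.push_back(detail);
        cur = coarse;
    }
    pyr.push_back(cur);
    return pyr;
}

// Blend a and b; mask values lie in [0, 1], where 1 keeps a and 0 keeps b.
Signal pyramidBlend(const Signal& a, const Signal& b, Signal mask, int levels) {
    assert(!a.empty() && a.size() == b.size() && a.size() == mask.size());
    assert(levels >= 0 && a.size() % (std::size_t{1} << levels) == 0);

    const std::vector<Signal> pa = laplacianPyramid(a, levels);
    const std::vector<Signal> pb = laplacianPyramid(b, levels);

    // Blend each band with the mask reduced to the same resolution.
    std::vector<Signal> blended;
    for (int l = 0; l <= levels; ++l) {
        Signal band(pa[l].size());
        for (std::size_t i = 0; i < band.size(); ++i)
            band[i] = mask[i] * pa[l][i] + (1.0 - mask[i]) * pb[l][i];
        blended.push_back(band);
        if (l < levels) mask = downsample(mask);
    }

    // Collapse the pyramid from coarse to fine.
    Signal result = blended[levels];
    for (int l = levels - 1; l >= 0; --l) {
        result = upsample(result);
        for (std::size_t i = 0; i < result.size(); ++i)
            result[i] += blended[l][i];
    }
    return result;
}
```

Fine detail is switched sharply at the mask boundary, while coarse brightness and color are mixed over a wider area because the mask itself gets smoother at each level. That combination is what hides the seam.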

13

Conclusion

Emerging facial trends, namely facial recognition and swapping, continue to draw attention on social media due to the ease with which individuals can manipulate photos. However, these technologies are not only applicable to fun and social apps but also useful for more critical applications.

With advances in deep learning, the accuracy of face recognition has improved greatly. Many startups are taking advantage of these improvements and producing a multitude of products for various applications. One such startup, Megvii (Face++), provides high-accuracy recognition solutions that rank among the top globally. The company aims to expand into industries such as finance, smart cities, and robotics in the near future.

Some links

Face2Face: Real-time Face Capture and Reenactment of RGB Videos
A highly controversial new technology at CVPR - Face2Face
Switching Eds: Face swapping with Python, dlib, and OpenCV
https://github.com/mc-jesus/FaceSwap
