Py-Feat: Python Facial Expression Analysis Toolbox
Py-Feat: Python Facial Expression Analysis Toolbox was written by Jin Hyun Cheong, Eshin Jolly, Tiankang Xie, Sophie Byrne, Matthew Kenney, & Luke J. Chang.
Facial expression analysis is notoriously difficult. Recent breakthroughs in affective computing have led to great improvements in automatically recognizing facial expressions from images and videos. However, most of this research has yet to be widely disseminated in social science fields such as psychology. Current cutting-edge models require significant technical expertise that is not normally part of social science training programs. Furthermore, there is a conspicuous dearth of user-friendly, open-source software that provides a full collection of tools and functionality to enable facial expression research. Py-Feat is an open-source Python toolbox that provides support for detecting, preprocessing, analyzing, and visualizing facial expression data.
Facial expressions provide a nonverbal channel for interpersonal and cross-species communication while also offering insight into an individual's internal mental state. One of the most difficult aspects of facial expression research has been reaching agreement on how to effectively represent and objectively quantify expressions.
The Facial Action Coding System (FACS) is one of the most widely used techniques for precisely measuring the intensity of groupings of facial muscles known as action units (AUs). However, obtaining facial expression information through FACS coding can be a time-consuming and arduous process. Becoming a certified FACS coder requires roughly 100 hours of training, and manual labeling is slow (e.g., one minute of video can take an hour to code) and prone to cultural biases and inaccuracies.
Facial electromyography (EMG) is one method for objectively recording from a limited number of facial muscles with high temporal resolution, but it requires specialized recording equipment, which limits data collection to the laboratory and can visually obscure the face, making it less suitable for social contexts.
Pipeline for analyzing facial expressions. Facial expression analysis begins with capturing face photographs or videos using a recording device such as a webcam, camcorder, head-mounted camera, or 360° camera. After recording, researchers can use Py-Feat to detect facial attributes such as the location of the face within a rectangular bounding box, the positions of key facial landmarks, action units, and emotions, and then validate the detection results with image overlays and bar graphs. The detection results can then be preprocessed, for example by extracting additional features such as Histograms of Oriented Gradients (HOG) or multi-wavelet decompositions. Finally, the resulting data can be analyzed within the toolbox using statistical methods such as t-tests, regressions, and intersubject correlations.
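To make the pipeline concrete, here is a minimal sketch of that workflow in Python, assuming a recent Py-Feat release; the method names (detect_video, plot_detections), the skip_frames argument, and the file path are illustrative and may differ across versions.

# Minimal end-to-end sketch of the Py-Feat pipeline (assumed API, not exact).
from feat import Detector

detector = Detector()  # Py-Feat's default face, landmark, pose, AU, and emotion models

# 1. Detect facial features in a recorded video (hypothetical file path).
fex = detector.detect_video("participant_01.mp4", skip_frames=30)

# 2. Validate the detections visually (overlays of boxes, landmarks, AU/emotion bars).
fex.plot_detections()

# 3. Summarize the resulting time series, e.g., mean AU and emotion activations.
print(fex.aus.mean())
print(fex.emotions.mean())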
Py-Feat: Design and module overview
There are currently two primary modules in Py-Feat for working with facial expression data. First, the Detector module makes it simple for users to detect facial expression features in images or videos. We provide multiple models for extracting the most important facial expression features that most end users will want to work with. This includes detecting faces in the stimulus and determining the spatial coordinates of a bounding box for each face. In addition, we detect 68 facial landmarks, which are coordinates indicating the spatial placement of the eyes, nose, mouth, and jaw. Models can use the bounding box and landmarks to estimate head pose, i.e., the orientation of the face in terms of rotation around axes in three-dimensional space. Py-Feat can also detect higher-level facial expression features such as AUs and basic emotion categories. We provide multiple models for each detector to make the toolbox versatile for a wide range of applications, but we have also chosen sensible defaults for users who might be overwhelmed by the number of options. These features cover the vast majority of ways in which computer vision algorithms can currently characterize facial expressions. Importantly, as new features and models become available in the field, they can be added to the toolbox.
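As a rough illustration of how the individual detectors can be swapped out, the sketch below uses the keyword arguments and model names listed in the Py-Feat documentation; the exact strings and the result attribute names (faceboxes, landmarks, poses, aus, emotions) are assumptions that may vary between releases.

# Configuring the Detector with explicit model choices (assumed names).
from feat import Detector

detector = Detector(
    face_model="retinaface",        # alternatives: "faceboxes", "mtcnn"
    landmark_model="mobilefacenet", # alternatives: "mobilenet", "pfld"
    facepose_model="img2pose",      # head pose (pitch, roll, yaw)
    au_model="xgb",                 # alternative: "svm"
    emotion_model="resmasknet",     # residual masking network
)

result = detector.detect_image("group_photo.jpg")  # hypothetical image path

# Attribute names below are assumptions based on the documentation.
print(result.faceboxes)   # bounding box per detected face
print(result.landmarks)   # 68 (x, y) coordinates per face
print(result.poses)       # pitch, roll, yaw
print(result.aus)         # action unit activations
print(result.emotions)    # emotion category scores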
In addition, Py-Feat offers the Fex data module for working with the features produced by the Detector module. This module contains methods for preprocessing, analyzing, and visualizing facial expression data.
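A brief sketch of typical Fex usage, continuing from the fex object produced above: because Fex subclasses a pandas DataFrame, standard pandas and SciPy tooling applies directly. The downsample method and the AU12 and happiness column names are assumptions about the current Py-Feat output format.

# Working with a Fex object (assumed method and column names).
from scipy import stats

# Preprocess: resample the detection time series to roughly 1 Hz.
fex_1hz = fex.downsample(target=1)

# Standard pandas-style operations still work on the Fex object.
smiling_frames = fex_1hz[fex_1hz["AU12"] > 0.5]
print(f"{len(smiling_frames)} frames with strong lip-corner pulling (AU12)")

# Analyze: a simple one-sample t-test on mean happiness evidence across frames.
t, p = stats.ttest_1samp(fex_1hz["happiness"], popmean=0.5)
print(f"t = {t:.2f}, p = {p:.3f}")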
Face detection
One of the most fundamental steps in the facial feature detection process is determining whether there is a face in the image and where that face is located. Py-Feat includes three prominent face detectors: FaceBoxes, the Multi-task Cascaded Convolutional Network (MTCNN), and RetinaFace. These detectors are widely used in other open-source applications and are known to produce fast and reliable results, even for partially occluded or non-frontal faces. Face detection results are returned as a rectangular bounding box with a confidence score for each detected face.
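For example, the bounding boxes and confidence scores can be inspected by drawing them over the original image. The column names used below (FaceRectX, FaceRectY, FaceRectWidth, FaceRectHeight, FaceScore) follow Py-Feat's documented output format but should be treated as assumptions.

# Drawing detected face boxes and confidence scores (continuing from `result` above).
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

img = Image.open("group_photo.jpg")  # hypothetical image path
fig, ax = plt.subplots()
ax.imshow(img)

for _, face in result.iterrows():
    rect = patches.Rectangle(
        (face["FaceRectX"], face["FaceRectY"]),
        face["FaceRectWidth"], face["FaceRectHeight"],
        fill=False, linewidth=2,
    )
    ax.add_patch(rect)
    ax.text(face["FaceRectX"], face["FaceRectY"] - 5, f'{face["FaceScore"]:.2f}')

plt.show()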
Landmark detection
After detecting a face in an image, it is customary to identify the facial landmarks: coordinate points in image space that outline the jaw, mouth, nose, eyes, and brows. Distances and angles between the landmarks can be used to represent facial expressions and to infer affective states such as pain. Py-Feat employs the common 68-coordinate facial landmark scheme that is widely used in datasets and software, and it currently supports three facial landmark detectors: the Practical Facial Landmark Detector (PFLD), MobileNets, and MobileFaceNets.
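As a simple illustration of how landmark geometry can be turned into an interpretable feature, the sketch below computes a mouth-openness ratio using the standard 68-point indexing (mouth corners at points 48 and 54, inner lips at 62 and 66); the (68, 2) landmark array here is a synthetic placeholder.

# Deriving a geometric feature from 68-point facial landmarks.
import numpy as np

def mouth_openness(landmarks: np.ndarray) -> float:
    """Ratio of inner-lip vertical gap to mouth width (larger = more open)."""
    upper_lip = landmarks[62]    # inner upper-lip midpoint
    lower_lip = landmarks[66]    # inner lower-lip midpoint
    left_corner = landmarks[48]
    right_corner = landmarks[54]
    gap = np.linalg.norm(upper_lip - lower_lip)
    width = np.linalg.norm(left_corner - right_corner)
    return gap / width

# Example with synthetic (68, 2) coordinates:
pts = np.random.rand(68, 2) * 100
print(mouth_openness(pts))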
Head pose detection
Aside from its position in an image or the location of particular facial features, the position of the head in three-dimensional space is another aspect of a facial expression. Relative to a head-on view, rotations can be characterized around the x, y, and z axes, referred to as pitch, roll, and yaw, respectively. Py-Feat supports the img2pose model for head pose estimation. Because this model does not rely on a prior face detection, it can also be used to detect face bounding boxes. The constrained version of img2pose is fine-tuned on the 300W-LP dataset, which only includes head poses in the range of -90° to +90°.
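To make the pose representation concrete, the sketch below converts pitch, yaw, and roll angles into a 3-D rotation matrix, assuming the common convention of rotations about the x (pitch), y (yaw), and z (roll) axes; img2pose's exact angle convention may differ.

# Euler angles (degrees) to a head rotation matrix, under an assumed axis convention.
import numpy as np

def head_rotation_matrix(pitch: float, yaw: float, roll: float) -> np.ndarray:
    p, y, r = np.radians([pitch, yaw, roll])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p),  np.cos(p)]])
    Ry = np.array([[ np.cos(y), 0, np.sin(y)],
                   [0, 1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    Rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r),  np.cos(r), 0],
                   [0, 0, 1]])
    return Rz @ Ry @ Rx

# A head pitched 10° down and yawed 20° to the side, with no roll:
print(head_rotation_matrix(pitch=-10, yaw=20, roll=0))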
Action unit detection
In addition to the basic features of a face in an image, Py-Feat offers models for detecting deviations of specific facial muscles (i.e., action units; AUs) from a neutral facial expression using the FACS coding scheme. There are currently two models in Py-Feat for detecting action units. Their architecture is based on the highly robust and well-performing approach used in OpenFace, which extracts Histogram of Oriented Gradients (HOG) features from within the landmark coordinates using a convex hull algorithm, compresses the HOG representation using Principal Components Analysis (PCA), and then uses these features to predict each of the 12 AUs individually using popular kernel-based shallow learning methods (i.e., linear Support Vector Machines).
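The sketch below illustrates the general shape of such a shallow-learning AU pipeline (HOG features, PCA compression, one linear classifier per AU) using scikit-image and scikit-learn. It is an illustrative reimplementation with placeholder data, not Py-Feat's trained model, and it omits the convex-hull masking step.

# Toy HOG -> PCA -> linear SVM pipeline for a single AU (placeholder data).
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

def hog_features(face_crop: np.ndarray) -> np.ndarray:
    """HOG descriptor for an aligned grayscale face crop (e.g., 112x112)."""
    return hog(face_crop, orientations=8, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Placeholder training data: N aligned face crops and binary labels for one AU.
rng = np.random.default_rng(0)
faces = rng.random((200, 112, 112))
au12_labels = rng.integers(0, 2, size=200)

X = np.stack([hog_features(f) for f in faces])
clf = make_pipeline(PCA(n_components=50), LinearSVC())
clf.fit(X, au12_labels)
print(clf.predict(X[:5]))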
Emotion detection
Finally, Py-Feat offers models for detecting the presence of distinct emotion categories based on third-party judgments. Emotion detectors are trained on deliberately posed or naturally evoked emotional facial expressions, allowing them to categorize new images according to how closely a face matches a canonical emotional expression. It is worth noting that there is currently no consensus in the field as to whether categorical representations of emotion are the most reliable and valid taxonomy of emotional facial expressions. Detecting a smiling face as happy, for example, does not necessarily imply that the individual is experiencing an internal subjective state of pleasure, because such latent state inferences require additional contextual information beyond a static image.
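In practice, the categorical outputs surface as per-face scores that can be summarized directly; the sketch below labels each detected face with its highest-scoring category, with the caveat above that such labels describe the expression, not necessarily the underlying feeling. The emotions attribute and its column names are assumptions based on Py-Feat's documented output.

# Reading categorical emotion scores from the detection result (assumed attribute).
top_emotion = result.emotions.idxmax(axis=1)   # e.g., "happiness", "neutral", ...
confidence = result.emotions.max(axis=1)

for i, (label, score) in enumerate(zip(top_emotion, confidence)):
    print(f"face {i}: {label} ({score:.2f})")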
Robustness Experiments
Luminance
We modified our benchmark datasets to include two different levels of luminance (low, where the brightness factor was uniformly sampled from [0.1, 0.8] for each image, and high, where the brightness factor was uniformly sampled from [1.2, 1.9] for each image) to test the robustness of our models to different lighting conditions. This can be useful for understanding how uneven lighting or minor differences in skin color may affect the models. Overall, we found that the majority of the deep learning detectors were quite robust to changes in brightness.
However, both high and low levels of luminance variation had a greater impact on the shallow learning detectors that rely on HOG features.
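The luminance manipulation itself is straightforward to reproduce; the sketch below uses Pillow's brightness enhancer with factors sampled from the ranges above (below 1 darkens, above 1 brightens), as an illustration rather than the exact augmentation code used in the experiments.

# Luminance perturbation with brightness factors sampled from the stated ranges.
import random
from PIL import Image, ImageEnhance

def perturb_luminance(path: str, condition: str = "low") -> Image.Image:
    factor_range = (0.1, 0.8) if condition == "low" else (1.2, 1.9)
    factor = random.uniform(*factor_range)
    img = Image.open(path)
    return ImageEnhance.Brightness(img).enhance(factor)

dark = perturb_luminance("benchmark_face.jpg", "low")    # hypothetical image path
bright = perturb_luminance("benchmark_face.jpg", "high")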
Occlusion
Furthermore, we assessed the performance of each detector under three distinct occlusion conditions. Face occlusions are quite common in real-world data collection settings, where a participant may cover their face with a hand or be partially obscured by another physical object. On the benchmark datasets described above, we separately masked out the eyes, nose, and mouth by applying a black mask to parts of the face using facial landmark information.
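The sketch below illustrates the idea of this landmark-based occlusion: blacking out the bounding box around one region using the standard 68-point index groups (nose 27-35, eyes 36-47, mouth 48-67). It is an approximation of the manipulation rather than the exact masks used on the benchmarks.

# Landmark-based occlusion of a facial region (illustrative only).
import numpy as np

REGIONS = {"eyes": range(36, 48), "nose": range(27, 36), "mouth": range(48, 68)}

def occlude_region(image: np.ndarray, landmarks: np.ndarray, region: str,
                   pad: int = 10) -> np.ndarray:
    """Black out the bounding box around the chosen region's landmarks."""
    pts = landmarks[list(REGIONS[region])].astype(int)
    x0, y0 = pts.min(axis=0) - pad
    x1, y1 = pts.max(axis=0) + pad
    out = image.copy()
    out[max(y0, 0):y1, max(x0, 0):x1] = 0
    return out

# Example with a synthetic image and random landmarks:
img = np.full((256, 256, 3), 200, dtype=np.uint8)
lms = np.random.rand(68, 2) * 256
print(occlude_region(img, lms, "mouth").shape)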
Py-Feat detector robustness experiments. A) Illustration of a robustness manipulation. B) Face detection robustness results for RetinaFace. Values are Average Precision; higher values indicate better performance. C) Landmark detection robustness results. Values are normalized Mean Absolute Error (MAE); lower values indicate better performance. D) Pose detection robustness results for the constrained img2pose model. Values are Mean Absolute Error (MAE); lower values indicate better performance. E) AU detection robustness results for Feat-XGB. Values are F1 scores; higher values indicate better performance. Note that the DISFA+ dataset lacks labels for AU7. F) Emotion detection robustness results for the Residual Masking Network. Values are F1 scores; higher values indicate better performance. G) Rotation robustness results for Feat-XGB AU detection. Values are F1 scores; higher values indicate better performance.
Action unit to landmark visualization demonstration. (A): Facial expressions reconstructed from AU detections on real-world photos. Py-Feat's visualization model was used to project the detected AU activations from six annotated images, each exhibiting one emotion. (B): Facial expressions generated by manually activating each AU in turn.