What makes a facial expression realistic? How does this apply to an AI-generated avatar?
A dive into what a facial expression is and what this means for the project.
Before attempting to assess whether a face looks "realistic", it is necessary to establish both what a facial expression is and what role it plays in social interaction. Once that is established, these ideas can be converted into tangible metrics and compared against OpenFace data.
What purpose do facial expressions serve?
Facial expressions serve as a way to signal emotions to others. Basic emotions (e.g. happiness, sadness) each govern their own type of facial expression, while other emotions are combinations of basic emotions (e.g. anxiety is a combination of fear, sadness, anger, shame and interest).
The ability to recognize these emotions is innate rather than culturally learnt. This means that emotions are biologically hardwired, though cultures do differ in their display rules, i.e. the norms governing how and when emotions may be shown. The Spartans, for example, were famous for being stoic and showing as little emotion as possible.
What factors should be evaluated when determining an expression for an AI-generated avatar?
The overall goal is to ensure that people do not come across jarring discrepancies in a conversation with an AI-generated avatar. Discrepancies can include many things outside the scope of my contribution, such as the displayed emotion not being in line with the context of the discussion. What I mainly think of, however, is the facial expression itself being extreme: certain AUs being pushed to unnaturally high intensities can make an expression look exaggerated rather than genuine.
As mentioned previously, facial expressions are used to signal emotion, and each basic emotion has certain features tied to it. To categorise these features, a system called FACS (Facial Action Coding System) has been developed:

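| Emotion | Typical AU combination |
| --- | --- |
| Happiness | 6 + 12 |
| Sadness | 1 + 4 + 15 |
| Surprise | 1 + 2 + 5 + 26 |
| Fear | 1 + 2 + 4 + 5 + 7 + 20 + 26 |
| Anger | 4 + 5 + 7 + 23 |
| Disgust | 9 + 15 + 17 |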
This table maps Action Unit (AU) combinations to emotions (the combinations shown are the ones commonly cited in the FACS literature). For example, AU6 (cheek raiser) and AU12 (lip-corner puller) being active together could indicate a happy expression.
Based on such a mapping, it is possible to determine, for a given emotion, which AU combinations should be active. This can be used directly as a (most basic) evaluation of the realism of a provided image. Since this only checks which AUs are activated and not their intensities (allowing extremes to slip through the cracks), the method can be enhanced by also flagging extreme values and/or the ratios between different AU values.
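To make this concrete, a minimal sketch of such a rule-based check could look like the Python snippet below. It assumes OpenFace's per-frame CSV output, where AU intensities appear as columns named AU01_r through AU45_r on a 0-5 scale; the thresholds, the avatar_frames.csv file name and the evaluate_frame helper are illustrative choices rather than fixed parts of the method.

```python
import pandas as pd

# Hypothetical mapping of emotions to the AUs expected to be active,
# following the FACS table above. Column names match OpenFace's
# per-frame AU intensity outputs (0-5 scale).
EXPECTED_AUS = {
    "happiness": ["AU06_r", "AU12_r"],
    "sadness":   ["AU01_r", "AU04_r", "AU15_r"],
    "surprise":  ["AU01_r", "AU02_r", "AU05_r", "AU26_r"],
}

ACTIVE_THRESHOLD = 1.0   # minimum intensity to count an AU as active (assumed)
EXTREME_THRESHOLD = 4.0  # intensity above which an AU looks exaggerated (assumed)

def evaluate_frame(row: pd.Series, emotion: str) -> dict:
    """Check one frame of AU intensities against the AUs expected for an emotion."""
    expected = EXPECTED_AUS[emotion]
    missing = [au for au in expected if row[au] < ACTIVE_THRESHOLD]
    # Any AU (expected or not) pushed to an extreme value is suspicious.
    extreme = [au for au in row.index
               if au.startswith("AU") and au.endswith("_r")
               and row[au] > EXTREME_THRESHOLD]
    return {"missing": missing, "extreme": extreme,
            "plausible": not missing and not extreme}

df = pd.read_csv("avatar_frames.csv")  # hypothetical OpenFace output file
df.columns = df.columns.str.strip()    # OpenFace pads column names with spaces
print(evaluate_frame(df.iloc[0], "happiness"))
```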
Another approach is to take a set of images, each tagged with the displayed emotion and its AU data, and train a machine learning algorithm to recognize when something is off.
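As a rough sketch of what that could look like with scikit-learn (the tagged_expressions.csv file, its column layout and the choice of a random forest are all assumptions for illustration):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumed dataset layout: one row per tagged image, with AU intensity
# columns (AU01_r ... AU45_r) and an "emotion" label column.
data = pd.read_csv("tagged_expressions.csv")
au_columns = [c for c in data.columns if c.startswith("AU") and c.endswith("_r")]

X = data[au_columns]
y = data["emotion"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# A random forest is a reasonable first baseline for tabular AU features.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))

# For a new image: if the predicted emotion (or its probability) does not
# match the emotion the avatar is supposed to display, something is off.
probs = clf.predict_proba(X_test.iloc[:1])
print(dict(zip(clf.classes_, probs[0].round(3))))
```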
Conclusion
Facial expressions are core to social interaction. They are used to express emotions, with each basic emotion having its own facial expression and other emotions being composites of basic emotions. These expressions can be measured using FACS, which can in turn be used in a (basic) analysis. The approach is not perfect, but it is an important step in the right direction. Machine learning could be used later to enhance the results and ensure they are as accurate as possible.
Sources
Russell, J. A. (1997). "What does facial expression mean?"
Ekman, P. (1992). "Facial expression and emotion."