Artificial intelligence is incomplete without emotional intelligence. To respond in a natural, humanised way when emotions are shown, computers need a form of empathy: the ability to recognise human emotions and react appropriately. Studies have shown that robots and computers can be trained to exhibit emotional intelligence. In a sense, a computer’s capacity to act intelligently mirrors a human’s capacity to act mechanically, which makes the human brain a natural blueprint for artificial intelligence.
Emotional intelligence is a much-needed capability for computers to acquire. However, this acquisition raises a question: which human values should artificial intelligence be taught? What is viewed as appropriate or ethical in one country may not be viewed the same way in another, and individuals’ facial and psychological cues differ from person to person.
Even though an emotionally intelligent computer can offer a more personalised feel, it can do so only if emotion analytics are individualised. People are complex, with distinct characteristics, and their expressions vary with factors such as geographic location. The computer would therefore first have to learn the individual’s ‘neutral’ face as a baseline, and then interpret deviations from that baseline in order to understand the emotion being expressed.
At present, a large amount of valuable data is lost to machines because they cannot read it. This data comes in the form of expressions, gestures, speech patterns, tone of voice, and body language, among others. The traditional ‘bag of words’ method is not enough to analyse all these aspects of human communication. It works on a word-by-word basis without taking context into consideration: if a word belongs to the positive ‘list’, the computer treats the entire sentence as having a positive connotation, regardless of context. Dismissing the context can alter the true meaning, and therefore the mood, of the message, since some languages, such as English, have words with multiple meanings.
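The weakness of the word-by-word approach is easy to demonstrate. Below is a minimal sketch of bag-of-words sentiment scoring; the word lists are illustrative assumptions, not taken from any particular library:

```python
# Illustrative positive/negative word lists (assumptions for this sketch).
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "hate", "terrible"}

def bag_of_words_sentiment(sentence: str) -> int:
    """Score each word independently, ignoring context entirely."""
    score = 0
    for word in sentence.lower().split():
        word = word.strip(".,!?")
        if word in POSITIVE:
            score += 1
        elif word in NEGATIVE:
            score -= 1
    return score

# Negation is invisible to this method: the sentence scores positive
# because 'good' is on the positive list, even though the meaning is negative.
print(bag_of_words_sentiment("This is not a good movie"))
```

Because each word is scored in isolation, negation, sarcasm, and word sense are all lost, which is exactly the failure the paragraph above describes.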
By contrast, modern methods employ recurrent neural networks called LSTMs (Long Short-Term Memory), which compress the entire sentence into a vector that holds its meaning while taking word order into account, leading to higher accuracy. Instead of the usual natural language processing, the Speech API private beta uses voice to identify laughter, anger, voice, volume, tone, speed, and pauses. This analysis takes context into consideration, and so delivers a more encompassing understanding of human emotion, which can later be utilised to provide a more personal, individualised experience. These experiences enable more humanlike interactions, which enhance personal lives.
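To make the LSTM idea concrete, here is a toy LSTM cell in NumPy that folds a sequence of word embeddings into a single fixed-size vector. The weights and embeddings are random stand-ins for what training would learn (an assumption for illustration); the point is only that, unlike bag-of-words, the resulting sentence vector depends on word order:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {g: rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
                  for g in "ifog"}
        self.b = {g: np.zeros(hidden_dim) for g in "ifog"}
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = np.concatenate([x, h])
        i = sigmoid(self.W["i"] @ z + self.b["i"])   # how much new information to write
        f = sigmoid(self.W["f"] @ z + self.b["f"])   # how much old memory to keep
        o = sigmoid(self.W["o"] @ z + self.b["o"])   # how much memory to expose
        g = np.tanh(self.W["g"] @ z + self.b["g"])   # candidate memory content
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

def encode(cell, embeddings):
    """Run the cell over word embeddings in order; the final h summarises the sentence."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    for x in embeddings:
        h, c = cell.step(x, h, c)
    return h

# Random embeddings for a tiny vocabulary (illustrative, untrained).
rng = np.random.default_rng(1)
vocab = {w: rng.normal(size=8) for w in ["not", "good", "movie"]}
cell = LSTMCell(input_dim=8, hidden_dim=16)

v1 = encode(cell, [vocab[w] for w in ["not", "good", "movie"]])
v2 = encode(cell, [vocab[w] for w in ["good", "movie", "not"]])
print(np.allclose(v1, v2))  # False: reordering the words changes the sentence vector
```

A production system would learn these weights from labelled data (for example with PyTorch’s `torch.nn.LSTM`), but the gating mechanism above is the same one that lets the network carry context across a sentence.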
Additionally, emotion detection is not limited to voice and keywords; facial cues also play a large role in emotional expression. However, voice- and video-based emotion detection differ from one another. Video makes it possible to detect whether an emotion is positive or negative, while voice reveals the intensity of the emotion, otherwise known as the arousal level. Combining the intensity of the emotion with the emotion itself provides a more thorough experience.
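The combination step can be sketched very simply. Loosely following the circumplex model of affect, the snippet below fuses a video-derived positive/negative signal (valence) with a voice-derived intensity (arousal) into a coarse emotion label; the labels and the 0.5 threshold are illustrative assumptions, not a published mapping:

```python
def fuse_emotion(valence: float, arousal: float) -> str:
    """Map valence in [-1, 1] and arousal in [0, 1] to a coarse emotion label.

    Valence (from video) says whether the emotion is positive or negative;
    arousal (from voice) says how intense it is.
    """
    high = arousal >= 0.5  # illustrative threshold
    if valence >= 0:
        return "excited" if high else "content"
    return "angry" if high else "sad"

# The same negative valence reads very differently at different intensities:
print(fuse_emotion(valence=-0.7, arousal=0.9))  # angry: negative and intense
print(fuse_emotion(valence=-0.7, arousal=0.2))  # sad: negative but subdued
```

This is why combining the two channels is more informative than either alone: video alone cannot separate anger from sadness, and voice alone cannot separate anger from excitement.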
Human interactions with technology are increasing at an accelerated rate. Due to this rapid increase, expectations for AI to understand human emotions are rising. People expect Siri or Alexa to know exactly what they like or to respond in a manner that is empathetic. Interactions with future artificially intelligent computers should feel the same as interacting with humans.